Hacker Newsnew | past | comments | ask | show | jobs | submit | vizzier's commentslogin

For season 1 it very much satirises the early morning open university tv educational media format from the 70s through the early 2000s [1]. I'm not sure it'd land quite the same way for other countries or even for gen-z onwards.

[1] https://www.bbc.com/historyofthebbc/anniversaries/january/op...


I wondered, but my 15 year old loved it.

That’s actually pretty cool. I can think of worse things to watch in the mornings.

i think the skibidi toilet crew would love spiders on drugs

I don't think its possible to legally arrest someone outside of your jurisdiction let alone another world leader. So no, kidnapped.


You misunderstood the Donroe doctrine. All of the Americas are his jurisdiction now.


Is that why random unknown and innocent until proven guilty people were drone double-tapped using perfidious means in waters not belonging to the US?


Again this is misunderstanding how this admin thinks. They believe in only might make right and their ability to do those murders means it’s theirs in their mind.

Look at a bunch of their commentary about Greenland talking about how can it be denmarks if Denmark can’t defend it from the US.


> but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

True, but it is shocking how often claude suggests just disabling or removing tests.


The sneaky move that I hate most is when Claude (and does seem to mostly be a Claude-ism I haven’t encountered on GPT Codex or GLM) is when dealing with an external data source (API, locally polling hardware, etc) as a “helpful” fallback on failures it returns fake data in the shape of the expected output so that the rest of the code “works”.

Latest example is when I recently vibe coded a little Python MQTT client for a UPS connected to a spare Raspberry Pi to use with Home Assistant, and with a just few turns back and forth I got this extremely cool bespoke tool and felt really fun.

So I spent a while customizing how the data displayed on my Home Assistant dashboard and noticed every single data point was unchanging. It took a while to realize because the available data points wouldn’t be expected to change a whole lot on a fully charged UPS but the voltage and current staying at the exact same value to a decimal place for three hours raised my suspicions.

After reading the code I discovered it had just used one of the sample command line outputs from the UPS tool I gave it to write the CLI parsing logic. When an exception occurred in the parser function it instead returned the sample data so the MQTT portion of the script could still “work”.

Tbf Claude did eventually get it over the finish line once I clarified that yes, using real data from the actual UPS was in fact an important requirement for me in a real time UPS monitoring dashboard…


Always check the code.

It's similar to early versions of autonomous driving. You's not want to sit in the back seat with nobody at the wheel. That would get you killed guaranteed.


And how is that not good for humanity in an evolutionary sense (as long as it doesn't kill or maim anyone else)?

Tesla owner keeps using Autopilot from backseat—even after being arrested:

https://mashable.com/article/tesla-autopilot-arrest-driving-...


Sounds to me like more evidence in favor of the idea that they're meant to play the golden retriever engineer reporting to you, the extremely intelligent manager.

A coworker opened a PR full of AI slop. One of the first things I do is check if the tests pass. Of course, the didn't. I asked them to fix the tests, since there's no point in reviewing broken code.

"Fix the tests." This was interpreted literally, and assert status == 200 got changed to assert status == 500 in several locations. Some tests required more complex edits to make them "pass."

Inquiries about the tests went unanswered. Eventually the 2000 lines of slop was closed without merging.


After a certain point the response to low effort vibe code has to be vibe reviews. Failing tests? Bad vibes, close without merging. Much more efficient than vibe coding too, since no AI is needed.

100%, trying a bit of an experiment like this(similar in that I mostly just care about playing around with different agents, techniques etc.) it has built out literally hundreds of tests. Dozens of which were almost pointless as it decided to mock apis. When the number of failed tests exceeded 40 it just started disabling tests.

To be fair, many human developers are fond of pointless tests that mock everything to the extent that no real code is actually exercised. At least the tests are fast though.

Citing the absolute worst practices from terrible developers as a way to exonerate or legitimize LLM code production issues is something we need to stop doing in my opinion. I would not excuse or expect a day one junior on my team that wrote pointless tests or worse yet removed tests to get the CI to pass.

If LLMs do this it should be seen as an issue and should not be overlooked with “people do it too…”. Professional developers do not do this. If we’re going to use Ai for creating production code we need to be honest about its deficiencies.


I agree, but if LLMs are trained on common practices, best or worst, what do you expect?

Testing, specifically, is heavily opinionated among professional developers.


> it is shocking how often claude suggests just disabling or removing tests.

Arguably, Claude is simply successfully channeling what the developers who wrote the bulk of its training data would do. We've already seen how bad behavior injected into LLMs in one domain causes bad behavior in other domains, so I don't find this particularly shocking.

The next frontier in LLMs has to be distinguishing good training data from bad training data. The companies have to do this, even if only in self defense against the new onslaught of AI-generated slop, and against deliberate LLM poisoning.

If the models become better at critically distinguishing good from bad inputs, particularly if they can learn to treat bad inputs as examples of what not to do, I would expect one benefit of this is that the increased ability of the models to write working code will then greatly increase the willingness of the models to do so, rather than to simply disable failing tests.


Still fairly non specific, as science probably should be (the question was if there visible MRI detectable changes, not what specific downstream changes were apparent)

Interesting that if we can detect some of these changes with covid what other viruses might be doing to us up there.


This sounds a bit like the opening scene of something science fiction-ey dreadful. Like, even Pluribus.


> Me: What is your knowledge cut off date?

> ChatGPT: My knowledge cutoff is *June 2024*. I can also use live browsing to fetch more recent information when needed.

It is unsurprising that it thinks next year would be 2025, given that this token generator lives in June 2024.


> it thinks

This is your mistake right here. It doesn't think. It's a text generator. It can no more think about what year it is than Swiftkey on your phone "thinks" what year it is when you type

NEXT YEAR WILL BE

and press the middle button.


I'm as bearish as anyone on the current AI hype, but this particular ship has sailed. Research is revealing these humongous neural networks of weights for next token prediction to exhibit underlying structures that seem to map in some way to a form of knowledge about the world that is, however imperfectly, extracted from all the text they're trained on.

Arguing that this is meaningfully different from what happens in our own brains is not something I would personally be comfortable with.


> Research is revealing these humongous neural networks of weights for next token prediction to exhibit underlying structures that seem to map in some way to a form of knowledge about the world that is

[[citation needed]]

I am sorry but I need exceptionally strong proof of that statement. I think it is totally untrue.


hard to know with so few data points


"insufficient data for meaningful answer", one might say.


>hard to know with so few data points

i've yelled at the interns several times but none have been able to set up a haldane soup focus group yet


and a south park episode for ones from the future :D

https://en.wikipedia.org/wiki/Goobacks


A good sabot system could turn many other meals effectively into burritos.


Might have to be more specific than Android and Windows. Tried them on my devices (S24, windows 11) and they're practically instantaneous.


This does sound like it could be solved with better installDiskSelectors[0]. Talos has done a fair bit of work in improving this and UserVolumeConfigs in the last couple of 1.x revisions.

Alternatively, network booting in some fashion is an option. [1]

[0] https://www.talos.dev/v1.11/reference/configuration/v1alpha1...

[1] https://www.talos.dev/v1.11/talos-guides/install/bare-metal-...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: