See the third diagram in https://www.mdpi.com/1424-8220/24/18/6049 . It shows elements of noise and of input embeddings, whether in the form of images or in the form of text.
I tried asking a model what the "long multiplication algorithm" is. It gave it to me. I asked it to follow that algorithm to solve, e.g., 12987318927 * 12098102983, and it followed the algorithm and got the right answer. It DOES fail more when the numbers are longer (because that puts more text in the context), but that can be improved by having the model focus on the right subset of the text, right?
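For reference, a minimal sketch of the schoolbook procedure being described (per-digit partial products with carries, then summing the shifted rows); this is plain Python, not anything the model produced:

```python
# Minimal sketch of schoolbook long multiplication: per-digit partial
# products with carries, then summing the shifted rows.
def long_multiply(a: str, b: str) -> str:
    partials = []
    for shift, db in enumerate(reversed(b)):
        carry, digits = 0, []
        for da in reversed(a):
            prod = int(da) * int(db) + carry
            digits.append(prod % 10)      # keep the low digit
            carry = prod // 10            # carry the rest
        if carry:
            digits.append(carry)
        partials.append(int("".join(map(str, reversed(digits)))) * 10 ** shift)
    return str(sum(partials))

# The example from above checks out:
assert long_multiply("12987318927", "12098102983") == str(12987318927 * 12098102983)
```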
Declaring victory on "reasoning" based on cherry-picking a correct result about arithmetic is, of course, very narrow and absurdly optimistic, even if it correctly works for all NxM calculations. Moving on from arithmetic to any kind of problem that fundamentally reduces to model-checking behind the scenes, we would be talking about exploring a state space with potentially many thousands of state transitions for simple stuff. If each one has even a small chance of crapping out due to hallucination, errors at the macro scale are practically guaranteed.
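A back-of-the-envelope sketch of that compounding, assuming each step independently succeeds with probability p (the numbers are picked purely for illustration):

```python
# If each of n state transitions independently succeeds with probability p,
# the whole run is only correct with probability p**n.
for p in (0.999, 0.9999):
    for n in (1_000, 10_000):
        print(f"p={p}, n={n}: P(all steps correct) = {p**n:.3g}")
# Even a 0.1% per-step failure rate over 10,000 steps leaves roughly a
# 0.005% chance of a fully correct run.
```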
Everyone will say, "but you want tool-use or code-gen for this anyway". Sure! But carry-digits or similar is just one version of "correctness matters" that puts non-local demands on attention, and it's easier to check than code. So tool-use or code-gen is just pushing the same problem somewhere else to hide it: there are still a lot of steps involved, and each one really has to be correct if the macro layer is going to be correct and the whole thing is going to be hands-off / actually automated. Maybe that's why local models can still barely handle nontrivial tool-calling.
Well, if the model can reliably keep the CPU cache plus CPU registers plus CPU instructions in context and do operations based on those, then we've pretty much solved computation using LLMs, right? It could use RAG to operate on RAM and SSD.
Here we can see how much data a high-end traditional non-SoC CPU holds:
> For a recent high-end non-SoC desktop CPU:
> Cache: ~40-100 MB total (L1 + L2 + shared L3)
> Register files: tens to few hundreds of KB total across cores (e.g., ~200-300 KB or so)
> Combined: So you're looking at ~40-100 MB + ~0.2 MB → roughly ~40-100 MB of total on-chip caches + registers.
I'm sure we can reduce these caches to fit in the context windows of today's LLMs (~500,000 tokens).
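For a sense of how much would actually fit, here is a rough token budget; the bytes-per-token figures are assumptions (tokenizers vary), not anything from the quoted numbers:

```python
# Rough budget: how much raw state fits in a ~500K-token context if the state
# is dumped as hex (2 hex chars per byte) at ~4 characters per token?
context_tokens = 500_000
chars_per_token = 4          # assumed average, varies by tokenizer
bytes_per_hex_char = 0.5     # 2 hex characters encode 1 byte
budget_bytes = context_tokens * chars_per_token * bytes_per_hex_char
print(f"~{budget_bytes / 1e6:.1f} MB of raw state fits in context")  # ~1.0 MB
# So the simulated "cache" would need to shrink from ~40-100 MB to around 1 MB.
```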
Then, with temperature 0, we get more "discrete" operations. We still have the rare problem of hallucinations, but that risk should be small at temperature 0.
It doesn't work by mapping CPU caches/registers into an LLM context. Transformers have no mutable registers; they attend over past tokens and can't update prior state. RAG isn't RAM. Even with a huge context, you still can't step through CPU-style instructions without external, read/write memory or tooling.
And temperature 0 makes outputs deterministic, not magically correct.
> And temperature 0 makes outputs deterministic, not magically correct.
For reasons I don't claim to really understand, I don't think it even makes them deterministic. Floating point something something? I'm not sure temperature even has a static technical definition or implementation everywhere at this point. I've been ignoring temperature and using nucleus sampling anywhere that's exposed and it seems to work better.
Random but typical example: pydantic-ai has a caveat that doesn't reference any particular model: "Note that even with temperature of 0.0, the results will not be fully deterministic". And of course this is just the very bottom layer of model config; in a system of diverse agents using different frameworks and models, it's even worse.
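To make the temperature vs. nucleus-sampling distinction concrete, here is a toy sketch; the logits and thresholds are made up for illustration:

```python
import math, random

# Toy next-token distribution; real logits come from the model.
logits = {"foo": 2.0, "bar": 1.5, "baz": -1.0}

def softmax(logits, temperature=1.0):
    # Temperature rescales the whole distribution before sampling.
    # Temperature 0 is typically special-cased as greedy argmax,
    # since dividing by zero here is undefined.
    exps = {t: math.exp(l / temperature) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def nucleus_sample(probs, top_p=0.9):
    # Nucleus (top-p) sampling keeps the smallest set of tokens whose
    # cumulative probability reaches top_p, then samples within it.
    kept, total = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        total += p
        if total >= top_p:
            break
    toks, weights = zip(*kept)
    return random.choices(toks, weights=weights)[0]

print(nucleus_sample(softmax(logits, temperature=0.7)))
```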
Well, mostly, but they can generate more state that pushes old state out of the context.
If an LLM were sufficiently trained to roll forward and correctly set the current state of some registers written into the conversation...? I wouldn't trust it, though; it leaves too much to chance.
I too make mistakes trying to keep track of things; I end up using tools too.
Well, the LLM could re-infer the whole state from scratch on every instruction. Temperature 0 is deterministic, and that's what we are looking for. If the model is trained properly on how CPU state + instructions should be handled, it should be able to produce the next state.
With temp = 0, if the model is off by one bit at step k, all subsequent steps are deterministically wrong.
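A toy illustration of that, using an arbitrary deterministic step function as a stand-in for "produce the next CPU state" (nothing here models a real CPU or LLM):

```python
# A deterministic 16-bit step function standing in for "CPU state -> next state".
# The map is a bijection, so once two states differ they never reconverge.
def step(state: int) -> int:
    return (state * 75 + 74) % (1 << 16)

good = bad = 12345
for i in range(100):
    good, bad = step(good), step(bad)
    if i == 10:
        bad ^= 1          # single-bit "hallucination" at step k = 10
print(good == bad)        # False: with a deterministic step, the error never self-corrects
```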
Your previous example shows the best case, which is that a model can sometimes follow a textual recipe for long multiplication on short inputs. That's not the same as learning a length-generalizing, bit-exact algorithm.
Basically, what you've shown is that the model can describe the algorithm. It doesn't show it can execute it at scale. Without writable state and bit-exact ops, errors grow with length, and "focus more" only slows that failure; it doesn't eliminate it.
> It doesn't show it can execute it at scale. Without writable state and bit-exact ops,
Well, modern LLM coding-agent products (e.g. Claude Code) are able to store state in files in the current repository. So you could have the model keep the "CPU state" in context, and treat the files in the repository as the "RAM".
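A minimal sketch of that split, assuming a hypothetical file layout (this is not Claude Code's actual mechanism, just what the division of labor could look like):

```python
import json, pathlib

# Small "CPU state" (registers, program counter, flags) kept in one file the
# model re-reads into context each step; hypothetical layout.
state_file = pathlib.Path("cpu_state.json")
state = {"pc": 0, "registers": {"r0": 0, "r1": 0}, "flags": {"zero": False}}
state_file.write_text(json.dumps(state, indent=2))

# Large "RAM" kept as ordinary files the agent reads/writes via tool calls.
ram = pathlib.Path("ram")
ram.mkdir(exist_ok=True)
(ram / "0x1000").write_text("42")
value = int((ram / "0x1000").read_text())
print(value)
```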
That seems to be the conclusion we keep coming to, though: we ourselves use tools.
The focus here is the LLM being able to do it unaided.
For many problems that require precision, the space of all combinations of steps is enormous, and usually one incorrect step breaks everything: "I forgot to carry the 1."
Even then, while brilliant, Claude does screw up sometimes. We're not there yet, but that doesn't prevent it from being adequately useful.
With Cursor you can hit Cmd+K in the terminal and give a prompt for the agent to convert into a terminal command. It would be good if it also let you do the same to generate SQL queries based on the available database schemas. Then it would be a generic solution that covers this use case.
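A sketch of what that generic flow might look like, with `generate_sql()` as a hypothetical stand-in for whatever model call the editor would make (this is not how Cursor actually implements Cmd+K):

```python
import sqlite3

def schema_text(db_path: str) -> str:
    # Pull the live table definitions so the model sees real column names.
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    con.close()
    return "\n".join(sql for (sql,) in rows)

def build_prompt(db_path: str, request: str) -> str:
    return f"Schema:\n{schema_text(db_path)}\n\nWrite a SQL query to: {request}"

# prompt = build_prompt("app.db", "list the ten most recent orders per customer")
# sql = generate_sql(prompt)   # hypothetical model call, not a real API
```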
What do you mean? That makes me think about how people react to others having depression symptoms, saying that they should "just" get better... The best course of action is to ask for help.
When you have a condition that alters behavior I think it's pretty fair to blame that condition for said behavior changes/differences. People are so weird about mental disorders in a way they would never be about physical disorders.
Own your chronic fatigue and deal with it, quit blaming it for your tiredness!
Here's my genuine and honest question: What does "owning your chronic fatigue" look like in practice?
Having the knowledge about how your own mind and body work is essential when it comes to dealing with the challenges you are presented with. Having a diagnosis of some kind doesn't let you off the hook. But it is comforting to know that it isn't your fault. You aren't a lesser person because of it - you are just going through the game of life on a different level of difficulty than you expected, and a different level of difficulty than someone without that same challenge.
It would be exactly the same thing people expect from people with mental disorders. You must manage your disorder in such a manner that to the outside world it appears as though you don't have it. Which is totally not an unreasonable ask and definitely not exhausting and untenable to have to fight against your own body for 16 hours a day.
You can't blame your disorder so no turning down plans because you don't have the energy today—better take another dose of stimulants and power through! And don't you dare ask for or expect any kind of accommodation because that's just using your diagnosis as an excuse to be lazy.
This is one of the most under-discussed hardships about the reality of living with nearly invisible disabilities. The expectation that it remain invisible at all times is hard to live with.
If you care about people who have disabilities, give them grace when the facade slips.
Wild mice do not get AD. Even if you let them reach old age, they do not develop the brain plaques or tangles that are linked to Alzheimer's.
Even if they did, you'd have to run huge samples, then do post-testing necropsies to see which mice had AD and which didn't, then filter your data, then try to find results in what remains.
Otherwise, you can inject the mice with a chemical known to cause AD, which is not reliable on its own, so instead you can get genetically modified mice that express _some_ of the known plaques and misfolds associated with human AD.
Animal testing is still, largely, a very unethical and cruel affair. AD testing in mice is especially fraught with hazard.
If you believe the paper, the authors were able to create symptoms and plaques similar to AD just by reducing lithium levels in the diet of these mice.
It's kind of challenging to prove this kind of negative, and the supposed proof here comprises no more than pedigreed words on a page, but consider the section "What constitutes a good model for AD?": https://sci-hub.se/https://www.nature.com/articles/s41583-01...