Hacker News | diwank's comments

Just-in-time UI is an incredibly promising direction. I don't expect (in the near term) that entire apps would do this, but many small parts of them would really benefit. For instance, website/app tours could just be generated atop the existing UI.


I am a bit mind-boggled by the pricing lately, especially since the cost increased even further. Is this driven by choices in model deployment (unquantized, etc.) or simply by perceived quality (as in "hey, our model is crazy good and we are going to charge for it")?


reMarkable is really lagging behind on this. I was thinking of finally biting the bullet and writing an app for the Paper Pro. Any ideas/takers?


I’ve even thought about building my own DIY e-ink reader/note-taker for this.

How do apps for the Paper Pro work?


Google's response:

"Read our statement on today’s decision in the case involving Google Search."

https://blog.google/outreach-initiatives/public-policy/doj-s...


Agreed. The fact that it has any structure at all is fascinating (and super pretty). It could point to interesting internal structure. I would love to see a version for Qwen-3 and Mistral too!

I wonder if being trained on significant amounts of synthetic data gave it any unique characteristics.


also ettin is a new favorite and a solid alternative: https://huggingface.co/jhu-clsp/ettin-encoder-1b

I'd encourage you to give setfit a try, along with aggressively deduplicating your training set, finding the top ~2500 clusters per label, and using setfit to train a multilabel classifier on that.
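For illustration, here's a stdlib-only sketch of the preprocessing step being suggested. The function name and data are my own; real cluster selection would use embeddings rather than exact dedup, and the actual training would go through setfit's `SetFitModel` with a multilabel target strategy.

```python
from collections import defaultdict

def dedup_and_cap(examples, cap_per_label=2500):
    """Exact-dedup by normalized text, then cap examples per label.

    `examples` is a list of (text, labels) pairs, where `labels` is a set
    of label strings. The ~2500 "clusters" per label in the comment would
    normally come from embedding + clustering; exact dedup plus a
    per-label cap is a crude stand-in using only the stdlib.
    """
    seen = set()
    per_label = defaultdict(int)
    kept = []
    for text, labels in examples:
        key = " ".join(text.lower().split())  # normalize whitespace/case
        if key in seen:
            continue
        seen.add(key)
        if any(per_label[lab] >= cap_per_label for lab in labels):
            continue
        for lab in labels:
            per_label[lab] += 1
        kept.append((text, labels))
    return kept

data = [
    ("Great battery life", {"positive"}),
    ("great  battery life", {"positive"}),  # near-duplicate, dropped
    ("Screen cracked fast", {"negative", "hardware"}),
]
print(len(dedup_and_cap(data)))  # 2
# The kept pairs would then feed a setfit multilabel trainer.
```

The capped, deduplicated set keeps the classifier from memorizing repeated boilerplate while still covering every label.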

Either way, I'd love to know what worked for you! :)


yup. I started a fully autonomous, 100% vibe-coded side project called steadytext, mostly expecting it to hit a wall, with LLMs eventually struggling to maintain or fix any non-trivial bug in it. turns out I was wrong: not only has claude opus been able to write a pretty complex 7k-LoC project with a python library, a CLI, _and_ a postgres extension, it also actively maintains it and fixes filed issues and feature requests entirely on its own. It is completely vibe coded, I have never even looked at 90% of the code in that repo. it has full test coverage, passes CI, and we use it in production!

granted- it needs careful planning for CLAUDE.md, and all issues and feature requests need a lot of in-depth specifics, but it all works. so I am not 100% convinced by this piece. I'd say it's definitely not easy to get coding agents to manage and write software effectively, and especially hard to do so in existing projects, but my experience has been across that entire spectrum. I have been sorely disappointed in coding agents and even abandoned a bunch of projects and dozens of pull requests, but I have also seen them work.

you can check out that project here: https://github.com/julep-ai/steadytext/


> It is completely vibe coded, I have never even looked at 90% of the code in that repo. it has full test coverage, passes CI, and we use it in production!

This horrifies me. I checked your website and all your recommendations are from people who appear to have an Indian background, but you’re based in the US? And you claim they’re the most innovative companies yet I doubt anyone has heard of them?

Looking over the repo, it seems like a mess (commits are meaningless and the code is all over the place).

I’m sorry, but this feels incredibly scammy.


Thanks for sharing this! It's difficult to find good examples of useful codebases where coding agents have done most of the work. I'm always actively looking at how I can push these agents to do more for me and it's very instructive to hear from somebody who has had success on this level. (Would be nice to read a writeup, too)


It's coming soon! This experiment has really taught me a lot about the limits of agentic code assistants: the stuff they're good at, they're insanely good at, and the stuff they're horrible at, they cannot seem to overcome. I did write a little bit about how I use Claude Code [1] before I started this project a while back, and I'm planning to finish a sequel pretty soon.

^[1]: https://diwank.space/field-notes-from-shipping-real-code-wit...


Huh, interesting. Though I do wonder if the best possible thing an AI could help code would be another AI tool.


This way to the hard take-off.


> Future robots may learn in their dreams...

So prescient. I definitely think this will be a thing in the near future, on a ~12-18 month time horizon.


I may be wrong, but this seems to make no sense.

A neural net can produce information outside of its original data set, but all of it is directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to generate, from its existing data set, wholly new and original full-quality training data for itself.

You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.


Humans are dependent on their input data (through lifetime learning and, perhaps, information encoded in the brain from evolution), and yet they can produce out of distribution information. How?

There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).

Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.


Humans produce out-of-distribution data all the time, yet if you had a teacher making up facts and teaching them to your kids, you would probably complain.


Humans also sometimes hallucinate and produce non sequiturs.


Maybe you do, but people don't "hallucinate". Lying or being mistaken is a very different thing.


Computers aren't humans.

We have truly reached peak hackernews here.


I might be misunderstanding your comment, so sorry if so. Robots have sensors, and RL is a thing: they can collect real-world data, then process and consolidate real-world experiences during downtime (or in real time), run simulations to prepare for scenarios, and update models based on the day's collected data. The demo I saw that I thought was impressive was a robot that understood the scene but didn't know how the scene would respond to its actions, so it generated videos of the possible scenarios, then picked the best ones and modeled its actuation on its "imagination".


This is definitely one of the potential issues that might affect embodied agents/robots/bodies trained on a "world model". As we are training a model for the real world based on a model that simulates the real world, the glitches in the world-simulator model will be incorporated into the training. There will be edge cases due to this layered "overtraining", where a robot/agent/body will expect Y to happen but X will happen, causing unpredictable behaviour. I assume that a generic world agent will be able to autocorrect, but this could also lead to dangerous issues.

E.g., if the simulation has enough videos of firefighters breaking glass, where it seems to shatter instantaneously and in the world sim it always breaks, a firefighter robot might run into a problem when confronted with unbreakable glass: it expects the glass to break as always, leading to a loop of trying to shatter it instead of performing another action.


The benefit of these AI-generated simulation models as a training mechanism is that it helps add robustness without requiring a large training set. The recombinations can generate wider areas of the space to explore and learn with but using a smaller basis space.

To pick an almost trivial example, take OCR digit recognition. You'll train on the original data set, but also on information-preserving skews and other transforms of that data set to add robustness (stretched digits, rotated digits, etc.). The core operation here is taking a smallset in some space (the original training data) and producing some bigset in that same space (generated training data).

For simple things like digit recognition, we can imagine a lot of transforms as simple algorithms, but one can consider more complex problems and realize that an ML model would be able to do a good job of learning how to generate bigset candidates from the smallset.
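A minimal, hypothetical sketch of that smallset-to-bigset operation on a toy 3x3 "digit", using only horizontal shifts as the information-preserving transform (real pipelines would use rotations, elastic skews, etc.):

```python
def shift(grid, dx):
    """Shift a binary image grid horizontally by dx pixels, zero-padding."""
    w = len(grid[0])
    out = []
    for row in grid:
        if dx >= 0:
            out.append([0] * dx + row[: w - dx])
        else:
            out.append(row[-dx:] + [0] * (-dx))
    return out

def augment(smallset, shifts=(-1, 0, 1)):
    """Produce the 'bigset': each (image, label) pair under each transform."""
    return [(shift(img, dx), label) for img, label in smallset for dx in shifts]

digit_one = [[0, 1, 0],
             [0, 1, 0],
             [0, 1, 0]]
bigset = augment([(digit_one, 1)])
print(len(bigset))  # 3: left-shifted, original, right-shifted
```

The label rides along unchanged, which is exactly what makes the transform "information-preserving": the bigset covers more of the input space without inventing any new ground truth.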


We are miles away from the fundamental constraint. We know that our current training methodologies are scandalously data inefficient compared to human/animal brains. Augmenting observations with dreams has long been theorized to be (part of) the answer.


> current training methodologies are scandalously data inefficient compared to human/animal brains

Are you sure? I've been ingesting boatloads of high definition multi-sensory real-time data for quite a few decades now, and I hardly remember any of it. Perhaps the average quality/diversity of LLM training data has been higher, but they sure remember a hell of a lot more of it than I ever could.


It is possible - for example, getting a blob of physics data, fitting a curve, then projecting the curve to theorise about what would happen in new, unseen situations. The information constraints don't limit the ability to generate new data in a specific domain from a small sample; indeed, it might be possible to fully comprehend the domain if there is an underlying process the net can infer. It is impossible to come up with wildly unrelated domains, though.
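A toy version of that idea, assuming the underlying process happens to be linear: fit a curve (here a line, via closed-form least squares) to a few observations, then project it to an input never seen in the data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# "Observed" physics data: position at constant speed 3 m/s from offset 1 m.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]
a, b = fit_line(xs, ys)
print(a * 10.0 + b)  # extrapolate to unseen t=10 -> 31.0
```

Because the fit recovers the underlying process (a = 3, b = 1), the projected point is genuinely new data of full quality; nothing in the information constraint forbids that within the domain.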


Approximately speaking, you have a world model and an agent model. You continue to train the world model using data collected by the robot day-to-day. The robot "dreams" by running the agent model against the world model instead of moving around in the real world. Dreaming for thousands of (simulated) hours is much more efficient than actually running the physical hardware for thousands of wall clock hours.
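A minimal sketch of that loop in the Dyna-Q style (the environment, names, and hyperparameters are illustrative, not from any real robotics stack): the agent records real transitions into a learned world model during the "day", then does many cheap "dream" updates against that model.

```python
import random

random.seed(0)

# A tiny deterministic "world": states 0..4, action 1 moves right, 0 moves
# left; reaching state 4 yields reward 1. This stands in for the real robot.
def real_step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0)

Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
model = {}  # learned world model: (s, a) -> (s', r)

def update(s, a, r, s2, alpha=0.5, gamma=0.9):
    best = max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

for _ in range(200):
    # Day phase: one real interaction, recorded into the model.
    s = random.randrange(5)
    a = random.choice((0, 1))
    s2, r = real_step(s, a)
    model[(s, a)] = (s2, r)
    update(s, a, r, s2)
    # Dream phase: many cheap replays against the learned model,
    # no physical hardware involved.
    for _ in range(20):
        ds, da = random.choice(list(model))
        ds2, dr = model[(ds, da)]
        update(ds, da, dr, ds2)

policy = [max((0, 1), key=lambda act: Q[(s, act)]) for s in range(5)]
print(policy)  # moving right (action 1) should dominate in every state
```

The 20:1 ratio of dreamed to real updates is the whole point: the model queries cost nothing, so the agent squeezes far more learning out of each real transition than the hardware alone would allow.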


I actually think you can.

The LLM has plenty of experts and approaches etc.

Give it tool access, let it formulate its own experiments, etc.

The only question here is whether it becomes a / the singularity because of this, gets stuck in some local minimum, or achieves random perfection and random local-minimum locations.


Humans can learn from visualising situations and thinking through different scenarios. I don't see why AI / robots can't do similar. In fact I think quite a lot of training for things like Tesla self driving is done in simulation.


It's feasible you could have a personal neural net that fine-tunes itself overnight to make fewer inference mistakes in the future.


AlphaGo would seem to be a conceptually simple counter example.


Any idea how humans do it? Where do they get novel information from?


what is a robot dream when there is clearly no consciousness?

What's with this insane desire for anthropomorphism? What do you even MEAN learn in its dreams? Fine-tuning overnight? Just say that!


  > What's with this insane desire for anthropomorphism?
Devil's advocate: Making the assumption that consciousness is uniquely human, and that humans are "special" is just as ludicrous.

Whether a computational medium is carbon-based or silicon-based seems irrelevant. Call it "carbon-chauvinism".


"Consciousness" is an overloaded thought killer that swerves all conversation into obfuscated semantic arguments. One person will be talking about 'internality' and self-image (in the testable, mechanical sense that you could argue Chain of Thought models already have in a petty way) and the other will be grappling with the concept of qualia and the ineffable nature of human experience.


That's not even devil's advocacy: many other animals clearly have consciousness, at least if we're not solipsistic. There have been many very dangerous precedents in medicine where people have been declared "brain dead" only to awake and remember.

Since consciousness is closely linked to being a moral patient, it is all the more important to err on the side of caution when denying qualia to other beings.


AI has traditionally been driven by "metaphor-driven development" where people assume the brain has system X, program something they give the same name, and then assume because they've given it that name it must work because it works in the brain.

This is generally a bad idea, but a few of the results like "neural networks" did work out… eventually.

"World model" is another example of a metaphor like this. They've assumed that humans have world models (most likely not true), and that if they program something and call it a "world model" it will work the same way (definitely not true) and will be beneficial (possibly true).

(The above critique comes from Phil Agre and David Chapman.)


Yes, and an object in OOP isn't really a physical object. And a string isn't really a thin bit of rope.

No-one cares. It's just terminology.


I'm invested in a startup that is doing something unrelated to robotics, but they're spending a lot of time in Shenzhen. I keep a very close eye on robotics and was talking to their CTO about what he is seeing in China: versions of this are already being implemented.


[flagged]


I have no doubts that is the case. Just look at this new Unitree robot that was unveiled a mere 6 hours ago: https://youtu.be/ve9USu7zpLU?feature=shared

And these are consumer options, affordable to you and me, not only to some military. If those are the commonly available options... there may be way more advanced stuff that we haven't seen.


this stuff is old tech and has nothing to do with transformers. The Boston Dynamics-style robot dogs are always shown in marketing demos like the one you linked, in secretly very controlled environments. Let me know when I can order one that will bring the laundry downstairs for my wife.

I asked for real examples from someone who claimed to have first hand experience, not more marketing bullshit



lol I thought this was going to be the receipts but no, I guess asking for evidence is against the rules!

y'all are in a religion


I genuinely have no idea where you’re coming from on this. You’re assuming way too much about everyone you disagree with


Calling someone a liar without evidence is kinda rude.


This is so standard it's part of the tutorials now

https://developer.nvidia.com/isaac/gr00t


Gr00t works, but it is not a standard yet. Can't share what I have heard, but the success rate is still far behind other methods.


"Do Androids Dream of Electric Sheep?"


I think that’s too harsh a position solely for not being peer reviewed yet. Neither of the original Mamba-1 and Mamba-2 papers was peer reviewed. That said, strong claims warrant strong proofs, and I’m also trying to reproduce the results locally.


Exactly!

> It uses two interdependent recurrent modules: a *high-level module* for abstract, slow planning and a *low-level module* for rapid, detailed computations. This structure enables HRM to achieve significant computational depth while maintaining training stability and efficiency, even with minimal parameters (27 million) and small datasets (~1,000 examples).

> HRM outperforms state-of-the-art CoT models on challenging benchmarks like Sudoku-Extreme, Maze-Hard, and the Abstraction and Reasoning Corpus (ARC-AGI), where CoT methods fail entirely. For instance, it solves 96% of Sudoku puzzles and achieves 40.3% accuracy on ARC-AGI-2, surpassing larger models like Claude 3.7 and DeepSeek R1.

Erm what? How? Needs a computer and sitting down.


Yeah, that was pretty much my reaction. I will need time on a computer too.

The repo is at https://github.com/sapientinc/HRM .

I love it when authors publish working code. It's usually a good sign. If the code does what the authors claim, no one can argue with it!


Same! Guan’s work on sample packing during finetuning has become a staple. His openchat code is also super simple and easy to understand.


Is it talking about fine tuning existing models with 1000 examples to beat them in those tasks?

