I used to have a zillion todo txt files in the early 2000s, migrated to OneNote around 2005, and have been using the same OneNote notebook for 20 years now. My life is in there - 20 years worth of todos, lists, thoughts, ideas, etc., always evolving, perfectly synchronized across computers and mobile. I'm referencing and updating my OneNote all day as I get things done, have ideas, and think of new things to do or things to remember. It's an extension of my brain at this point.
I've tried alternatives, but OneNote has been simple and reliable; it just works everywhere. Probably one of the most important apps in my life.
Same but with Keep and GDocs. I still use a local neverending txt like TFA though, as a short-term todo list + clipboard. Short thoughts and little factoids like license plate numbers, appointments, and restaurant recommendations go in Keep (although some of those "short thoughts" have ended up busting the character limit). Refined structured notes end up in a GDoc by topic. Some of my GDocs are now the size of small textbooks. I also love Google Takeout so that I can back it all up periodically.
I would say, just as you would about OneNote, Keep is one of the most important apps in my life.
Can I just say !!!!!!!! Hell yeah! Blog post indicates it's also much better at using the full context.
Congrats OpenAI team. Huge day for you folks!!
Started on Claude Code and, like many of you, had that omg CC moment. Then got greedy.
Switched over to Codex when 5.1 came out. WOW. Really nice acceleration in my Rust/CUDA project which is a gnarly one.
Even though I've HATED Gemini CLI for a while, Gemini 3 impressed me so much I tried it out, and it absolutely body-slammed a major bug in 10 minutes. Started using it to consult on commits. Was so impressed it became my daily driver. Huge mistake. I almost lost my mind after a week of fighting it. Insane bias towards action. Ignoring user instructions. Garbage characters in output. Absolutely no observability into its thought process. And on and on.
Switched back to Codex just in time for 5.1 codex max xhigh which I've been using for a week, and it was like a breath of fresh air. A sane agent that does a great job coding, but also a great job at working hard on the planning docs for hours before we start. Listens to user feedback. Observability on chain of thought. Moves reasonably quickly. And also makes it easy to pay them more when I need more capacity.
And then today GPT-5.2 with an xhigh mode. I feel like Xmas has come early. Right as I'm doing a huge Rust/CUDA/math-heavy refactor. THANK YOU!!
I've been happy with Databento as a low-friction way to get market data. I liked it so much, I ported their structs and APIs to Golang. [1]
Their EQUS Mini dataset is a great way to dip a toe in if you want live data without licensing restrictions. Databento's article explains exactly how it is sourced: it's not averaged but anonymized, specifically because of the complexities of upstream exchange licensing. [2]
You don't have to pay $200 per month for that -- that's for all-you-can-eat. You can experiment with pay as you go.
You can use my dbn-go tools to help you... here's the cost to get all the 1-day candlesticks for all the US Equity Symbols for 1-year... which you could use to make all sorts of charts and redistribute them freely (the trickiest part honestly):
So $4.38 for all that data, or $3.78 for just the NASDAQ exchange (not sure about redistribution rights for that one).
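If you'd rather script the cost check, Databento's historical metadata API can be queried directly. A rough Go sketch follows; the endpoint path, parameter names, dataset code, and ALL_SYMBOLS wildcard are from memory and should be treated as assumptions to verify against the official docs (or just use the dbn-go tools above):

    // Rough sketch: ask Databento's historical metadata API what a pull would
    // cost before committing to it. Endpoint path, parameter names, dataset
    // code, and the ALL_SYMBOLS wildcard are assumptions -- check the docs.
    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
        "os"
    )

    func main() {
        params := url.Values{}
        params.Set("dataset", "EQUS.MINI")   // assumed dataset code
        params.Set("schema", "ohlcv-1d")     // daily candlesticks
        params.Set("symbols", "ALL_SYMBOLS") // assumed wildcard for all symbols
        params.Set("start", "2024-01-01")
        params.Set("end", "2025-01-01")

        req, err := http.NewRequest("GET",
            "https://hist.databento.com/v0/metadata.get_cost?"+params.Encode(), nil)
        if err != nil {
            panic(err)
        }
        // The historical API authenticates with the API key as the basic-auth username.
        req.SetBasicAuth(os.Getenv("DATABENTO_API_KEY"), "")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println("estimated cost (USD):", string(body))
    }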
I hang out on their Slack. Today there was a deep discussion about optimizing C++ SPSC queues, although it usually isn't that technical. They are pretty transparent about how they implement things.
I've been using a lot of Claude and Codex recently.
One huge difference I notice between Codex and Claude Code is that, while Claude basically disregards your instructions (CLAUDE.md) entirely, Codex is extremely, painfully, doggedly persistent in following every last character of them - to the point that I've seen it work for 30 minutes to convolute a solution that was only convoluted because of some sentence I threw into the instructions and had completely forgotten about.
I imagine Codex as the "literal genie" - it'll give you exactly what you asked for. EXACTLY. If you ask Claude to fix a test that accidentally says assert(1 + 1 === 3), it'll say "this is clearly a typo" and just rewrite the test. Codex will rewrite the entire V8 engine to break arithmetic.
Both these tools have their uses, and I don't think one approach is universally better. Because Claude just hacks its way to a solution, it is really fast, so I like using it for iterative web work, where I need to tweak some styles and want a fast feedback loop. Codex is much worse at that because it takes like 5 minutes to validate everything is correct. Codex is much better for longer, harder tasks that have to be correct -- I can just write some script to verify that what it did works, and let it spin for 30-40 minutes.
I've been really impressed with Codex so far. I have been working on a flight simulator hobby project for the last 6 months and finally came to the conclusion that I needed to switch from a floating origin, which my physics engine assumed in its coordinate system, to a true ECEF coordinate system (what underpins GPS). This involved a major rewrite of the coordinate system, the physics engine, even the graphics system, and auxiliary stuff like asset loading/unloading that was dependent on local X,Y,Z. It even rewrote the PD autopilot to account for the changes in the coordinate system. I gave it about a paragraph of instructions with a couple of FYIs and... it just worked! No major graphical glitches except some minor jitter, which it fixed on the first try. In total it took about 45 minutes, but I was very impressed.
I was unconvinced it had actually, fully, ripped out the floating origin logic, so I had it write up a summary and then used that as a high-level guide to pick through the code, and it had, as you said, followed the instructions to the letter. Hugely impressive. In March of 2023, OpenAI's products struggled to draw a floating wireframe cube.
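For anyone curious what that coordinate switch involves at the math level, the geodetic-to-ECEF conversion itself is compact and well known (WGS84 constants). A standalone sketch, not the commenter's actual code:

    // Geodetic (lat, lon, height) to ECEF (x, y, z) using WGS84 constants.
    // Standalone illustration of the conversion an ECEF-based engine leans on;
    // not taken from the project described above.
    package main

    import (
        "fmt"
        "math"
    )

    const (
        a = 6378137.0           // WGS84 semi-major axis (m)
        f = 1.0 / 298.257223563 // WGS84 flattening
    )

    func geodeticToECEF(latDeg, lonDeg, h float64) (x, y, z float64) {
        lat := latDeg * math.Pi / 180
        lon := lonDeg * math.Pi / 180
        e2 := f * (2 - f)                                     // first eccentricity squared
        n := a / math.Sqrt(1-e2*math.Sin(lat)*math.Sin(lat)) // prime vertical radius

        x = (n + h) * math.Cos(lat) * math.Cos(lon)
        y = (n + h) * math.Cos(lat) * math.Sin(lon)
        z = (n*(1-e2) + h) * math.Sin(lat)
        return
    }

    func main() {
        x, y, z := geodeticToECEF(37.6213, -122.3790, 4.0) // roughly SFO
        fmt.Printf("ECEF: %.1f %.1f %.1f (meters)\n", x, y, z)
    }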
> Claude basically disregards your instructions (CLAUDE.md) entirely
A friend of mine tells Claude to always address him as “Mr Tinkleberry”; he says he can tell Claude is not paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.
My advice from someone who has built recommendation systems: now comes the hard part! It seems like a lot of the feedback here is that it's operating pretty heavily like a content-based system, which is fine. But this is where you can probably start evaluating on other metrics like serendipity, novelty, etc. One of the best things I did for recommender systems in production was having different ones for different purposes, then aggregating them together into a final result. Have a heavily content-based one to keep people in the rabbit hole. Have a heavily graph-based one to traverse and find new stuff. Have one that is heavily tuned on a specific metric for a specific purpose. Hell, throw in a pure TF-IDF/BM25/SPLADE-based one.
The real trick of rec systems is that people want to be recommended things differently. Having multiple systems that you can weigh differently per user is one way to achieve that; usually one algorithm can't quite do it effectively on its own.
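A minimal sketch of that aggregation idea, with dummy scoring functions standing in for the individual recommenders (the point is only the per-user weighted blend, not the algorithms themselves):

    // Minimal sketch of blending several recommenders with per-user weights.
    // The recommenders here are dummy scoring functions; real systems would
    // add normalization, dedup, and re-ranking on top of this.
    package main

    import (
        "fmt"
        "sort"
    )

    // A recommender maps (user, item) to a relevance score; higher is better.
    type recommender func(user, item string) float64

    // blend ranks items by a weighted sum of the individual recommenders' scores.
    // Weights can differ per user, which is how you tune exploration vs. rabbit hole.
    func blend(user string, items []string, recs map[string]recommender, weights map[string]float64) []string {
        scores := make(map[string]float64, len(items))
        for name, rec := range recs {
            w := weights[name]
            for _, item := range items {
                scores[item] += w * rec(user, item)
            }
        }
        ranked := append([]string(nil), items...)
        sort.Slice(ranked, func(i, j int) bool { return scores[ranked[i]] > scores[ranked[j]] })
        return ranked
    }

    func main() {
        recs := map[string]recommender{
            // stand-ins for a content-based model and a graph-based one
            "content": func(u, i string) float64 { return float64(len(i)) },
            "graph":   func(u, i string) float64 { return 1.0 / float64(len(i)) },
        }
        // This user gets more serendipity: weight the graph recommender higher.
        weights := map[string]float64{"content": 0.3, "graph": 0.7}
        fmt.Println(blend("alice", []string{"jazz", "deep-house", "lofi"}, recs, weights))
    }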
Not pixels, but percels. Pixels are points in the image, while a "percel" is a unit of perceptual information. It might be a pixel with an associated sound, in a given moment of time. In the case of humans, percels include other senses as well, and they can also be annotated with your own thoughts (i.e. percels can also include tokens or embeddings).
Of course, NNs like LLMs never process a percel in isolation, but always as a group of neighboring percels (aka context), with an initial focus on one of the percels.
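Purely as an illustration of the idea, here's one way a percel might look as a data structure; the term is the commenter's, the field choices are my guess:

    // Purely illustrative: a "percel" as a bundle of co-occurring perceptual
    // signals at one moment, optionally annotated with tokens/embeddings.
    package main

    import "fmt"

    type Percel struct {
        TimestampNs int64     // the moment this unit of perception belongs to
        Pixel       [3]uint8  // visual component (RGB)
        Sound       []float32 // audio samples associated with that moment
        Annotations []string  // optional tokens ("thoughts") attached to it
        Embedding   []float32 // optional learned representation
    }

    func main() {
        p := Percel{
            TimestampNs: 1_700_000_000_000_000_000,
            Pixel:       [3]uint8{255, 0, 0},
            Sound:       []float32{0.01, -0.02},
            Annotations: []string{"red", "alarm?"},
        }
        fmt.Printf("%+v\n", p)
    }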
It seems Amazon itself is aware of this issue. The linked Engadget article even mentions this:
> "The rate at which Amazon has burned through the American working-age populace led to another piece of internal research, obtained this summer by Recode, which cautioned that the company might “deplete the available labor supply in the US” in certain metro regions within a few years."
We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.
I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.
Also Codex CLI is cheaper than Claude Code.
I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.
It is primarily a principal-agent problem, with a hint of marshmallow test.
If you are a developer who is not writing the documents for consumption by AI, you are primarily writing documents for someone who is not you; you do not know what this person will need or if they will ever even look at them.
They may, of course, help you, but you may not understand that, or have the time or discipline.
If you are writing them because the AI using them will help you, you have a very strong and immediate incentive to document the necessary information. You also have the benefit of a short feedback loop.
Side note, thanks to LLMs' penchant for wiping out comments, I have a lot more docs these days and far fewer comments.
The US spent decades transitioning from a manufacturing economy to a service economy, deliberately.
Now there's a populist making political hay, throwing out numbers about trade deficits that ignore revenue from services. Yes, there is a trade deficit on goods; that was a long-term strategy because services were a superior investment.
Manufacturing is an inferior way to make money unless you're planning to go to conventional war, and since the US is a nuclear superpower it's never going to get into an existential boots-on-the-ground Serious War again unless it just wants to cosplay. Nukes make conventional war for survival irrelevant.
So: it took decades to burn the boats with manufacturing, and trying to rebuild them in a few years is a hilarious folly. It absolutely will not go anywhere, and honestly shouldn't anyway. There is real danger, however, that the US burns the boats on the carefully crafted service sector as well.
This headline is kind of dumb. Yes, there is yet another new venue. But other recent "new" venues include MIAX, IEX, MEMX, and soon to be active - 24X. Oh and let's not forget the Long Term Stock Exchange (LTSE).
Current-gen AI is going to result in the excess-datacenter equivalent of dark fiber from the 2000s. Lots of early buildout and super investment, followed by lack of customer demand and later cheaper access to physical compute.
The current neural network software architecture is pretty limited. Hundreds of billions of dollars of investor money has gone into scaling backprop networks and we've quickly hit the limits. There will be some advancements, but it's clear we're already at the flat part of the current s-curve.
There's probably some interesting new architectures already in the works either from postdocs or in tiny startups that will become the base of the next curve in the next 18 months. If so, one or more may be able to take advantage of the current overbuild in data centers.
However, compute has an expiration date like old milk. It won't physically expire, but its economic potential decreases as tech advances. Still, if the timing is right, there is going to be a huge opportunity for the next early adopters.
Related, it feels like AI Studio is the only mainstream LLM frontend that treats you like an adult. Choose your own safety boundaries, modify the context & system prompt as you please, clear rate limits and pricing, etc. It's something you come to appreciate a lot, even if we are in the part of the cycle where Google's models aren't particularly SOTA rn
The SolveIt tool [0] has a simple but brilliant feature I now want in all LLM tools: a fully editable transcript. In particular, you can edit the previous LLM responses. This lets you fix the lingering effect of a bad response without having to back up and redo the whole interaction.
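Mechanically, the reason an editable transcript is so useful: the "conversation" is just an array of messages replayed on every turn, so rewriting an earlier assistant message changes everything downstream. A generic sketch of that idea (not SolveIt's actual API):

    // A chat transcript is just a slice of messages replayed each turn, so
    // editing an earlier assistant reply rewrites the context every later turn
    // is conditioned on. Generic message shape, not SolveIt's API.
    package main

    import "fmt"

    type Message struct {
        Role    string // "system", "user", or "assistant"
        Content string
    }

    func main() {
        transcript := []Message{
            {"system", "You are a careful coding assistant."},
            {"user", "How should I store timestamps?"},
            {"assistant", "Use local time strings."}, // the bad response
            {"user", "Now design the schema."},
        }

        // Instead of backing up and redoing the whole interaction, edit the
        // bad turn in place; the next request is built from the corrected transcript.
        transcript[2].Content = "Use UTC epoch milliseconds (int64)."

        for _, m := range transcript {
            fmt.Printf("%-9s %s\n", m.Role, m.Content)
        }
        // sendToModel(transcript) would go here in a real client.
    }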
What I take away is the simplicity and scaling behavior. The ML field often sees an increase in module complexity to reach higher scores, and then a breakthrough where a simple model performs on par with the most complex. That such a "simple" architecture works this well on its own means we can potentially add back the complexity again to reach further. Can we add back MSA now? Where will that take us?
My rough understanding of the field is that a "rough" generative model makes a bunch of decent guesses, and more formal "verifiers" ensure they abide by the laws of physics and geometry. The AI reduces the unfathomably large search space so the expensive simulation doesn't need to do so much wasted work on dead ends. If the guessing network improves, then the whole process speeds up (a toy sketch of that propose-and-verify loop follows the list below).
- I'm recalling the increasingly complex transfer functions in recurrent networks,
- The deep pre-processing chains before skip forward layers.
- The complex normalization objectives before ReLU.
- The convoluted multi-objective GAN networks before diffusion.
- The complex multi-pass models before fully convolutional networks.
So basically, I'm very excited by this. Not because this itself is an optimal architecture, but precisely because it isn't!
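Here's a toy version of that propose-then-verify loop mentioned above, with made-up function names standing in for the generative model and the physics verifier; it only illustrates why a better guesser speeds up the whole pipeline:

    // Toy propose-then-verify loop: a cheap generator proposes many candidates,
    // and only the most plausible ones are passed to an expensive verifier.
    // Function names and scoring are made up; this just illustrates the shape.
    package main

    import (
        "fmt"
        "math/rand"
        "sort"
    )

    type candidate struct {
        params    []float64
        plausible float64 // cheap score from the generative model
    }

    func cheapGenerate(n int) []candidate {
        out := make([]candidate, n)
        for i := range out {
            out[i] = candidate{
                params:    []float64{rand.Float64(), rand.Float64()},
                plausible: rand.Float64(),
            }
        }
        return out
    }

    // expensiveVerify stands in for the physics/geometry simulation.
    func expensiveVerify(c candidate) bool {
        return c.params[0]+c.params[1] < 1.2 // pretend constraint
    }

    func main() {
        cands := cheapGenerate(10000)
        // Keep only the top few percent by the generator's score...
        sort.Slice(cands, func(i, j int) bool { return cands[i].plausible > cands[j].plausible })
        shortlist := cands[:100]

        // ...so the expensive verifier runs 100 times instead of 10,000.
        kept := 0
        for _, c := range shortlist {
            if expensiveVerify(c) {
                kept++
            }
        }
        fmt.Printf("verified %d of %d shortlisted candidates\n", kept, len(shortlist))
    }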
> I’m rapidly losing interest in all of these tools. It feels like blockchain again in a lot of weird ways.
It doesn't feel like blockchain at all. Blockchain is probably the most useless technology ever invented (unless you're a criminal or an influencer who makes ungodly amounts of money off of suckers).
AI is a powerful tool for those who are willing to put in the work. People who have the time, knowledge and critical thinking skills to verify its outputs and steer it toward better answers. My personal productivity has skyrocketed in the last 12 months. The real problem isn’t AI itself; it’s the overblown promise that it would magically turn anyone into a programmer, architect, or lawyer without effort, expertise or even active engagement. That promise is pretty much dead at this point.
Objectively. I’m now tackling tasks I wouldn’t have even considered two or three years ago, but the biggest breakthrough has been overcoming procrastination. When AI handles over 50% of the work, there’s a 90% chance I’ll finish the entire task faster than it would normally take me just to get started on something new.
This. I had this long-standing dispute that I just never had the energy to look up what needed to be done to resolve. I just told ChatGPT about it and it generated everything -- including the emails I needed to send and who to send them to. Two weeks later, it was taken care of. I had sat on it for literally 3 months until then.
If I could have something that said, "Here are some things that it looks like you're procrastinating on -- do you want me to get started on them for you?" -- that would probably be crazy useful.
Credit where it’s due: doing live demos is hard. Yesterday didn’t feel staged—it looked like the classic “last-minute tweak, unexpected break.” Most builders have been there. I certainly have (I once spent 6 hours at a hackathon and broke the Flask server keying in a last minute change on the steps of the stage before going on).
Nvidia sees the forest for the trees. The consequence of the US government buying a stake in Intel is that there will be federal requirements for US companies to use Intel. This is entirely about the foundry business. Nvidia is at risk when 100% of the production of its intellectual property occurs in Taiwan. They're more interested than anyone else in diversifying their foundry solutions. Intel has just been a terrible partner and totally disregards its customers. It's only because of the new strategic need for the US to have a foundry business that the government is saving Intel. Nvidia is understandably supportive of this.
- Decision Trees: Clear branching logic with ├── and └── notation
- Sequential Steps: Numbered, ordered procedures instead of scattered explanations
- Prerequisites: Explicit dependency checks before proceeding
2. AI Agent Optimizations
- Tool Call Clarity: Exact function names and parameters
- Binary Decisions: Clear yes/no conditions instead of ambiguous language
- Error Handling: Specific failure conditions and next steps
- Verification Steps: "Recheck" instructions after each fix
3. Cognitive Load Reduction
- Reference Tables: Quick lookup for tools and purposes
- Pattern Recognition: Common issue combinations and their solutions
- Critical Reminders: Common AI mistakes section to prevent errors
4. Actionable Language
- Removed verbose explanations mixed with instructions
- Consolidated multiple documents' logic into single workflows
- Used imperative commands: "Check X", "If Y then Z"
- Added immediate verification steps
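Putting those pieces together, here's a made-up fragment showing what that style looks like in practice (the step names, commands, and env var are hypothetical, not from the actual docs):

    1. Check build status
       ├── Build passes → go to step 2
       └── Build fails → run the linter, fix reported issues, recheck, then go to step 2
    2. Prerequisites: DATABASE_URL set? Migrations applied? If no → stop and report which one is missing.
    3. Apply the fix, then recheck: rerun only the failing test.
       ├── Test passes → update the changelog, done
       └── Test still fails → collect logs, report the failure condition, stop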