
It’s a simple math problem. And it’s also Conway’s law, which says all software design follows the structure of the organization that built it; that is, all software design is political.

A framework calls you. You call a library.

A framework constrains the program. A library expands the program.

It’s easier to write a library that is future proofed because it just needs to satisfy its contract.

It’s harder to write a framework because it imposes a contract on everything that depends on it.
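A minimal sketch of that inversion of control, in Python; slugify, MiniFramework, and the handler are all made up purely for illustration:

    # Library: your code is in charge; you call it when you want.
    def slugify(title: str) -> str:
        return title.lower().replace(" ", "-")

    print(slugify("Hello World"))  # you decide when and how this runs

    # Framework: it is in charge; you register a handler and it calls you,
    # on its schedule, under its contract.
    class MiniFramework:
        def __init__(self):
            self._handlers = []

        def route(self, handler):
            # You register; the framework decides when to invoke.
            self._handlers.append(handler)
            return handler

        def run(self, requests):
            # The framework owns the control flow and calls your code.
            for request in requests:
                for handler in self._handlers:
                    print(handler(request))

    app = MiniFramework()

    @app.route
    def handle(request):
        return f"handled {request}"

    app.run(["req-1", "req-2"])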

Just like it is hard to write tort law without a lot of jurisprudence to build out experience and test cases, it is hard to write a framework from only one use case.

No one likes lawyers because they block you from doing what you want. This is the problem with frameworks.

However, the government likes laws because they block you from doing what you want. Same with whoever is directing engineering and wants all the other programmers to work in a consistent way.


Technically everything you have written is true. But the proliferation of frameworks is almost a self-reinforcing antipattern.

> No one likes lawyers because they block you from doing what you want.

Or even doing what you need to do.

Certainly, to the extent that a mini-framework is composed of more constraints piled on top of an extant bigger framework, mini-frameworks are, like swimming pools, attractive nuisances. "Hey, look, guys! This is so much simpler!"

> It’s harder to write a framework because it imposes a contract on everything that depends on it.

Judging by what people write and use, I'm not sure this is _exactly_ true. Sure, writing a _good_ framework or library is hard, but people accept piss-poor frameworks, and accept libraries that were designed to work in conjunction with a single framework.

> It’s easier to write a library that is future proofed because it just needs to satisfy its contract.

But the thing is that the library itself defines the contract, and it might be a piss-poor one for many applications.

There is some excellent code out there, and there is a lot of shitty code out there. I think the problem is social; too many people want to write code that is in charge. Maybe it's also somewhat technical: they have used things that are in charge, and those things were too big (leading to the mini-framework of the article) or otherwise not great, so this leads to yet another framework (cue the standards xkcd cartoon), because they realize they need something in charge but aren't happy with their current options.

And, of course, since the frameworks they know take a kitchen sink mentality, their new framework does as well. (Maybe it's a smaller sink, but everything needed is still shoved in there.) So there are yet more libraries that are tied to yet another framework.

Because writing good libraries that are completely framework-independent _can_ be as challenging as writing a good framework. And when someone decides they need a new framework, they are focused on that and on making it work well, and since that drives their thought process, everything else they write gets shoved into the framework.


Thank you. I thought I was going crazy reading the article, which doesn’t connect open and close parentheses :: higher and lower precedence :: indent and outdent :: +1 and -1, and just flip it around to get the opposing polarity.

A real Wesley Crusher moment.


Not necessarily controlling stakes.


You’re right and wrong at the same time. A quantum superposition of validity.

The word “thinking” is doing too much work in your argument, but arguably “assume it’s thinking” is not doing enough work.

The models do compute and can reduce entropy; however, they don’t match the way we presume this happens, because we assume every intelligence is human, or more accurately, the same as our own mind.

To see the algorithm for what it is, you can make it work through a logical set of steps from input to output, but it requires multiple passes. The models use a heuristic pattern-matching approach to reasoning instead of a computational one like symbolic logic.

While the algorithms are computed, the virtual space in which the input is transformed into the output is not computational.

The models remain incredible and remarkable but they are incomplete.

Further, there is a huge garbage-in, garbage-out problem, as the input to the model often lacks enough information to decide on the next transformation to the code base. That’s part of the illusion of conversationality that tricks us into thinking the algorithm is like a human.

AI has always provoked human reactions like this. ELIZA was surprisingly effective, right?

It may be that average humans are not capable of interacting with an AI reliably because the illusion is overwhelming for instinctive reasons.

As engineers we should try to accurately assess and measure what is actually happening so we can predict and reason about how the models fit into systems.


Growth curves mean nothing if you're selling dollars for $0.90. A growth curve only means something when price > cost. It's not even clear that value > cost.

I absolutely love Anthropic, but I am worried about the fiscal wall they will hit, which will ratchet up my opex when they need to steeply raise prices.


So the critical question here really is whether they are selling API access to their models for less than the unit cost it takes to serve them.


I don’t think it is only that consideration.

While gross-margin estimates vary widely, the 40-60% numbers some analysts throw around seem realistic.

In an equity-only company that is a good enough metric, but all the major players have long since transitioned to also raising debt.

The debt would need to be serviced even if fresh training investment stopped entirely.

The cost of servicing that debt would depend on interest rates, the economy, and so on, in addition to the risk of the debt itself.

It is quite possible that model companies would need to jack up prices, even with good gross margins, to handle their debt load.
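A back-of-the-envelope sketch of that squeeze, with entirely made-up numbers (none of these figures come from any actual filing):

    # Hypothetical annual figures for a model provider, in $B.
    revenue       = 4.0
    gross_margin  = 0.50   # midpoint of the 40-60% analyst range above
    debt          = 10.0   # principal raised for the compute buildout
    interest_rate = 0.08   # assumed cost of that debt

    gross_profit = revenue * gross_margin   # 2.0
    debt_service = debt * interest_rate     # 0.8 per year, interest only

    # Even at a "healthy" 50% gross margin, debt service eats a big slice
    # before any opex, R&D, or fresh training runs.
    print(f"gross profit:     {gross_profit:.1f}")
    print(f"interest expense: {debt_service:.1f}")
    print(f"left over:        {gross_profit - debt_service:.1f}")

Raising prices is one of the few levers that moves the gross profit line without touching the debt side.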


You have to include the carrying cost per customer as well, which is mostly labour. Most of SaaS undercounts the payroll attached to a subscription, which is why it is so hard to get to positive net margins and maintain lifetime value.

I am sceptical an LLM foundation model company can get away with low human services, either directly on its own payroll or by giving up margin to a channel of implementation partners. That’s because the go-to-market requires organizational change at the customer sites. That is a lot of human surface area.


But inference is cheap! If they stop doing everything and become Inference Inc., they'll be profitable.


Until China drops another open weight model you can run yourself at cost price.


Even if their introspection within the inference step is limited, by looping over a core set of documents that the agent considers to be itself, it can observe changes in the output and analyze those changes to deduce facts about its internal state.

You may have experienced this when an LLM gets hopelessly confused and you ask it what happened. The LLM reads the chat transcript and gives an answer as consistent with the text as it can.

The model isn’t the active part of the mind. The artifacts are.

This is the same as Searle’s Chinese room. The intelligence isn’t in the clerk but in the book. However, the thinking is in the paper.

The Turing machine equivalent is the state table (book, model), the read/write/move head (clerk, inference) and the tape (paper, artifact).

Thus it isn’t mystical that the AIs can introspect. It’s routine and frequently observed in my estimation.
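A toy Turing machine makes the mapping concrete; this is only an illustration of the analogy, not anything taken from how the models are actually built:

    # state table ~ the book / the model: fixed rules, no memory of their own
    # head loop   ~ the clerk / inference: mechanically applies the rules
    # tape        ~ the paper / the artifacts: where the ongoing "thought" lives

    # (state, symbol) -> (write, move, next_state); a machine that flips bits.
    TABLE = {
        ("flip", "0"): ("1", +1, "flip"),
        ("flip", "1"): ("0", +1, "flip"),
        ("flip", "_"): ("_", 0, "halt"),
    }

    def run(tape, state="flip", head=0):
        tape = list(tape)
        while state != "halt":
            write, move, state = TABLE[(state, tape[head])]  # one inference step
            tape[head] = write                               # the thinking lands on the tape
            head += move
        return "".join(tape)

    print(run("0110_"))  # -> "1001_"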


This seems to be missing the point? What you're describing is the obvious form of introspection that makes sense for a word predictor to be capable of. It's the type of introspection that we consider easy to fake, the same way split-brained patients confabulate reasons why the other side of their body did something. Once anomalous output has been fed back into itself, we can't prove that it didn't just confabulate an explanation. But what seemingly happened here is the model making a determination (yes or no) on whether a concept was injected in just a single token. It didn't do this by detecting an anomaly in its output, because up until that point it hadn't output anything - instead, the determination was derived from its internal state.


I have to admit I am not really understanding what this paper is trying to show.

Edit: OK, I think I understand. The main issue, I would say, is that this is a misuse of the word "introspection".


I think it’s perfectly clear: the model must know it’s been tampered with because it reports tampering before it reports which concept has been injected into its internal state. It can only do this if it has introspection capabilities.


Sure, I agree what I am talking about is different in some important ways; I am “yes, and”-ing here. It’s an interesting space for sure.

Internal vs external in this case is a subjective decision. Where there is a boundary, what is within it is the model. If you draw the boundary outside the texts, then the complete system of model, inference, and text documents forms the agent.

I liken this, by metaphor, to a “text wave”. If you keep feeding the same text into the model and have the model emit updates to that text, then there is continuity. The text wave propagates forward and can react and learn and adapt.

The introspection within the neural net is similar, except it operates over an internal representation. Our human system is similar, I believe: a layer observing another layer.

I think that is really interesting as well.

The “yes, and” part is that you can have more fun playing with the models’ ability to analyze their own thinking by using the “text wave” idea.
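A rough sketch of that loop; call_model below is a stand-in for whatever inference API you actually use, not a real client:

    # The "wave" is the document itself: the model is stateless across calls,
    # but the text it keeps rewriting carries the continuity forward.

    def call_model(document: str) -> str:
        # Stand-in for a real LLM call; a real model would read the whole
        # document and emit a revised version of it.
        return document + "\n- reread the document and noted what changed"

    def text_wave(document: str, passes: int = 3) -> str:
        for _ in range(passes):
            document = call_model(document)  # same text in, updated text out
        return document

    print(text_wave("# Working memory\n- goal: refactor the parser"))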


> This is the same as Searle’s Chinese room. The intelligence isn’t in the clerk but in the book. However, the thinking is in the paper.

This feels like a misrepresentation of the "Chinese Room" thought experiment: the "thinking" isn't in the clerk nor in the book; it's the entire room itself.


The word decimate is sitting right there.


And in this case actually correct! Decimate is often used to mean “almost wipe out”, but the word originally comes from “killing every 10th person”, i.e. 10% of a group.


eviscerated!


If the plan is too big to fit into context or requires too much attention, it overwhelms the LLM. You need to decompose it into tasks and todos aggressively.
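A sketch of that decomposition, assuming a hypothetical run_agent helper and a plan that is just a list of todo strings:

    # Feed the agent one small, self-contained todo at a time and carry
    # forward only a short summary instead of the entire plan.

    def run_agent(task: str, context: str) -> str:
        # Placeholder for a single bounded agent run; returns a one-line summary.
        return f"done: {task}"

    def execute_plan(todos: list[str]) -> list[str]:
        summaries: list[str] = []
        for todo in todos:
            context = "\n".join(summaries[-3:])  # keep the carried context small
            summaries.append(run_agent(todo, context))
        return summaries

    print(execute_plan([
        "extract the parser into its own module",
        "add unit tests for the edge cases",
        "update the README",
    ]))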


We certainly will; they can’t replace humans in most language tasks without having a human-like emotional model. I have a whole set of therapy agents to debug neurotic, long-lived agents with memory.


Ok, call me crazy, but I don't actually think there's any technical reason that a theoretical code generation robot needs emotions that are as fickle and difficult to manage as humans.

It's just that we designed this iteration of the technology foundationally on people's fickle and emotional Reddit posts, among other things.

It's a designed-in limitation, and kind of a happy accident that it's capable of writing code at all. And it clearly carries forward a lot of baggage...


If you can find enough training data that does human-like things without having human-like qualities, we are all ears.


It can be simultaneously the best we have, and well short of the best we want. It can be a remarkable achievement and fall short of the perceived goals.

That's fine.

Perhaps we can RL away some of this or perhaps there's something else we need. Idk, but this is the problem when engineers are the customer, designer, and target audience.


Quality Spock pun.


Maybe. I use QWAN (the “quality without a name”) frequently when working with the coding agents. That requires an LLM equivalent of interoception to recognize when the model’s understanding is scrambled or “aligned with itself”, which is what QWAN is.


What on God's green Earth could the CEO of a no-name B2B SaaS have a use for long-running agents?

Either your business isn't successful, so you're coding when you shouldn't be, or you're cosplaying coding with Claude, or you're lying, or you're telling us about your expensive and unproductive hobby.

How much do you spend on AI? What's your annual profit?

Edit: oh, cosplaying as a CEO. I see. Nice WPEngine landing page, Mr. AppBind.com CEO. Better have Claude fix your website! I guess that agent needs therapy...


I like writing software.

I hate managing people.

What are we doing?


I long ago accepted that a career in B2B software meant my job was to put people out of work. And as it turns out, programmers always start by putting other programmers out of work.

Programmers and Managers by Kraft (1977) is my favourite book on the subject. It's unabashedly Marxist and from the very early days of the industry, which tickles me, since it's a different way of thinking than my usual.

https://www.amazon.com/Programmers-Managers-Routinization-Pr...

