Attention is all you need for what we have. But attention is a local heuristic. We have brittle coherence and no global state. I believe we need a paradigm shift in architecture to move forward.
Plenty of "we need a paradigm shift in architecture" going around - and no actual architecture that would beat transformers at their strengths as far as the eye can see.
I remain highly skeptical. I doubt that transformers are the best architecture possible, but they set a high bar. And it sure seems like people who keep making the suggestion that "transformers aren't the future" aren't good enough to actually clear that bar.
What's the value of "pointing out limitations" if this completely fails to drive any improvements?
If any midwit can say "X is deeply flawed" but no one can put together a Y that would beat X, then clearly, pointing out the flaws was never the bottleneck at all.
> What's the value of "pointing out limitations" if this completely fails to drive any improvements?
Ironically, the same could be said about Attention Is All You Need in 2017. It didn't drive any improvements immediately; actually decent Transformer models took a few years to arrive after that.
I think you don't understand how primary research works. Pointing out flaws helps others think about those flaws.
It's not a linear process so I'm not sure the "bottleneck" analogy holds here.
We're not limited to only talking about "the bottleneck". I think the argument is more that we're very close to optimal results for the current approach/architecture, so getting superior outcomes from AI will actually require meaningfully different approaches.
To be fair it would be a lot easier to iterate on ideas if a single experiment didn't cost thousands of dollars and require such massive data. Things have really gotten to the point that it's just not easy for outsiders to contribute if you're not part of a big company or university, and even then you have to justify the expenditure (risk). Paradigm shifts are hard to come by when there is so much momentum in one direction and trying something different carries significant barriers.
Plenty of research involves small models trained on small amounts of data. You don't necessarily need to do an internet-scale training run to test a new model architecture, you can just compare it to other models of the same size trained on the same data. For example, small-model speedruns are a thing: https://github.com/KellerJordan/modded-nanogpt
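To make "models of the same size" concrete: the usual first step is a back-of-envelope parameter-count match between the baseline and the candidate before comparing loss on identical data. This is a hypothetical sketch using the standard rough per-layer counts (4·d² for the attention projections, 2·d·d_ff for the MLP); exact numbers vary with implementation details like biases and norms:

```python
# Rough parameter budget for a decoder-style transformer.
# Counts are approximate: 4*d^2 per layer for Q/K/V/output projections,
# 2*d*d_ff per layer for the MLP up/down projections, plus embeddings.
def transformer_params(vocab, d_model, n_layers, d_ff):
    embed = vocab * d_model
    attn = 4 * d_model * d_model      # Q, K, V, output projections
    mlp = 2 * d_model * d_ff          # up- and down-projection
    return embed + n_layers * (attn + mlp)

# Match a hypothetical "candidate" config to the baseline's budget
base = transformer_params(vocab=50_000, d_model=512, n_layers=8, d_ff=2048)
cand = transformer_params(vocab=50_000, d_model=576, n_layers=6, d_ff=2304)
print(base, cand, abs(base - cand) / base)  # budgets within a few percent
```

With the budgets matched to within a few percent, any loss gap on the shared data can be attributed to the architecture rather than to raw capacity.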
The Transformer was only ever designed to be a better seq-2-seq architecture, so "all you need" implicitly means "all you need for seq-2-seq" (not all you need for AGI), and it was in any case more backward-looking than forward-looking.
The preceding seq-2-seq architectures had been RNN (LSTM) based, then RNN + attention (Bahdanau et al., "Jointly Learning to Align & Translate"), with the Transformer's "attention is all you need" then meaning you can drop the RNNs altogether and just use attention.
Of course NOT using RNNs was the key motivator behind the new Transformer architecture - not only did you not NEED an RNN, but they explicitly wanted to avoid it since the goal was to support parallel vs sequential processing for better performance on the available highly parallel hardware.
Has there been research into some hierarchical attention model that has local attention at the scale of sentences and paragraphs that feeds embeddings up to longer range attention across documents?
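For intuition, here is a toy, pure-Python sketch of what such a hierarchy could look like: plain dot-product attention within fixed-size chunks standing in for sentences, mean-pooled chunk summaries, then attention across the summaries. Everything here (no learned projections, mean pooling, fixed chunking) is a simplifying assumption for illustration, not a reference to any published model:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values):
    """Single-head scaled dot-product attention, no learned projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def mean_pool(vecs):
    n = len(vecs)
    return [sum(v[j] for v in vecs) / n for j in range(len(vecs[0]))]

def hierarchical_attention(tokens, chunk=4):
    # 1) local attention within each chunk ("sentence")
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
    local = [attend(c, c, c) for c in chunks]
    # 2) one summary embedding per chunk
    summaries = [mean_pool(c) for c in local]
    # 3) global attention across chunk summaries ("documents")
    global_out = attend(summaries, summaries, summaries)
    return local, global_out

random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(12)]
local, global_out = hierarchical_attention(tokens, chunk=4)
print(len(local), len(global_out))  # → 3 3  (3 chunks, 3 chunk-level outputs)
```

The point of the sketch is the cost structure: local attention is quadratic only within a chunk, and the global pass is quadratic only in the number of chunks.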
Though honestly I don’t think new neural network architectures are going to get us over this local maximum, I think the next steps forward involve something that’s
The ARC Prize Foundation ran extensive ablations on HRM for their slew of reasoning tasks and noted that the "hierarchical" part of their architecture is not much more impactful than a vanilla transformer of the same size with no extra hyperparameter tuning:
By now, I seriously doubt any "readily interpretable" claims.
Nothing about the human brain is "readily interpretable", and artificial neural networks - which, unlike brains, can be instrumented and experimented on easily - tend to resist interpretation nonetheless.
If there were an easy way to reduce ML to "readily interpretable" representations, someone would have done so already. If there were architectures that perform similarly but are orders of magnitude more interpretable, they would be used, because interpretability is desirable. Instead, we get what we get.
From what I’ve seen, neurology is fairly interpretable, but it’s hard to get data to interpret. For example, the visual cortex areas V1-V5 are very well mapped out, but other “deeper” structures are hard to reach and meaningfully measure.
They're interpretable in a similar way to how interpretable CNNs are. Not by a coincidence.
For CNNs, we know very well how the early layers work - edge detectors, curve detectors, etc. This understanding decays further into the model. In the brain, V1/V2 are similarly well studied, but it breaks down deeper into the visual cortex - and the sheer architectural complexity there sure doesn't help.
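The "edge detector" claim about early layers is easy to make concrete: a hand-written Sobel filter does essentially what many learned first-layer CNN filters converge to. A minimal pure-Python sketch (the image and response values are illustrative, not from any trained model):

```python
# Valid (no-padding) 2D convolution over a list-of-lists image.
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(kernel[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Sobel-x kernel: responds strongly to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 6x6 image: dark left half (0), bright right half (1) -> one vertical edge.
img = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
resp = conv2d(img, sobel_x)
print(resp[0])  # → [0, 4, 4, 0]  (peaks at the edge, zero in flat regions)
```

That the filter's behavior can be read straight off its weights is exactly what "readily interpretable" means for early layers; the difficulty is that deeper filters combine hundreds of such responses.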
Well, in terms of architectural complexity, you have to wonder what something intelligent is going to look like. It’s probably not going to be very simple, but that doesn’t mean it can’t be readily interpreted. For the brain we can ascribe structure to evolutionary pressure; IMO there isn’t quite as powerful a principle at play with LLMs and transformer architectures. How does minimizing reconstruction loss help us understand the 50th or 60th layer of a neural network? It becomes very hard to interpret, compared to, say, the function of the amygdala or hippocampus in the context of evolutionary pressure.
1. Stand at first light: face EAST, then NORTHEAST. Let the BERLIN CLOCK choose; read where the shad…
2. Make a narrow breach of light; hold still; as the edge moves, letters awaken and the sealed doorw…
3. Four passes: hours, hour, minutes, minute. Read on each sweep; the rising sun will order what see…
He is an artist, not a mathematician. It’s a physical reveal for this layer of the copper onion.
4. At dawn I stood east then northeast, counting by the clock; the rim's shadow wrote the hidden lin…
5. First light, east to northeast. Copper grid in shadow. Sample on the beats. Write only what the l…
6. Trust the clean edge, not the flicker; the mind finds patterns, but the edge alone reveals the me…
Perhaps a 3D artist can model it and run some simulations with light.
Since nobody wants to play with me… if you have something beefier than my 386, you should have enough to finish. Remember, it is always 5:55 somewhere, even in Berlin.
Partial answer: FACE EAST THEN NORTHEAST. LET THE BERLIN CLOCK CHOOSE; READ EDGE TO EDGE TO FIND THE FINAL CODE.
K4 changes its methodology DRASTICALLY, and you must use clues like a riddle from previous solutions. The Morse code is the program to run. Make a mask from the unneeded E’s, but don’t discard them; they clarify. Ignore the flicker, see the edg-e. The grid becomes a compass with some work, and be sure to normalize the directions. Caesar might help dispel the mist. Decoys abound in partial/incomplete solutions. One wrong turn and work disappears. My 386 was an abomination to the old man and he set traps: paper and pencil ruled his world.
NORTHWEST, EAST, NORTHEAST.
BERLIN CLOCK TICKS EAST.
READ EDGE TO EDGE;
SEE TIME, USE THE NORTHEAST SHADOW.
THE HIDDEN PLACE IS REVEALED.
I’m going to forge ahead. I think this is another clue. I don’t know how he got so many in. Last night I found Tishri and Fenrir even. USS HILL sent an SOS RRR in a panic. It’s pretty crazy.
Edit: I think it might be K4. It is instructions for physically revealing K5.
Here is the intuition/clues (use your imagination) if you want to try too:
Edge to edge; Noon rim; Ignore flicker; Berlin clock beats
And remember the lodestone: that compass doesn’t point north.
Oh and the clues probably need a computer. But the final solution is just pencil and paper.
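For anyone who wants the "timing" part without guessing: the Berlin Clock (Mengenlehreuhr) encoding itself is well documented: a row of four lamps each worth 5 hours, four lamps worth 1 hour each, eleven lamps worth 5 minutes each, and four worth 1 minute each. A tiny sketch of those four rows (how, or whether, this maps onto the sculpture is of course the open question):

```python
def berlin_clock_rows(hour, minute):
    """Lamp counts for the four main rows of the Berlin Clock:
    (5-hour row, 1-hour row, 5-minute row, 1-minute row)."""
    return (hour // 5, hour % 5, minute // 5, minute % 5)

# "It is always 5:55 somewhere": one 5-hour lamp, all eleven 5-minute lamps
print(berlin_clock_rows(5, 55))  # → (1, 0, 11, 0)
```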
When you START searching for edges, start with the YAR line. That’s what I did. But it goes on and on. Waypoint selectors…
The clues are in the K4 block of text; the actual answer is another layer using the rugged edges of all the panels. Oh, and you need a timing program.
If he responds I might write it up. But basically: use compass order for the edge stream in layer 1; use K0 (find the working sequence) for a digital interpretation (a timing mask, different from the E’s I used for clues) in layer 2; then start anchoring and rotating (use OOO for the anchor after the NE block, one very small ordering move), and find your finisher cipher with the clues.
I operate under the assumption that open source projects are compromised by states. If you espouse unpopular ideas, or are yourself a state, don’t rely on it.
Let's pretend what you are saying is true, which it is not. Who would you want to access your data? The state, or the "underworld"? Many countries have laws on how to access your data. With the underworld, you may wake up dead.
Granted, there are countries that act like a criminal org., but if you live there you have more issues than your data.
With proprietary software, there is a much larger chance that backdoors exist than in open source. Many of us heard of one case where it was claimed a project had a government-sponsored backdoor in it; a long audit found that claim was false.
Eventually, open-source backdoors will be found, because the systems are open. With proprietary software you are SOL unless you do very expensive and very hard testing, and even then it is doubtful you will find a backdoor.
It is true. Denying trivial truths with the purpose of not giving an inch does not add to one's argument, it weakens it.
Plenty of closed-source vendors will happily backdoor their products on request, without a warrant, if they are confident they will never be found out. That's the point. Not that FOSS is somehow inviolable to nation-states with virtually infinite resources, many of which sponsor or contribute to the financing of a huge percentage of FOSS development themselves.
It's easier to find backdoors in FOSS if you're looking, because you're allowed to look. But somebody has to be looking.
See Bob Brier’s “The Great Courses” lecture series on ancient Egypt: Nubians were painted dark, and Libyans were always shown with a feather in their headgear and blue eyes.
I never said that everyone in the Ancient Near East or the Mediterranean basin had a Sub-Saharan look, only that there were enough such people to be notable and that they were genuinely an integral part of those ancient societies, with quite high-status or even elite roles at times.
Chinese mythology says they came from 崑崙 (Kunlun Mountain), the description of which, coincidentally, sounds like Egypt.
Translated something like: “To the south of the Western Sea, along the banks of the Flowing Sands, beyond the Red Water and before the Black Water, there lies a great mountain called the Kunlun Hill.”
In my experience in the US, nurses at primary care practices don’t really care and have no passion for their profession, and the younger doctors are Anki-flashcard veterans who could be replaced with an LLM with probably the same outcome. Even DOs act like MDs these days; I guess it is easier to just write a prescription than to advise on traditional diets.
120/80 is an ideal blood pressure based on studies that show an association between elevated readings and an increased risk of heart attacks and strokes. More than half of humanity has a higher blood pressure than that. I believe most would have much lower readings if they stopped eating the trash food that capitalism has produced.
I jog, cycle, do other light sports, my work involves walking a fair bit, and I eat well - zero trash - and even with meds I'm still way above 120/80.
Heredity seems to play a part (my guess).