More

volodia · 2025-11-14T22:18:47 1763158727

There is also this one that was released in October: https://github.com/kuleshov/char-mdlm

volodia · 2025-03-14T19:54:30 1741982070

the LLaDA paper is a scaled-up version of this paper; they cite it as an anonymous ICLR submission

m00x · 2025-03-14T22:24:10 1741991050

I'm not sure if this is what you mean, but LLaDA isn't block text diffusion. This is a mix between an autoregressive model and a diffusion model, which is brand new.

ashirviskas · 2025-03-15T15:33:27 1742052807

It is a soft-block text diffusion. They have one super-block of fixed size loaded and then allow the model to only unmask tokens by going through the soft-blocks. As the source code is available and I was able to change it into an actual block diffusion, but as the model was trained only on super-blocks, it was always trying to generate eos tokens at each block end before I extended it. I've tried a few workarounds that half worked, but I guess a very small scale finetune is needed to resolve it fully.

impossiblefork · 2025-03-14T22:10:51 1741990251

volodia · 2025-02-26T23:55:14 1740614114

That's a good point. In this context, we've been using "commodity GPUs" to refer to standard Nvidia hardware, in contrast to specialized chips like Groq and Cerebras. While these chips also achieve fast speeds, they are not nearly as ubiquitous as Nvidia GPUs. We think that matching their performance on standard Nvidia hardware can make AI much more affordable. We also support any GPUs, not just H100's.

We're going to be releasing a tech report soon, stay tuned!

volodia · 2025-02-26T22:38:52 1740609532

Not today, but we will be following up with a technical report over the next week or so. In the meantime, you can take a look at some of the research papers that inspired our work: - https://arxiv.org/abs/2310.16834 - https://arxiv.org/abs/2406.07524

volodia · 2025-02-26T20:22:14 1740601334

This is Volodymyr, co-founder at Inception---let us know if you have any questions about diffusion, language modeling, and our new Mercury models!

aDyslecticCrow · 2025-03-07T21:35:44 1741383344

This is simply brilliant.

The moment I heard the synopsis of the technique, I thought of one thing: Style transfer.

This model style should be really nice for translation and style transfer tasks. Takes an existing section text, noises it, and reverses it with guidance like an image diffusion model; A "movement" in latent with a controllable amount of modifications.

The diffusion process enables a wide range of "control" approaches not possible with current transformer models. Perhaps summarizing text can be done differently as well, taking an input and diffusing it into a shorter and shorter section.

I've not been this hyped about a new method since GPT3 itself.

vessenes · 2025-03-01T06:41:32 1740811292

Volodymyr, congrats. This is crazy fast. If not super great at long context coding tasks. I tagged a few problem responses.

I'm curious about something that has analogues in image diffusion models -- you can see diffusion models, depending on how they are working through their latent space, sometimes try out and then move on from a feature in an image as it fits less with what's around it.

Are there analogues for Mercury? Does it try with a token or set of tokens, and as parts of the response fill in move on from them? Similarly, this architecture seems like it would have real problems inserting a needed token in the middle of a bunch of relatively high confidence generated tokens.

Can you give some insight / thoughts from the frontlines on these?

tsadoq · 2025-02-26T22:42:13 1740609733

It looks super cool, any plan on open sourcing something? Btw, looking for an AI solution/sale engineer :P

volodia · 2025-02-26T22:44:55 1740609895

Good question! We are not open sourcing the models at launch time, but we have a roadmap of future releases in which we hope to make some of our models accessible to the research community.

Reubend · 2025-02-26T23:05:20 1740611120

Super cool, and I'd love to play around with this if they release an open source version.

Without a full paper, it's a bit hard to understand the full details. Does this essentially replace nucleus sampling with diffusion, or does it change the "core" transformer architecture in a major way?

volodia · 2025-02-26T23:55:57 1740614157

Yes, we plan to be releasing a tech report soon. We are not open sourcing the models at launch time, but we have a roadmap of future releases in which we hope to make some of our models accessible to the research community.

g-mork · 2025-02-27T01:24:58 1740619498

Probably it's not relevant to you commercially at the moment (or ever?), but would love some intuition on how your models perform on really low end hardware. Does this technique translate into improved CPU-only performance? Also curious about density, does the technique require more/fewer/roughly same parameters as a traditional LLM for the same output quality?

volodia · 2025-02-27T02:15:59 1740622559

Great question! The model can more efficiently leverage existing GPU hardware---it performs more computation per unit of memory transferred; this means that on older hardware one should be able to get similar inference speeds as one would get on recent hardware with a classical LLM. This is actually interesting commercially, since it opens new ways of reducing AI inference costs.

Rin0 · 2025-02-27T17:05:42 1740675942

May I ask if the training cost of the Mercury Code is higher compared to existing LLMs of similar capabilities?

olddustytrail · 2025-02-26T22:24:32 1740608672

How does producing tokens in parallel not just result in completely incoherent output?

imtringued · 2025-02-26T22:59:18 1740610758

Assuming the model tracks convergence in one way or another, it would simply continue performing iterations until it has reached an error below an epsilon value.

This means that in the worst case the number of iterations is the same as a classic autoregressive transformer.

So they are mostly taking advantage of the fact that the average response is in reality not fully sequential, so the model is discovering the exploitable parallelism on its own.

This is not too dissimilar to a branch and bound algorithm that has a worse theoretical runtime than a simple brute force search, but in practice is solving the integer linear programming problem in almost polynomial time, because not everyone is encoding the hardest instances of problems in NP as integer linear programs.

volodia · 2025-02-26T22:44:26 1740609866

The short answer is that we do more than one parallel pass over multiple tokens: we iteratively refine them over a few passes to fix incoherences. This can be seen as a generalization of diffusion algorithms that underlie systems like Midjourney or Sora.

aw123 · 2025-02-28T23:30:59 1740785459

so if I understand correctly, you remask some tokens that were previously unmasked?

volodia · on March 19, 2023

It won't run as fast on your CPU at it will run on a GPU. Also, it might clog most of your RAM; it's better to offload to a cheap GPU.

volodia · on March 18, 2023

There are also equally easy to use open source systems, check out this one for example: https://github.com/kuleshov/minillm

volodia · on March 18, 2023

Try this: it installs with a simple python command if you have an NVIDIA GPU: https://github.com/kuleshov/minillm

volodia · on Aug 1, 2019

Afresh is a Series A startup focused on automating the food supply chain using AI with the ultimate goal of eliminating food waste. In the US, about 40% of all food waste occurs in supermarkets and downstream, largely due to inefficient manual ordering processes. This waste leads to >$80B in economic losses as well as 1.5 billion tons of greenhouse gas emissions, which is comparable to the emissions of Japan.

Afresh is commercializing a technology developed as part of a Stanford research project that automates the pen-and-paper processes used by supermarket operators. This technology cuts retail food waste by >50% and dramatically increases the stores' profit margins.

We are founded by a team of Computer Science PhDs, MBAs, designers, and engineers from Stanford, Berkeley, CMU. We're backed by former Google CEO Eric Schmidt's firm (Innovation Endeavors) and the first investors in Instagram, Stitchfix, SoFi, and Heroku (Steve Anderson of Baseline Ventures).

We're growing fast: we're in a partnership with 4 large regional grocers representing 500+ stores and >$10B in revenue. We're also looking for smart, enthusiastic, dependable people interested in applying cutting-edge technology to problems with significant societal impact.

Our open roles are:

- Lead UI/UX Designer - Lead Product Manager - Machine Learning Engineer - Senior Backend Engineer - Full job descriptions available at: https://jobs.lever.co/afreshtechnologies?

Website: http://afresh.ai

Feel free to reach out directly to volodymyr@afreshtechnologies.com (I'm the CTO)

volodia · on May 1, 2019

Afresh | San Francisco, CA | Full-time | Onsite

Afresh is a Series A startup focused on automating the food supply chain using AI with the ultimate goal of eliminating food waste. In the US, about 40% of all food waste occurs in supermarkets and downstream, largely due to inefficient manual ordering processes. This waste leads to >$80B in economic losses as well as 1.5 billion tons of greenhouse gas emissions, which is comparable to the emissions of Japan.

Afresh is commercializing a technology developed as part of a Stanford research project that automates the pen-and-paper processes used by supermarket operators. This technology cuts retail food waste by >50% and dramatically increases the stores' profit margins.

We are founded by a team of Computer Science PhDs, MBAs, designers, and engineers from Stanford, Berkeley, CMU. We're backed by former Google CEO Eric Schmidt's firm (Innovation Endeavors) and the first investors in Instagram, Stitchfix, SoFi, and Heroku (Steve Anderson of Baseline Ventures).

We're growing fast: we're in a partnership with 4 large regional grocers representing 500+ stores and >$10B in revenue. We're also looking for smart, enthusiastic, dependable people interested in applying cutting-edge technology to problems with significant societal impact.

Our open roles are: * Machine Learning Engineer * Backend Engineer * Site Reliability / DevOps * Mobile Developer * Full-Stack Web Developer Full job descriptions available at: https://jobs.lever.co/afreshtechnologies?

Feel free to reach out directly to volodymyr@afreshtechnologies.com (I'm the CTO)