I think the implication is that you should own multiple client devices capable of SSHing into things, each with their own SSH keypair; and every SSH host you interact with should have multiple of your devices’ keypairs registered to it.
Tuna-Fish said that instead of backing up the keys from your devices, you should create a specific backup key that is only ever used in case you lose access to all your devices.
This is indeed best practice because it allows you to alert based on key: if you receive a login on a machine with your backup key, but you haven't lost your devices, then you know your backup was compromised. If you take backups of your regular key then it would be much more difficult to notice a problem.
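A minimal sketch of that kind of alerting, assuming sshd is logging accepted-key fingerprints to /var/log/auth.log (recent OpenSSH does this at the default log level; older setups may need LogLevel VERBOSE). The fingerprint value, log path, and notification channel are placeholders:

    # Sketch: watch sshd's auth log and alert whenever the *backup* key is used.
    # The fingerprint below is a placeholder for your real backup key's fingerprint.
    import time

    AUTH_LOG = "/var/log/auth.log"
    BACKUP_KEY_FPR = "SHA256:REPLACE_WITH_BACKUP_KEY_FINGERPRINT"

    def follow(path):
        """Yield new lines appended to the file, like `tail -f`."""
        with open(path, "r") as f:
            f.seek(0, 2)  # start at end of file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1.0)
                    continue
                yield line

    for line in follow(AUTH_LOG):
        if "Accepted publickey" in line and BACKUP_KEY_FPR in line:
            # Replace with your real notification channel (mail, webhook, pager...).
            print("ALERT: backup SSH key was used for a login:", line.strip())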
I've done it in the past (~2015). Honestly, if Google locked me out of all of those other purchases, it'd be great grounds to sue them. If everyone who got locked out sued, it would deter Google from doing it in the first place, and it might be additional fodder for (hopefully) continued anti-trust losses in court. If your life is tied to Google in that way then it's a risk no matter what you do, and you should probably think about how to reduce that risk. I don't have anything other than purchases tied to my Google accounts anymore.
It’d be difficult to use in any automated process, since the judgement of how good one of these renditions is remains very qualitative.
You could try to rasterize the SVG and then use an image2text model to describe it, but I suspect it would just “see through” any flaws in the depiction and describe it as “a pelican on a bicycle” anyway.
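A quick sketch of that pipeline, assuming cairosvg for rasterization and a BLIP captioner from transformers as a stand-in image2text model; the file names and model choice are just examples:

    # Sketch: rasterize an SVG, then ask an image-to-text model to describe it.
    # Assumes cairosvg and transformers are installed; "pelican.svg" is a
    # placeholder file name and BLIP is just one example captioning model.
    import cairosvg
    from transformers import pipeline

    cairosvg.svg2png(url="pelican.svg", write_to="pelican.png",
                     output_width=512, output_height=512)

    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-base")
    print(captioner("pelican.png"))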
I would argue that an LLM is a perfectly sensible tool for structure-preserving machine translation from another language to English. (Where by "another language", you could also substitute "very poor/non-fluent English." Though IMHO that's a bit silly, even though it's possible; there's little sense in writing in a language you only half know, when you'd get a less-lossy result from just writing in your native tongue and then having the LLM translate from that.)
Google Translate et al were never good enough at this task to actually allow people to use the results for anything professional. Previous tools were limited to getting a rough gloss of what words in another language mean.
But LLMs can be used in this way, and are being used in this way; and this is increasingly allowing non-English-fluent academics to publish papers in English-language journals (thus engaging with the English-language academic community), where previously those academics may have felt "stuck" publishing in what few journals exist for their discipline in their own language.
Would you call the use of LLMs for translation "shoddy" or "irresponsible"? To me, it'd be no more and no less "shoddy" or "irresponsible" than it would be to hire a freelance human translator to translate the paper for you. (In fact, the human translator might be a worse idea, as LLMs are more likely to understand how to translate the specific academic jargon of your discipline than a randomly-selected human translator would be.)
Autotranslating technical texts is very hard. After the translation, you must check that every technical term was translated to the correct term of art, not to a fancy synonym that makes no sense.
(A friend has an old book translated a long time ago (by a human) from Russian to Spanish. Instead of "complex numbers", the book calls them "complicated numbers". :) )
I remember one time when I had written a bunch of user facing text for an imaging app and was reviewing our French translation. I don't speak French but I was pretty sure "plane" (as in geometry) shouldn't be translated as "avion". And this was human translated!
You'd be surprised how shoddy human translations can be, and it's not necessarily because of the translators themselves.
Typically what happens is that translators are given an Excel sheet with the original text in a column, and the translated text must be put into the next column. Because there's no context, it's not necessarily clear to the translator whether the translation for plane should be avion (airplane) or plan (geometric plane). The translator might not ever see the actual software with their translated text.
The convenient thing in this case (verification of translation of academic papers from the speaker's native language to English) is that the authors of the paper likely already 1. can read English to some degree, and 2. are highly likely to be familiar specifically with the jargon terms of their field in both their own language and in English.
This is because, even in countries with a different primary spoken language, many academic subjects, especially at a graduate level (masters/PhD programs — i.e. when publishing starts to matter), are still taught at universities at least partly in English. The best textbooks are usually written in English (with acceptably-faithful translations of these texts being rarer than you'd think); all the seminal papers one might reference are likely to be in English; etc. For many programs, the ability to read English to some degree is a requirement for attendance.
And yet these same programs are also likely to provide lectures (and TA assistance) in the country's own native language, with the native-language versions of the jargon terms used. And any collaborative work is likely to also occur in the native language. So attendees of such programs end up exposed to both the native-language and English-language terms within their field.
This means that academics in these places often have very little trouble in verifying the fidelity of translation of the jargon in their papers. It's usually all the other stuff in the translation that they aren't sure is correct. But this can be cheaply verified by handing the paper to any fluently-multilingual non-academic and asking them to check the translation, with the instruction to just ignore the jargon terms because they were already verified.
To that point, I think it's lovely how LLMs democratize science. At ICLR a few years ago I spoke with a few Korean researchers who were delighted that their relative inability to write in English was no longer being held against them during the review process. I think until then I had underestimated how pivotal this technology is in lowering the barrier to entry for the non-English-speaking scientific community.
If they can write a whole draft in their first language, they can easily read the translated English version and correct it. The errors described by gp/op arise when authors ask the LLM to generate a full paragraph of text directly. Look at my terrible English; I really have been through the full process from draft to English version before :)
We still do not have a standardized way to represent machine learning concepts. For example, in vision models I see lots of papers confuse "skip connections" and "residual connections": when they concatenate channels they call it a "residual connection", which shows they haven't understood why we call them "residual" in the first place. In my humble opinion, each conference (or better, a confederation of conferences) should work together to provide a glossary, a technical guideline, and a special machine translation tool to correct unclear, grammatical-error-riddled English like mine!
I'm surprised by these results. I agree that LLMs are a great tool for offsetting the English-speaking world's advantage. I would have expected non-Anglo-American universities to rank at the top of the list. One of the most valuable features of LLMs from the beginning has been their ability to improve written language.
Why is their use more intense in English-speaking universities?
Good point. There may be a place for LLMs for science writing translation (hopefully not adding nor subtracting anything) when you're not fluent in the language of a venue.
You need a way to validate the correctness of the translation, and to be able to stand behind whatever the translation says. And the translation should be disclosed on the paper.
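One cheap, imperfect way to do that validation is a round trip: translate to English, translate back, and compare against the original. A sketch assuming the openai Python client and any chat-style LLM behind it; the model name and example sentence are placeholders, and this catches gross errors rather than subtle jargon drift:

    # Sketch: round-trip ("back-translation") check as one way to sanity-check
    # a translation. The model name is a placeholder.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder

    def translate(text: str, src: str, dst: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system",
                 "content": f"Translate the user's text from {src} to {dst}. "
                            "Preserve structure, formatting, and technical jargon."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content

    original = "Los números complejos forman un cuerpo algebraicamente cerrado."
    english = translate(original, "Spanish", "English")
    round_trip = translate(english, "English", "Spanish")

    print("EN:", english)
    print("Round trip:", round_trip)  # compare to `original` by hand or with difflib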
> Coming from a software background, this seems bizarre, as if C++ compilers rejected valid programs unless they stuck to easy constructs with obvious assembly implementations.
To my understanding, isn’t it more like there being a perfectly good IR instruction coding for a feature, but with no extant ISA codegen targets that recognize that instruction? I.e. you get stuck at the step where you’re lowering the code for a specific FPGA impl.
And, as with compilers, one could get around this by defining a new abstract codegen target implemented only in the form of a software simulator, and adding support for the feature to that. Though it would be mightily unsatisfying to ultimately be constrained to run your FPGA bitstream on a CPU :)
The non-synthesizable features of Verilog not only work in current simulators, they were expressly developed for that purpose. Verilog has those features to describe conditions that might exist in a semiconductor as manufactured, but aren't part of any design, so that they can be more accurately simulated. For example, a pin state can be unknown, or two pins can be connected with a delay line. These allow a real-life semiconductor to be characterized well enough to insert into a simulation of the electronics circuit as a whole.
It's more akin to directives than instructions. Debug instructions can also serve a similar purpose, although they actually run on the hardware, whereas compiler directives and non-synthesizable Verilog constructs are never executed.
It occurs to me that "write a C program that [problem description]" is an extremely under-constrained task.
People are highly aware that C++ programmers always use some particular subset of C++; but it's less obvious that any given C programmer is also going to use a particular dialect on top of C.
Since the C standard library is so anemic for algorithms and data structures, any given "C programmer" is going to have a hash map of choice, a b-tree of choice, a streams abstraction of choice, an async abstraction of choice, etc.
And, in any project they create, they're going to depend on (or vendor in) those low-level libraries.
Meanwhile, any big framework-ish library (GTK, OpenMP, OpenSSL) is also going to have its own set of built-in data structures that you have to use to interact with it (because it needs to take and return such data-structures in its API, and it has to define them in order to do that.) Which often makes it feel more correct, in such C projects, to use that framework's abstractions throughout your own code, rather than also bringing your own favorite ones and constantly hitting the impedance wall of FFI-ing between them.
It's actually shocking that, in both FOSS and hiring, we expect "experienced C programmers" who've worked for 99% of their careers with a dialect of C consisting of abstractions from libraries E+F+G, to also be able to jump onto C codebases that instead use abstractions from libraries W+X+Y+Z (that may depend on entirely different usage patterns for their safety guarantees!), look around a bit, and immediately be productively contributing.
It's no wonder an AI can't do that. Humans can barely do it!
My guess is that the performance of an AI coding agent on a greenfield C project would massively improve if you initially prompt it (or instruct it in an AGENTS.md file) in a way that entirely constrains its choices of C-stdlib-supplemental libraries. Either by explicitly listing them; or by just saying e.g. "Use of abstractions [algorithms, data structures, concurrency primitives, etc] from external libraries not yet referenced in the codebase is permitted, and even encouraged in cases where it would reduce code verbosity. Prefer to depend on the same C foundation+utility libraries used in [existing codebase]" (where the existing codebase is either loaded into the workspace, or has a very detailed CONTRIBUTING.md you can point the agent at.)
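For example, a hypothetical AGENTS.md excerpt along those lines (the library and directory names are made up):

    ## Libraries and abstractions (hypothetical excerpt)
    - Use only the hash map, dynamic array, and string abstractions already
      vendored under third_party/ in this repo; do not introduce a second one.
    - New concurrency code must use the same event-loop abstraction the
      existing networking code uses.
    - If a needed data structure is missing, prefer adding a small, widely used
      C utility library over hand-rolling it, and call out the addition
      explicitly.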
I follow the author's line of reasoning, but I think that following it to its logical conclusion would lead not to an `execute_code` primitive, but rather to an assumption that the model's stdout is appending to a (Jupyter, Livebook, etc) notebook file, where any code cell in the notebook gets executed (and its output rendered back into the inference context) at the moment the code cell is closed / becomes syntactically valid.
I say this, because the notebook itself then works as a timeline of both the conversation, and the code execution. Any code cell can be (edited and) re-run by the human, and any cells "downstream" of the cell will be recalculated... up to the point of the first cell (code or text) whose assumptions become invalidated by the change — at which point you get a context-history branch, and the inference resumes from that branch point against the modified context.
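A toy sketch of that loop, assuming cells are plain Python source strings and that "downstream invalidation" just means re-running everything after the edited cell; real kernels (Jupyter, Livebook) and the context-branching logic would involve far more bookkeeping:

    # Sketch: a toy "notebook as timeline" loop. Each code cell runs in a shared
    # namespace, its stdout is captured, and editing cell i re-runs every cell
    # after it.
    import io
    import contextlib

    def run_cells(cells, start=0, env=None):
        """Execute cells[start:] in a shared namespace, returning captured outputs."""
        env = env if env is not None else {}
        outputs = []
        for src in cells[start:]:
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec(src, env)          # shared namespace, like a notebook kernel
            outputs.append(buf.getvalue())
        return env, outputs

    cells = ["x = 2", "y = x * 10", "print(y)"]
    env, outputs = run_cells(cells)
    print(outputs)                      # ['', '', '20\n']

    # Human edits cell 0; everything downstream is recomputed.
    cells[0] = "x = 5"
    env, outputs = run_cells(cells)     # naive: re-run from the top
    print(outputs)                      # ['', '', '50\n']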
Sounds like an approach that would also work for ML model weights files — just another kind of multidimensional array with metadata.
I wonder what exactly the big multi-model AI companies are doing to optimize model cold-start latency, and how much it just looks like Zarr on top of on-prem object storage.
People have literally used Zarr for this - at one point Gemini used Zarr for checkpointing model weights. Not sure what the current fashion in that space is though.
It's definitely one of many fields that see convergent evolution towards something that just looks like Zarr. In fact you can use VirtualiZarr to parse HuggingFace's "SafeTensors" format [0].
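As a rough illustration of why weights map onto Zarr so naturally, here is a minimal sketch using the zarr-python v2-style API; the layer names, shapes, and chunk sizes are placeholders:

    # Sketch: storing model weights as a Zarr group. Each tensor becomes a
    # chunked, individually addressable array, which is what makes lazy or
    # partial loads cheap.
    import numpy as np
    import zarr

    root = zarr.open_group("weights.zarr", mode="w")
    root.create_dataset("encoder/layer0/weight",
                        data=np.random.randn(4096, 4096).astype("float32"),
                        chunks=(1024, 1024))
    root.create_dataset("encoder/layer0/bias",
                        data=np.zeros(4096, dtype="float32"))

    # Later (possibly on another machine, against object storage): open lazily
    # and read only the chunks you touch.
    ro = zarr.open_group("weights.zarr", mode="r")
    tile = ro["encoder/layer0/weight"][:1024, :1024]   # pulls just one chunk
    print(tile.shape)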
The server would be telling the client the rate-limiting values active/effective for it. As such, the client doesn't actually need to know what "its partition" is. As far as the client is concerned, "its partition" is the whole of the rate-limiting domain.
The partitioning strategy, and partition chosen using it, would never — should never — be relevant to any automated logic inside the client. (The only way in which it could be would be if you were trying to make a client that aims to defeat the server's rate-limiting logic by using multiple accounts or IP addresses to jump between partitions, and that's... not okay.)
The point of sending the partitioning info to the client, is that it enables a human developing a client, or operating a tool that embeds a client, to debug why rate-limiting is happening when by their understanding it shouldn't be — especially when they have multiple clients across multiple threads / machines each making multiple concurrent requests to the API. These HTTP-429-response heisenbugs get much easier to reason about when the server is sending the client enough information for the developer to be able to see which of the requests they sent got rate-limiting-bucketed together, and which didn't.
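A rough sketch of the client-side debugging this enables. The header names loosely follow the IETF RateLimit header fields draft; the partition-key header name and the URL here are made-up assumptions for illustration:

    # Sketch: log the rate-limit metadata the server sends back, so a developer
    # can see which requests got bucketed together.
    import requests

    resp = requests.get("https://api.example.com/v1/things",
                        headers={"Authorization": "Bearer <token>"})

    for name in ("RateLimit-Limit", "RateLimit-Remaining", "RateLimit-Reset",
                 "RateLimit-Policy", "X-RateLimit-Partition-Key"):
        if name in resp.headers:
            print(f"{name}: {resp.headers[name]}")

    if resp.status_code == 429:
        print("rate limited; retry after", resp.headers.get("Retry-After"))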
Sure, for e.g. E2E email, the expectation is that all the computation occurs on the client, and the server is a dumb store of opaque encrypted stuff.
In a traditional E2E chat app, on the other hand, you've still got a backend service acting as a dumb pipe, that shouldn't have the keys to decrypt traffic flowing through it; but you've also got multiple clients — not just your own that share your keybag, but the clients of other users you're communicating with. "E2E" in the context of a chat app, means "messages are encrypted within your client; messages can then only be decrypted within the destination client(s) [i.e. the client(s) of the user(s) in the message thread with you.]"
"E2E AI chat" would be E2E chat, with an LLM. The LLM is the other user in the chat thread with you; and this other user has its own distinct set of devices that it must interact through (because those devices are within the security boundary of its inference infrastructure.) So messages must decrypt on the LLM's side for it to read and reply to, just as they must decrypt on another human user's side for them to read and reply to. The LLM isn't the backend here; the chat servers acting as a "pipe" are the backend, while the LLM is on the same level of the network diagram as the user is.
Let's consider the trivial version of an "E2E AI chat" design, where you physically control and possess the inference infrastructure. The LLM infra is e.g. your home workstation with some beefy GPUs in it. In this version, you can just run Signal on the same workstation, and connect it to the locally-running inference model as an MCP server. Then all your other devices gain the ability to "E2E AI chat" with the agent that resides in your workstation.
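A minimal sketch of that trivial case, assuming the official MCP Python SDK (FastMCP) and an Ollama-style local HTTP endpoint; the model name, port, and tool shape are placeholders, and wiring Signal (or any other chat client) up to it is left out entirely:

    # Sketch: expose a locally running model as an MCP tool that a chat client
    # on the same workstation could call.
    import requests
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("home-agent")

    @mcp.tool()
    def ask_local_model(prompt: str) -> str:
        """Send a prompt to the local inference server and return its reply."""
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={"model": "llama3", "stream": False,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    if __name__ == "__main__":
        mcp.run()   # stdio transport by default; the chat client connects here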
The design question, being addressed by Moxie here, is what happens in the non-trivial case, when you aren't in physical possession of any inference infrastructure.
Which is obviously the applicable case to solve for most people, 100% of the time, since most people don't own and won't ever own fancy GPU workstations.
But, perhaps more interesting for us tech-heads that do consider buying such hardware, and would like to solve problems by designing architectures that make use of it... the same design question still pertains, at least somewhat, even when you do "own" the infra; just as long as you aren't in 100% continuous physical possession of it.
You would still want attestation (and whatever else is required here) even for an agent installed on your home workstation, so long as you're planning to ever communicate with it through your little chat gateway when you're not at home. (Which, I mean... why else would you bother with setting up an "E2E AI chat" in the first place, if not to be able to do that?)
Consider: your local flavor of state spooks could wait for you to leave your house; slip in and install a rootkit that directly reads from the inference backend's memory; and then disappear into the night before you get home. And, no matter how highly you presume your abilities to detect that your home has been intruded into / your computer has been modified / etc once you have physical access to those things again... you'd still want to be able to detect a compromise of your machine even before you get home, so that you'll know to avoid speaking to your agent (and thereby the nearby wiretap van) until then.