> The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute JMP.
Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, each CALL <immediate> is replaced by loading the address into a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large.)
I would assume loose language, referring to a CALL as a JMP. However, of the two reasons given to dislike the large code model, register pressure isn't relevant to that particular snippet.
It's performing a call, and ABIs define registers that are not preserved across calls; writing the destination into one of those won't affect register pressure.
I don't know about "best", but I'm very happy with my last few Lenovo ThinkPads (X1 Carbon, X1 Nano, some T-series). Before that, I had an Asus Zenbook, and everything worked as well. All had 4-15hr batteries, more than I expected in their respective eras. I've heard Dell XPS are good too.
There is no black magic. The only trick is checking the amazing Arch Linux wiki [1]. It will tell you everything you need to know, things like avoiding the recent Intel MIPI webcams (Linux support is coming, but count on a couple of years before it works out of the box).
Regarding desktop environments, it depends on your taste. I don't enjoy Gnome, but OSX refugees tend to like it. I've used XFCE, LXDE, LXQt, and recently KDE, and I have only good things to say about all of them. And tiling DE aficionados are spoiled for choice.
Plus, exchanging low-level code with Windows (WSL2) and OSX (largely POSIX-compatible) has never been easier. The only remaining issue is if you go down to assembly (AArch64 vs x86_64), which only crops up if you depend on proprietary applications.
To summarize, the picture is quite good nowadays, has been for a while, and is only improving. Of course, the one problem is closed-source apps. If you rely on one, read up on its quirks. Otherwise, check the Arch Linux wiki for hardware support, and you're golden.
You're right, but it's very subtle and complicated.
In theory, the simplex method is not known to be polynomial-time, and it is likely that indeed it is not. Some variants of the simplex method have been proven to take exponential time in some worst cases (Klee-Minty cubes). What solvers implement could be said to be one such variant ("steepest-edge pricing"), but because solvers have tons of heuristics and engineering, and also because they work in floating-point arithmetic... it's difficult to tell for sure.
In practice, the main alternative is interior-point (a.k.a. barrier) methods which, contrary to the simplex method, are polynomial-time in theory. They are usually (but not always) faster, and their advantage tends to increase for larger instances. The problem is that they are converging numerical algorithms, and with floating-point arithmetic they never quite 100% converge. By contrast, the simplex method is a combinatorial algorithm, and the numerical errors it faces should not accumulate. As a result, good solvers perform "crossover" after interior-point methods, to get a numerically clean optimal solution. Crossover is a combinatorial algorithm, like the simplex method. Unlike the simplex method though, crossover is polynomial-time in theory (strongly so, even). However, here, theory and practice diverge a bit, and crossover implementations are essentially simplified simplex methods. As a result, in my opinion, calling interior-point + crossover polynomial-time would be a stretch.
Still, for large problems, we can expect interior-point + crossover to be faster than the simplex method, by a factor of 2x to 10x.
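For concreteness, here is a minimal sketch (mine, not from the comment above) of running both families on the same toy LP through SciPy's HiGHS bindings, where "highs-ds" is the dual simplex and "highs-ipm" the interior-point method. The instance is made up and trivially small; the differences described above only show up on large problems.

```python
# Toy LP: maximize x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0.
# linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

c = [-1.0, -2.0]
A_ub = [[1.0, 1.0], [1.0, 3.0]]
b_ub = [4.0, 6.0]

for method in ("highs-ds", "highs-ipm"):   # dual simplex vs interior point
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)], method=method)
    print(method, res.x, -res.fun)         # both should report x = (3, 1), obj = 5
```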
There are also first-order methods, which have been getting much attention lately. However, in my experience, you should only use them if you are willing to tolerate huge constraint violations and wildly suboptimal solutions. Their main use case is when other solvers need too much RAM to solve your instance.
Very interesting! Thanks for the reply. I wonder if they tried these other solvers and decided they were either too slow (because their problems were too small) or too inaccurate in their answers.
If this is business-critical for you, you may want to switch to a faster solver. Glop is very nice, but it would be reasonable to expect a commercial solver (Gurobi, Xpress, COPT) to be 60x faster [1]. By the same measure, the best open-source solvers (CLP, HiGHS) are 2-3x faster than Glop.
Actually, the commercial solvers are so fast that I would not be surprised if they solved the IP problem as fast as Glop solves the LP. (Yes, the theory says it is impossible, but in practice it happens.) The cost of a commercial solver is 10k to 50k per license.
[1] This 60x number has very high variance depending on the type of problem, but it is not taken out of nowhere: it comes from the Mittelmann LP benchmarks (https://plato.asu.edu/ftp/lpopt.html). There are also benchmarks for other types of problems, including IP; see the whole list here: https://plato.asu.edu/bench.html
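For what it's worth, if the model is built through OR-tools (the usual way to drive Glop), trying another backend is often a one-string change. A hedged sketch with a made-up toy model; commercial backends additionally require a license and a matching OR-tools build:

```python
from ortools.linear_solver import pywraplp

for backend in ("GLOP", "CLP"):            # swap in a commercial name if installed
    solver = pywraplp.Solver.CreateSolver(backend)
    if solver is None:                     # backend not available in this build
        continue
    x = solver.NumVar(0.0, solver.infinity(), "x")
    y = solver.NumVar(0.0, solver.infinity(), "y")
    solver.Add(x + y <= 4.0)
    solver.Maximize(x + 2.0 * y)
    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        print(backend, solver.Objective().Value())
```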
> Instead, cryptography needs problems that are hard in the average case, like the RSA problem, the discrete logarithm for elliptic curves, and the shortest vector problem for lattices. We don’t technically know whether these are NP-complete
But we do know that the shortest vector problem is NP-hard (in Linf norm), and so is its decision version [1]. We don't have a reduction in the other direction, but we know that SVP is as-hard-as-NP-complete-or-harder.
This goes against the general argument of the article, but only weakly so, because I think lattice-based systems are less broadly deployed than elliptic curves or factoring (not a crypto person, though).
Yeah, SVP is NP-complete for certain parameters. But lattice cryptography uses other parameters, where SVP might not be NP-complete. Also, lattice crypto is usually based on some other lattice problem, like LWE, MLWE, MLWR, SIVP, etc.
Lattice crypto is going to be broadly deployed because it can hopefully resist quantum attacks, unlike ECDLP and factoring.
Floating-point is hard, and standards seem like they cater to lawyers rather than devs. But a few things are slightly misleading in the post.
1. It correctly quotes the IEEE754-2008 standard:
> A conforming function shall return results correctly rounded for the applicable rounding direction for all operands in its domain
and even points out that the citation is from "Section 9. *Recommended* operations" (emphasis mine). But then it goes on to describe this as a "*requirement*" of the standard (it is not). This is not just a typo: the post actually implies that implementations not following this recommendation are wrong:
> [...] none of the major mathematical libraries that are used throughout computing are actually rounding correctly as demanded in any version of IEEE 754 after the original 1985 release.
or:
> [...] ranging from benign disregard for the standard to placing the burden of correctness on the user who should know that the functions are wrong: “It is following the specification people believe it’s following.”
As far as I know, IEEE754 mandates correct rounding for elementary operations and sqrt(), and only for those.
2. All the mentions of 1 ULP in the beginning are a red herring. As the article itself mentions later, the standard never cares about 1 ULP. Some people do care about 1 ULP, just because it is something that can be achieved at a reasonable cost for transcendentals, so why not do it. But not the standard.
3. The author seems to believe that 0.5 ULP would be better than 1 ULP for numerical accuracy reasons:
> I was resounding told that the absolute error in the numbers are too small to be a problem. Frankly, I did not believe this.
I would personally also tell that to the author. But there is a much more important reason why correct rounding would be a tremendous advantage: reproducibility. There is always only one correct rounding. As a consequence, with correct rounding, different implementations return bit-for-bit identical results. The author even mentions falling victim to FP non-reproducibility in another part of the article.
4. This last point is excusable because the article is from 2020, but "solving" the fp32 incorrect-rounding problem by using fp64 is naive (not guaranteed to always work, although it will with high probability) and inefficient. It also does not say what to do for fp64. We can do correct rounding much faster now [1, 2]; so much faster that it is getting really close to non-correctly-rounded implementations, and some libm may one day decide to switch to it.
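To make points 2 and 3 concrete, here is a small sketch of how such errors are typically measured: evaluate the libm function, recompute a high-precision reference (mpmath is an assumed dependency here), and express the difference in ULPs. A correctly rounded implementation stays within 0.5 ULP by construction, and is therefore bit-for-bit reproducible across platforms.

```python
import math
import mpmath

mpmath.mp.prec = 200                       # ample precision for a reference value

def ulp_error(x: float) -> float:
    got = math.sin(x)                      # platform libm, typically within ~1 ULP
    ref = mpmath.sin(mpmath.mpf(x))        # near-exact reference
    return float(abs(mpmath.mpf(got) - ref) / math.ulp(got))

print(ulp_error(0.5), ulp_error(1e22))     # 1e22 stresses argument reduction
```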
3. >> I was resounding told that the absolute error in the numbers are too small to be a problem. Frankly, I did not believe this.
> I would personally also tell that to the author. But there is a much more important reason why correct rounding would be a tremendous advantage: reproducibility.
This is also what the author wants, based on his own experiences, though he failed to realize or state it explicitly: "People on different machines were seeing different patterns being generated which meant that it broke an aspect of our multiplayer game."
So yes, the reasons mentioned as a rationale for more accurate functions are in fact a rationale for reproducibility across hardware and platforms. For example, going from 1 ULP errors to 0.6 ULP errors would not help the author at all, but having reproducible behavior would (even with an increased worst-case error).
Correctly rounded functions mean the rounding error is the smallest possible and, as a consequence, that every implementation always returns exactly the same results: this is the main reason why people (and the author) advocate for correctly rounded implementations.
> "solving" the fp32 incorrect-rounding problem by using fp64 is naive (not guaranteed to always work, although it will with high probability)
The key thing is that there are only 2^32 float32s, so you can check all of them. It sounds to me like the author did this, and realized they needed some tweaks to get correct answers with log.
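A sketch of that exhaustive idea, assuming NumPy: enumerate all 2^32 bit patterns in chunks and compare two float32 implementations bit-for-bit, e.g. the "compute in fp64, round once" version against a native single-precision one. A full sweep is slow in pure NumPy but entirely feasible.

```python
import numpy as np

def count_mismatches(f, g, chunk=1 << 20):
    """Compare two float32 -> float32 functions over every bit pattern."""
    mismatches = 0
    for start in range(0, 1 << 32, chunk):
        bits = np.arange(chunk, dtype=np.uint32) + np.uint32(start)
        x = bits.view(np.float32)
        a, b = f(x), g(x)
        same = a.view(np.uint32) == b.view(np.uint32)
        same |= np.isnan(a) & np.isnan(b)  # treat any NaN result as matching a NaN
        mismatches += int((~same).sum())
    return mismatches

# e.g.: count_mismatches(
#     lambda x: np.sin(x.astype(np.float64)).astype(np.float32),  # via fp64
#     np.sin)                                                     # native float32
```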
I'm ready to believe that PipeWire is imperfect (although I have personally experienced no crashes, and did not have to configure anything), but the sentence from the original post is:
> Therefore, I naturally omit the use and configuration of additional audio layers in the form of a Jack server, PulseAudio or the unfortunate PipeWire from RedHat.
I cannot understand this sentiment. I have used all three for years each. If I had to qualify one as "unfortunate", it would certainly not be PipeWire (nor would it be Jack for that matter).
> performance is not that great compared to pulseaudio and jack ..
Jack makes different trade-offs by default, so for some definition of performance, yes, one could see it as superior to PipeWire. But in my experience PipeWire is better than PulseAudio on all axes (at least: latency, CPU usage, resampling quality).
> We present a polynomial-time algorithm achieving an approximation ratio below √2 for MVC, providing strong evidence that P = NP by efficiently solving a computationally hard problem with near-optimal solutions.
> This result contradicts the Unique Games Conjecture, suggesting that many optimization problems may admit better solutions, revolutionizing theoretical computer science.
Some context: The Unique Games Conjecture (UGC) is a very central unsolved problem in theoretical computer science (TCS); it has been open since 2002. It conjectures that, for a series of standard optimization problems including minimum vertex cover, the best known approximation factor (how close you can guarantee to get to optimality with a polynomial-time algorithm) is actually the best possible approximation factor. To disprove the conjecture, one needs a better approximation algorithm for one of those problems. Such an algorithm could be used to solve (to optimality, and in polynomial time) another set of problems, which are widely believed to be NP-hard. This would not prove P = NP (to be clear: the above quote is not claiming that), but it is true that it would revolutionize TCS.
The thing is: this is TCS and theoretical complexity. You cannot disprove UGC with just code; you need a mathematical proof. This is not an indictment of the code, which may be very good. But such an enormous claim would require pretty intense scrutiny before it would be accepted as true.
No vulnerability name, no website, a concise description, a neutral tone, a precise list of affected distros (RHEL + derivatives and some EOL Fedoras), even a mention of unaffected distros (current Fedoras), and a plain admission that no attempt was made to exploit it. What a breath of fresh air!
(I am only joking of course. As a recovering academic, I understand that researchers need recognition, and I have no right to throw stones -- glass houses and all. Also, this one is really like regreSSHion's little sibling. Still, easily finding the information I needed made me happy.)
I don't think recognition for researchers is the big win for named vulnerabilities. In the places that matter, they can just describe their findings in a short sentence and get all the recognition that matters. The names are mostly for the benefit of users.
Security researchers definitely do the naming gimmick for personal brand purposes. This may not be as obvious when it’s successful, but academic papers routinely name vulnerabilities when there is no real benefit to users.
The whole point of naming vulnerabilities is to establish a vernacular about them, so it's not surprising that academic papers name them. The literature about hardware microarchitectural attacks, for instance, would be fucking inscrutable (even more than it is now) without the names.
I'd be happy to file all of them under Spectre/MDS, except for the ones that aren't Spectre/MDS, of course. They don't all need unique names. Most of them are instances of the same pattern: some value is not present in a register when it's needed, and an Intel CPU design continues to execute speculatively with the previous contents of that register instead of inserting a pipeline bubble, leaking those stale contents. Using an inter-core communication buffer, instead of a load data buffer like the last person, doesn't in my opinion deserve a new name and logo. A new write-up, yes.
I don't even understand the impulse to lose the names. Names aren't achievement awards. We already have Best Paper awards at the Big 4 and the Pwnies (for however seriously you take that). The names don't cost anybody anything, and they're occasionally helpful.
Name them all.
You see the same weird discussions about CVEs, and people wanting to squash CVEs down (or not issue them at all) because the research work is deemed insufficient to merit the recognition. As if recognition for work was ever even ostensibly what the CVE program was about.