> The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute JMP.
Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, each CALL <immediate> is replaced by loading the address into a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large.)
I would assume loose language, referring to a CALL as a JMP. However, of the two reasons given to dislike the large code model, register pressure isn't relevant to that particular snippet.
It's performing a call, and ABIs define registers that are not preserved across calls; writing the destination into one of those won't affect register pressure.
I don't know about "best", but I'm very happy with my last few Lenovo ThinkPads (X1 Carbon, X1 Nano, some T-series). Before that, I had an Asus Zenbook, and everything worked as well. All had 4-15hr batteries, more than I expected in their respective eras. I've heard Dell XPS are good too.
There is no black magic. The only trick is checking the amazing Arch Linux wiki [1]. It will tell you everything you need to know, things like avoiding the recent Intel MIPI webcams (Linux support is coming, but count on a couple of years before it works out of the box).
Regarding desktop environments, it depends on your taste. I don't enjoy Gnome, but OSX refugees tend to like it. I've used XFCE, LXDE, LXQt, and recently KDE, and I have only good things to say about all of them. And tiling DE aficionados are spoiled for choice.
Plus, exchanging low-level code with Windows (WSL2) and OSX (largely POSIX-compatible) has never been easier. The only remaining issue is if you go down to assembly (AArch64 vs x86_64), which only crops up if you depend on proprietary applications.
To summarize, the picture is quite good nowadays, has been for a while, and is only improving. Of course, the one problem is closed-source apps. If you rely on one, read up on its quirks. Otherwise, check the Arch Linux wiki for hardware support, and you're golden.
You're right, but it's very subtle and complicated.
In theory, the simplex method is not known to be polynomial-time, and it is likely that indeed it is not. Some variants of the simplex method have been proven to take exponential time in some worst cases (Klee-Minty cubes). What solvers implement could be said to be one such variant ("steepest-edge pricing"), but because solvers have tons of heuristics and engineering, and also because they work in floating-point arithmetic... it's difficult to tell for sure.
In practice, the main alternative is interior-point (a.k.a. barrier) methods which, contrary to the simplex method, are polynomial-time in theory. They are usually (but not always) faster, and their advantage tends to increase for larger instances. The problem is that they are converging numerical algorithms, and with floating-point arithmetic they never quite 100% converge. By contrast, the simplex method is a combinatorial algorithm, and the numerical errors it faces should not accumulate. As a result, good solvers perform "crossover" after interior-point methods, to get a numerically clean optimal solution. Crossover is a combinatorial algorithm, like the simplex method. Unlike the simplex method though, crossover is polynomial-time in theory (strongly so, even). However, here, theory and practice diverge a bit, and crossover implementations are essentially simplified simplex methods. As a result, in my opinion, calling interior-point + crossover polynomial-time would be a stretch.
Still, for large problems, we can expect interior-point + crossover to be faster than the simplex method, by a factor of 2x to 10x.
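For concreteness, here is a minimal sketch (mine, not from the comment above) of running both families on the same toy LP through SciPy's HiGHS bindings, where "highs-ds" is the dual simplex and "highs-ipm" the interior-point method. The instance is made up and trivially small; the differences described above only show up on large problems.

```python
# Toy LP: maximize x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0.
# linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

c = [-1.0, -2.0]
A_ub = [[1.0, 1.0], [1.0, 3.0]]
b_ub = [4.0, 6.0]

for method in ("highs-ds", "highs-ipm"):   # dual simplex vs interior point
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)], method=method)
    print(method, res.x, -res.fun)         # both should report x = (3, 1), obj = 5
```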
There are also first-order methods, which have been getting much attention lately. However, in my experience, you should only use them if you are willing to tolerate huge constraint violations and wildly suboptimal solutions. Their main use case is when other solvers need too much RAM to solve your instance.
Very interesting! Thanks for the reply. I wonder if they tried these other solvers and decided they were either too slow (because their problems were too small) or too inaccurate in their answers.
If this is business-critical for you, you may want to switch to a faster solver. Glop is very nice, but it would be reasonable to expect a commercial solver (Gurobi, Xpress, COPT) to be 60x faster [1]. By the same measure, the best open-source solvers (CLP, HiGHS) are 2-3x faster than Glop.
Actually, the commercial solvers are so fast that I would not be surprised if they solved the IP problem as fast as Glop solves the LP. (Yes, the theory says it is impossible, but in practice it happens.) The cost of a commercial solver is 10k to 50k per license.
[1] This 60x number has very high variance depending on the type of problem, but it is not taken out of nowhere: it comes from the Mittelmann LP benchmarks (https://plato.asu.edu/ftp/lpopt.html). There are also benchmarks for other types of problems, including IP; see the whole list here: https://plato.asu.edu/bench.html
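For what it's worth, if the model is built through OR-tools (the usual way to drive Glop), trying another backend is often a one-string change. A hedged sketch with a made-up toy model; commercial backends additionally require a license and a matching OR-tools build:

```python
from ortools.linear_solver import pywraplp

for backend in ("GLOP", "CLP"):            # swap in a commercial name if installed
    solver = pywraplp.Solver.CreateSolver(backend)
    if solver is None:                     # backend not available in this build
        continue
    x = solver.NumVar(0.0, solver.infinity(), "x")
    y = solver.NumVar(0.0, solver.infinity(), "y")
    solver.Add(x + y <= 4.0)
    solver.Maximize(x + 2.0 * y)
    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        print(backend, solver.Objective().Value())
```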
> Instead, cryptography needs problems that are hard in the average case, like the RSA problem, the discrete logarithm for elliptic curves, and the shortest vector problem for lattices. We don’t technically know whether these are NP-complete
But we do know that the shortest vector problem is NP-hard (in Linf norm), and so is its decision version [1]. We don't have a reduction in the other direction, but we know that SVP is as-hard-as-NP-complete-or-harder.
This goes against the general argument of the article, but only weakly so, because I think lattice-based systems are less broadly deployed than elliptic curves or factoring (not a crypto person, though).
Yeah, SVP is NP-complete for certain parameters. But lattice cryptography uses other parameters, where SVP might not be NP-complete. Also, lattice crypto is usually based on some other lattice problem, like LWE, MLWE, MLWR, SIVP, etc.
Lattice crypto is going to be broadly deployed because it can hopefully resist quantum attacks, unlike ECDLP and factoring.
Floating-point is hard, and standards seem like they cater to lawyers rather than devs. But a few things are slightly misleading in the post.
1. It correctly quotes the IEEE754-2008 standard:
> A conforming function shall return results correctly rounded for the applicable rounding direction for all operands in its domain
and even points out that the citation is from "Section 9. *Recommended* operations" (emphasis mine). But then it goes on to describe this as a "*requirement*" of the standard (it is not). This is not just a typo: the post actually implies that implementations not following this recommendation are wrong:
> [...] none of the major mathematical libraries that are used throughout computing are actually rounding correctly as demanded in any version of IEEE 754 after the original 1985 release.
or:
> [...] ranging from benign disregard for the standard to placing the burden of correctness on the user who should know that the functions are wrong: “It is following the specification people believe it’s following.”
As far as I know, IEEE754 mandates correct rounding for elementary operations and sqrt(), and only for those.
2. All the mentions of 1 ULP in the beginning are a red herring. As the article itself mentions later, the standard never cares about 1 ULP. Some people do care about 1 ULP, just because it is something that can be achieved at a reasonable cost for transcendentals, so why not do it. But not the standard.
3. The author seems to believe that 0.5 ULP would be better than 1 ULP for numerical accuracy reasons:
> I was resounding told that the absolute error in the numbers are too small to be a problem. Frankly, I did not believe this.
I would personally also tell that to the author. But there is a much more important reason why correct rounding would be a tremendous advantage: reproducibility. There is always only one correct rounding. As a consequence, with correct rounding, different implementations return bit-for-bit identical results. The author even mentions falling victim to FP non-reproducibility in another part of the article.
4. This last point is excusable because the article is from 2020, but "solving" the fp32 incorrect-rounding problem by using fp64 is naive (not guaranteed to always work, although it will with high probability) and inefficient. It also does not say what to do for fp64. We can do correct rounding much faster now [1, 2]; so much faster that it is getting really close to non-correctly-rounded implementations, and some libm may one day decide to switch to it.
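To make points 2 and 3 concrete, here is a small sketch of how such errors are typically measured: evaluate the libm function, recompute a high-precision reference (mpmath is an assumed dependency here), and express the difference in ULPs. A correctly rounded implementation stays within 0.5 ULP by construction, and is therefore bit-for-bit reproducible across platforms.

```python
import math
import mpmath

mpmath.mp.prec = 200                       # ample precision for a reference value

def ulp_error(x: float) -> float:
    got = math.sin(x)                      # platform libm, typically within ~1 ULP
    ref = mpmath.sin(mpmath.mpf(x))        # near-exact reference
    return float(abs(mpmath.mpf(got) - ref) / math.ulp(got))

print(ulp_error(0.5), ulp_error(1e22))     # 1e22 stresses argument reduction
```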
3. >> I was resounding told that the absolute error in the numbers are too small to be a problem. Frankly, I did not believe this.
> I would personally also tell that to the author. But there is a much more important reason why correct rounding would be a tremendous advantage: reproducibility.
This is also what the author wants, based on his own experiences, though he failed to realize or state it explicitly: "People on different machines were seeing different patterns being generated which meant that it broke an aspect of our multiplayer game."
So yes, the reasons mentioned as a rationale for more accurate functions are in fact a rationale for reproducibility across hardware and platforms. For example, going from 1 ULP errors to 0.6 ULP errors would not help the author at all, but having reproducible behavior would (even with an increased worst-case error).
Correctly rounded functions mean the rounding error is the smallest possible and, as a consequence, that every implementation always returns exactly the same results: this is the main reason why people (and the author) advocate for correctly rounded implementations.
> "solving" the fp32 incorrect-rounding problem by using fp64 is naive (not guaranteed to always work, although it will with high probability)
The key thing is that there are only 2^32 float32s, so you can check all of them. It sounds to me like the author did this, and realized they needed some tweaks to get correct answers with log.
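A sketch of that exhaustive idea, assuming NumPy: enumerate all 2^32 bit patterns in chunks and compare two float32 implementations bit-for-bit, e.g. the "compute in fp64, round once" version against a native single-precision one. A full sweep is slow in pure NumPy but entirely feasible.

```python
import numpy as np

def count_mismatches(f, g, chunk=1 << 20):
    """Compare two float32 -> float32 functions over every bit pattern."""
    mismatches = 0
    for start in range(0, 1 << 32, chunk):
        bits = np.arange(chunk, dtype=np.uint32) + np.uint32(start)
        x = bits.view(np.float32)
        a, b = f(x), g(x)
        same = a.view(np.uint32) == b.view(np.uint32)
        same |= np.isnan(a) & np.isnan(b)  # treat any NaN result as matching a NaN
        mismatches += int((~same).sum())
    return mismatches

# e.g.: count_mismatches(
#     lambda x: np.sin(x.astype(np.float64)).astype(np.float32),  # via fp64
#     np.sin)                                                     # native float32
```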
I'm ready to believe that PipeWire is imperfect (although I have personally experienced no crashes, and did not have to configure anything), but the sentence from the original post is:
> Therefore, I naturally omit the use and configuration of additional audio layers in the form of a Jack server, PulseAudio or the unfortunate PipeWire from RedHat.
I cannot understand this sentiment. I have used all three for years each. If I had to qualify one as "unfortunate", it would certainly not be PipeWire (nor would it be Jack for that matter).
> performance is not that great compared to pulseaudio and jack ..
Jack makes different trade-offs by default, so for some definition of performance, yes, one could see it as superior to PipeWire. But in my experience PipeWire is better than PulseAudio on all axes (at least: latency, CPU usage, resampling quality).
> We present a polynomial-time algorithm achieving an approximation ratio below √2 for MVC, providing strong evidence that P = NP by efficiently solving a computationally hard problem with near-optimal solutions.
> This result contradicts the Unique Games Conjecture, suggesting that many optimization problems may admit better solutions, revolutionizing theoretical computer science.
Some context: The Unique Games Conjecture (UGC) is a very central unsolved problem in theoretical computer science (TCS); it has been open since 2002. It conjectures that, for a series of standard optimization problems including minimum vertex cover, the best known approximation factor (how close you can guarantee to get to optimality with a polynomial-time algorithm) is actually the best possible approximation factor. To disprove the conjecture, one needs a better approximation algorithm for one of those problems. Such an algorithm could be used to solve (to optimality, and in polynomial time) another set of problems, which are widely believed to be NP-hard. This would not prove P = NP (to be clear: the above quote is not claiming that), but it is true that it would revolutionize TCS.
The thing is: this is TCS and theoretical complexity. You cannot disprove UGC with just code; you need a mathematical proof. This is not an indictment of the code, which may be very good. But such an enormous claim would require pretty intense scrutiny before it would be accepted as true.
No vulnerability name, no website, a concise description, a neutral tone, a precise list of affected distros (RHEL + derivatives and some EOL Fedoras), even a mention of unaffected distros (current Fedoras), and a plain admission that no attempt was made to exploit it. What a breath of fresh air!
(I am only joking of course. As a recovering academic, I understand that researchers need recognition, and I have no right to throw stones -- glass houses and all. Also, this one is really like regreSSHion's little sibling. Still, easily finding the information I needed made me happy.)
I don't think recognition for researchers is the big win for named vulnerabilities. In the places that matter, they can just describe their findings in a short sentence and get all the recognition that matters. The names are mostly for the benefit of users.
Security researchers definitely do the naming gimmick for personal brand purposes. This may not be as obvious when it’s successful, but academic papers routinely name vulnerabilities when there is no real benefit to users.
The whole point of naming vulnerabilities is to establish a vernacular about them, so it's not surprising that academic papers name them. The literature about hardware microarchitectural attacks, for instance, would be fucking inscrutable (even more than it is now) without the names.
I'd be happy to file all of them under Spectre/MDS, except for the ones that aren't Spectre/MDS, of course. They don't all need unique names. Most of them are instances of the same pattern: some value is not present in a register when it's needed, and an Intel CPU design continues to execute speculatively with the previous contents of that register instead of inserting a pipeline bubble, leaking those stale contents. Using an inter-core communication buffer, instead of a load data buffer like the last person, doesn't in my opinion deserve a new name and logo. A new write-up, yes.
I don't even understand the impulse to lose the names. Names aren't achievement awards. We already have Best Paper awards at the Big 4 and the Pwnies (for however seriously you take that). The names don't cost anybody anything, and they're occasionally helpful.
Name them all.
You see the same weird discussions about CVEs, and people wanting to squash CVEs down (or not issue them at all) because the research work is deemed insufficient to merit the recognition. As if recognition for work was ever even ostensibly what the CVE program was about.