exyi's comments | Hacker News

Local would imply the date is in the current machine's timezone, while PlainDateTime is zoneless: it may be in the server's timezone, or anything else. The main difference is that it does not make sense to convert it to an Instant or ZonedDateTime without specifying the timezone or offset.
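
A rough analogy in Python, where a naive datetime is similarly zoneless (just an illustration of the concept, not the Temporal API):

    from datetime import datetime, timezone

    naive = datetime(2024, 6, 1, 12, 30)        # zoneless, like PlainDateTime
    # Converting to an absolute point in time only makes sense once a
    # zone or offset is attached explicitly:
    aware = naive.replace(tzinfo=timezone.utc)  # like ZonedDateTime
    print(aware.timestamp())                    # now unambiguous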

Only until you work with a typed array (Int32Array, Float64Array, etc), then it becomes 10x slower: https://jsperf.app/doyeka/11


Usually yes, but it's still a neat trick to be aware of. For interpreted scripting languages, parsing can actually be a significant slowdown. Even more so when we start going into text-based network protocols, which also need a parser (is CSS a programming language or a network protocol? :) )
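
A quick way to feel that cost in Python (a rough sketch; json.loads stands in here for any text-protocol parser):

    import json, timeit

    blob = json.dumps(list(range(10_000)))  # a text payload to parse
    parsed = json.loads(blob)

    # Re-parsing the text on every use vs. working on the parsed structure:
    print(timeit.timeit(lambda: json.loads(blob), number=100))  # parses each time
    print(timeit.timeit(lambda: sum(parsed), number=100))       # parsed once, reused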


The point is that a good library usually exists for some language, which is not necessarily the one you are currently using.

IMHO, we don't lack good libraries in language XY, we are lacking good interop. Going through REST or stdio is quite painful just to render a PDF (or export a spreadsheet, ...)


cough cough... COM


C# portable SIMD is very nice indeed, but it's also not usable without unsafety. On the other hand, the Rust compiler (LLVM) has a fairly competent autovectorizer, so you may be able to simply write loops the right way instead of using the fancy API.


Unsafety means different things. In C#, SIMD is possible via `ref`s, which maintains GC safety (no GC holes) but removes bounds safety (the array length check). The API is appropriately called Vector.LoadUnsafe.


Having worked in HPC a fair bit, I'm not a fan of autovectorization. I prefer the compiled code's performance to be "unsurprising" based on the source, and to use vectors etc. where I know they're needed. I think in general it's better to have linting that points out performance issues (e.g. lift this outside the loop) rather than have compilers do it automatically and make things less predictable.


You can write good autovectorized code in Rust today, but only for integers. Since Rust lacks --ffast-math, the results on most fp code are disappointing.


You are not "forced" into unsafe APIs with Vector<T>/Vector128/256/512<T>. While it is a nice improvement and helps with achieving completely optimal compiler output, you can use it without unsafe. For example, ZLinq even offers an .AsVectorizable LINQ-style API, where you pass lambdas which handle vectors and scalars separately. There the user code cannot go out of bounds, and the resulting logic even goes through delegates (inlined later by the JIT), yet it still offers a massive speed-up (https://github.com/Cysharp/ZLinq?tab=readme-ov-file#vectoriz...).

Another example: note how these implementations, one in unsafe C# and another in safe F#, have almost identical performance: https://benchmarksgame-team.pages.debian.net/benchmarksgame/..., https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


The protocol must already support it somehow, as some bridges can send custom emojis from other platforms.


https://github.com/matrix-org/matrix-spec-proposals/pull/254... is the de-facto way that custom emoji are done today, but as you can see from the labyrinthine Matrix Spec Change, it still needs some work to get stabilised and formally merged into the spec. Meanwhile, folks are always welcome to use experimental MSCs in the wild.


Everyone I know of will try to click "reject all unnecessary cookies", and you don't need the dialog for the necessary ones. You can therefore simply remove the dialog and the tracking, simplifying your code and improving your users' experience. Can tracking the fraction who misclick even give you any useful data?


My point was that according to the current interpretation, if they rely on cookies, user analytics (even simple visitor stats where no personal data is actually processed) are not considered "necessary" and are therefore not exempt from the cookie consent obligation under the ePrivacy Directive. The reason why personal data processing is irrelevant is that the cookie consent requirement itself is based on the pre-GDPR ePrivacy Directive which requires, as a rule, consent merely for saving cookies on the client device (subject to some exceptions, including the one discussed).

So you need a consent for all but the most crucial cookies without which the site/service wouldn't be able to function, like session cookies for managing signed-in state etc.

(The reason why you started to see consent banners really only after GDPR came to force is at least in part due to the fact that the ePrivacy Directive refers to the Data Protection Directive (DPD) for the standard of consent, and after DPD was replaced by GDPR, the arguably more stringent GDPR consent standard was applied, making it unfeasible to rely on some concept of implied consent or the like.)


User analytics that require cookies sound like tracking to me.

> like session cookies for managing signed-in state etc.

Maybe I'm reading it wrong, but are you saying that consent is required for session cookies? Because that is not the case, at all.

> (25) However, such devices, for instance so-called "cookies", can be a legitimate and useful tool, for example, in analysing the effectiveness of website design and advertising, and in verifying the identity of users engaged in on-line transactions. Where such devices, for instance cookies, are intended for a legitimate purpose, such as to facilitate the provision of information society services, their use should be allowed on condition that users are provided with clear and precise information in accordance with Directive 95/46/EC about the purposes of cookies or similar devices so as to ensure that users are made aware of information being placed on the terminal equipment they are using. Users should have the opportunity to refuse to have a cookie or similar device stored on their terminal equipment. This is particularly important where users other than the original user have access to the terminal equipment and thereby to any data containing privacy-sensitive information stored on such equipment. Information and the right to refuse may be offered once for the use of various devices to be installed on the user's terminal equipment during the same connection and also covering any further use that may be made of those devices during subsequent connections. The methods for giving information, offering a right to refuse or requesting consent should be made as user-friendly as possible. Access to specific website content may still be made conditional on the well-informed acceptance of a cookie or similar device, if it is used for a legitimate purpose.

https://eur-lex.europa.eu/eli/dir/2002/58/oj/eng

You should inform users about any private data you would be storing in a cookie. But this can be a small infobox on your page with no button.

When storing other types of information, the "cookie" problem needs to be seen from the perspective of shared devices. You know, the times before, when you might forget to log out at an internet cafe, or fail to clear cookies containing passwords and other things they shouldn't hold. This is a dated way of looking at the problem (most people have their own computing device today, their phone), but it is still applicable (classrooms, and family shared devices).


Kotlin did not have an open LSP; C# still does not have an open debugger.

The C# VSCode extension works in Microsoft's build of VSCode, not when someone else forks it and builds it themselves.


The debugger is proprietary but still works cross-platform. I don't know how Jetbrains does C# debugging in Rider exactly, but that shows that you don't have to use VS (Code) to do C# development if you don't want to.

Thanks to Samsung of all companies, there's an open source C# debugger on GitHub (https://github.com/Samsung/netcoredbg). That seems to be the basis of the open source C# extension's debugging capabilities: https://open-vsx.org/extension/muhammad-sammy/csharp

The VSCodium C# community wants Microsoft to open source their debugger instead of having to maintain an open source version themselves, but that doesn't mean you need to use Microsoft's proprietary version. If anything, this forceful separation makes it so that there never will be only one implementation (like there is for languages like Rust, which have always been open and therefore only have one way of doing things).


I know about netcoredbg, but I did not have much success using it. If we count this as the C# debugger, then the tooling quality is not comparable to other mainstream languages like Scala, D or Julia.

JetBrains have their own closed debugger, which doesn't really help.

Since Rust is native code, you can use pretty much any debugger for it; there is definitely not a single implementation. Yes, Rust has a single compiler, but does C# have any other compiler than Microsoft's Roslyn? (I don't think this is a problem, though.)


Kotlin has had an OSS (MIT) Language Server for years. It's written and maintained by the community - but isn't that exactly the point of open source?

[0]: https://github.com/fwcd/kotlin-language-server


It corresponds to way more than one branch at the instruction level. The branch predictor AFAIK does not care what you are branching on; it just assumes branches will go in similar sequences as they did last time. If the Python 'if' is never taken, the instruction-level predictor will learn that after the comparison operation there is an 'if' operation and then another array access operation. If the Python 'if' is unpredictable, the CPU predictor can't be sure which opcode we are processing after the 'if', so there will be a penalty.
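
A toy sketch of the idea (a hypothetical dispatch loop, not CPython's actual one): in a real interpreter, the equivalent of `handlers[op]` below compiles to an indirect branch, and a predictable opcode sequence is exactly what lets the CPU predict it.

    # Minimal interpreter dispatch loop: 'which handler runs next' is
    # itself a branch the CPU has to predict in a real interpreter.
    def run(bytecode, stack):
        handlers = {
            "PUSH1": lambda s: s.append(1),
            "ADD":   lambda s: s.append(s.pop() + s.pop()),
        }
        for op in bytecode:
            handlers[op](stack)   # the indirect dispatch point
        return stack

    print(run(["PUSH1", "PUSH1", "ADD"], []))  # [2]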


Is there any public documentation on modern branch prediction algorithms? I know branch prediction is very important to modern CPUs, so SOTA techniques are probably not public... But it's really amazing what it can do, especially considering the "limited" cache sizes that branch predictors have.


I guess it depends on how deep you want to go; I think the real predictors are based on publicly known algorithms such as TAGE. This seems to be a nice overview: https://comparch.net/2013/06/30/why-tage-is-the-best/ (it's from 2013, so definitely not SOTA, but advanced enough for my taste :) )
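
For intuition, here is a toy 2-bit saturating-counter predictor, the classic baseline that TAGE-style predictors build on (a sketch of the concept, not how any real CPU implements it):

    # Per-branch 2-bit saturating counter: states 0-1 predict 'not taken',
    # 2-3 predict 'taken'. TAGE layers tagged tables indexed by different
    # global-history lengths on top of this basic idea.
    def accuracy(outcomes):
        counter = 2  # start weakly taken
        correct = 0
        for taken in outcomes:
            if (counter >= 2) == taken:
                correct += 1
            # saturating update toward the actual outcome
            counter = min(3, counter + 1) if taken else max(0, counter - 1)
        return correct / len(outcomes)

    import random
    print(accuracy([True] * 1000))                                 # ~1.0: predictable
    print(accuracy([random.random() < 0.5 for _ in range(1000)]))  # ~0.5: pure noise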


It is! Although my test case is probably an unrealistically bad scenario:

It's the classic "why is processing a sorted array faster than an unsorted one?"

    import random

    def f(arr):
        r = True
        for x in arr:
            if x > 128: r = not r
        return r

    arr = [random.randint(0, 255) for i in range(1_000_000)]
    arr_sorted = sorted(arr)

    %timeit f(arr)          # IPython magic; use the timeit module outside IPython
    %timeit f(arr_sorted)

Results are (on my machine): 17.5 ms for unsorted, and 13.5 ms for sorted. For comparison, in a compiled language the difference is 4x.


Edit: Analyzed the wrong thing earlier.

This depends on the Python version, but if it has the specializing interpreter changes, the `COMPARE_OP` comparing the integers there is probably hitting a specialized `_COMPARE_OP_INT` [1].

This specialization has a ternary that does `res = (sign_ish & oparg) ? PyStackRef_True : PyStackRef_False;`. This might be the branch that ends up getting predicted correctly?

Older versions of Python go through a bunch of dynamic dispatch first and then end up with a similar sort of int comparison in `long_richcompare`. [2]

[1] https://github.com/python/cpython/blob/561965fa5c8314dee5b86...

[2] https://github.com/python/cpython/blob/561965fa5c8314dee5b86...


This isn't actually timing the sorting, but just the (dumb) function f.


Oh whoops, that's right. I totally missed that.


This is a really good example. It is more like branch prediction than standard data/instruction caching.

I wonder if you could do Spectre-type vulnerabilities in Python. You would need some way to leak micro-architectural state, so without being particularly clever, maybe Python code could only be used as a gadget or something.


The Python speed-up is probably from small integer caching: a sorted array will have runs of pointers to the same integer objects adjacent. The compiled-language one is probably branch prediction, right?
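
(For reference, CPython caches small ints, roughly -5 to 256, so equal small values share one heap object. Parsing at runtime in the demo below avoids constant folding, which would otherwise blur the effect:)

    # Equal small ints are the same cached object; large ints are
    # separately allocated boxes.
    a = int("200")
    b = int("200")
    print(a is b)    # True: both point at the one cached 200

    x = int("1000000000")
    y = int("1000000000")
    print(x is y)    # False: two distinct heap allocations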


I intentionally stayed in the small integer range to avoid benchmarking the cache. 256 distinct values should fit into L1 just fine in both cases.

I'm now thinking that the difference might be even larger if we instead avoid small integers and let the CPU get stuck chasing pointers. The idea is that it gets stuck on a memory access, which forces it to speculate much further, which in turn makes it backtrack a longer path if a branch was mispredicted. I'm obviously no expert on this, so feel free to correct me.

The results for a 1B range instead of 255 are 17.6 ms for unsorted / 68.2 ms for sorted! We are back to what the original article observed, and it's a much stronger effect than what branch prediction can offer. So don't sort your arrays, keep them in the order the boxed values were allocated ;)
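
(The only change to the benchmark above is the value range, something like:)

    # Values far outside CPython's small-int cache: every element is its
    # own heap-allocated box, so allocation order vs. iteration order matters.
    arr = [random.randint(0, 10**9) for i in range(1_000_000)]
    arr_sorted = sorted(arr)

    %timeit f(arr)         # iteration follows allocation order
    %timeit f(arr_sorted)  # sorted by value = shuffled by address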


How big is the pointed-to small integer? With alignment etc., I'm seeing some stuff saying 256 of them would fill an 8KB L1. Plus other stuff for the interpreter might overfill it. Sorted, that would be less of an issue.

The larger-range one being slower when sorted makes sense, because the allocation order no longer matches the iteration order.


I don't know how large those boxes are, but a normal CPU L1 cache has 32 or 48KB, which should be plenty for this. The Python opcodes for this program are going to be tiny, and the interpreter itself uses the instruction L1 cache (which is another 32-48KB). I hope the sequential scan of the big array won't flush the L1 cache (there should be 12-way associativity with LRU, so I don't see how it could).

Anyway, there is no need to have 256 integers; just 2 is enough. When I try that, the results are similar: 17.5 ms (unsorted) / 12.5 ms (sorted).


That seems very likely. The benchmark should probably use a range that is guaranteed to be outside of the cached smallint range.


Then you are back to what the article discusses. Each integer is in a separate box; those boxes are allocated in one order, and sorting the array by value will shuffle it by address, making it much slower. I tested this as well, see the other comment.

