This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me compared to the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do.".
Simplest example: "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may contain. Doesn't refusing to serialize that string fall short of "Anything JSON can do, it can do"?
"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects is any invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.
My example was a three character string where the second one is \u0000, which is the NUL character in the middle of the string.
The spec on GitHub says that including NUL is banned as a security measure: after parsing, someone might call strlen in C and accidentally truncate to a shorter string.
I think that has some merit, but NUL is valid string content in JSON (and in UTF-8), so the spec is deliberately breaking 1:1 parity with JSON in the name of a security hypothetical.
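To make the disagreement concrete, here is a minimal sketch in Python (my choice of language, nothing from the spec). It uses only the standard `json` module. `c_strlen_view` is a hypothetical helper that imitates what a C consumer relying on `strlen` would see.

```python
import json

# "a\u0000b" is a legal JSON string: three characters with NUL in the middle.
text = json.loads('"a\\u0000b"')

def c_strlen_view(s: str) -> bytes:
    """Hypothetical helper: the bytes a C consumer sees if it treats the
    UTF-8 encoding as a NUL-terminated string (everything before the first NUL)."""
    return s.encode("utf-8").split(b"\x00", 1)[0]

# The text codec round-trips the string unchanged...
round_tripped = json.loads(json.dumps(text))

# ...while a strlen-style consumer silently drops everything after the NUL.
truncated = c_strlen_view(text)
```

So both sides have a point: the string is legal JSON that round-trips, and it is also a string that a careless C consumer would shorten to "a".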
So I think it's a very neat format, but my feedback as a random person on the Internet is that I don't think it does uphold the claimed vision in the end of being 1:1 to JSON (the security parts, but also you do end up adding extra types too) and that's a bit of a shame compared to the top line deliverable.
Just focusing narrowly on the \0 part to explain why I say so: the spec proposed is that implementations have to either hard ban embedded \0 or disallow by default with an opt in. So someone comes with a dataset that has it, they can get support in this case only if they configure both the serializer and parser to allow it. But if you're willing to exert that level of special case extra control, I think all of the other preexisting binary-json implementations that exist do meet the top line definition you are setting as well. For some binary-json implementation which has additional types, if someone is in full end to end control to special case, then they could just choose not to use those types too, the mere existence of extra types in the binary format is no extra "problem" for 1:1 than this choice.
IMO the deliverable that a 1:1 mapping would give us is "there is no bonjson data that won't losslessly round trip to JSON and vice versa". The benefit holds over all the future data that you haven't seen yet. The downside of using something that is not bijective is that you run for a long time and then suddenly you have data-dependent failures in your system, because you can't 1:1 map legal data.
And especially with this guarantee, what will inevitably happen is that some downstream handling will also take as a given that it can strlen(), since it "knew" the bonjson format spec banned NUL. When NUL later shows up as in-bounds data, you won't be able to trivially flip the switch. Instead you are stuck with legal JSON that you can't ingest in your system without an expensive audit, because the reduction from 1:1 gets entrenched as an invariant in the handling code.
Note that my vantage point might be a bit skewed here: I work on Protobuf and this shape of ecosystem interoperability topics are top of mind for me in ways that they don't necessarily need to be for small projects, and I also recognize that "what even is legal JSON" itself is not actually completely clear, so take it all with a grain of salt (and again, I also do think it looks like a very nice encoding in general).
Oh yes, I do understand what you're getting at. I'm willing to go a little off-script in order to make things safer. The NUL thing can be configured away if needed, but requires a conscious decision to do so.
Friction? yeah, but that's just how it's gonna be.
For the invalid Unicode and duplicate key handling, I'll offer no quarter. The needs of the many outweigh the needs of the few.
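For readers wondering what strict duplicate key handling looks like in practice, here is a rough sketch using Python's standard `json` module. This is only an illustration of the policy, not BONJSON's implementation; `reject_duplicates` is a name made up for this example.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of
    silently keeping the last value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

doc = '{"a": 1, "a": 2}'

# Default stdlib behaviour: the later value silently wins.
lenient = json.loads(doc)

# Strict behaviour: the document is rejected outright.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

The lenient path is where the trouble usually starts: two parsers that disagree on which duplicate wins can be played against each other.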
Can you tell me what context led you to create this?
Unrelated JSON experience:
I worked on a serializer that saves/loads JSON files as well as binary files (using a common interface).
In my own use case I found JSON to be restrictive for no benefit (because I don't use it in a JavaScript ecosystem).
So I changed the JSON format into something far more lax (optional commas, optional colons, optional quotes, multi-line strings, comments).
I wish we would stop pretending JSON is a good human-readable format outside of where it makes sense, and that we had a standard alternative for those non-JSON-centric cases.
I know a lot of formats already exist, but none has really taken off so far.
Basically, for better or worse JSON is here to stay. It exists in all standard libraries. Swift's codec system revolves around it (it only handles types that are compatible with JSON).
It sucks, but we're stuck with JSON. So the idea here is to make it suck a little less by stopping all this insane text processing for data that never ever meets a human directly.
The progression I envisage is:
1. Dev reaches for JSON because it's easy and ubiquitous.
2. Dev switches to BONJSON because it's more efficient and requires no changes to their code other than changing the codec library.
3. Dev switches to a sane format after the complexity of their app reaches a certain level where a substantial code change is warranted.
I'm in the JS ecosystem pretty regularly and "restrictive with no benefit" is the right description. I use JSON5 now when I have to, which greatly reduces the restrictions. I already have a build step so throwing in a JSON5 -> JSON converter is negligible.
As for FracturedJson, it looks great. The basic problem statement of "either minified and unreadable or prettified and verbose" isn't one I had put my finger on before, but now that it's been said I can't unsee it.
Have you heard of EDN? It's mostly used in Clojure and ClojureScript, as it is to Clojure what JSON is to JS.
If you need custom data types, you can use tagged elements, but that requires you to have functions registered to convert the data type to/from representable values (often strings).
It natively supports quite a bit more than JSON does, without writing custom data readers/writers.
Another thing to possibly consider would be ASN.1 (you can also use the nonstandard extensions that I made up, called ASN.1X, if you want some of the additional types I included such as a key/value list). (You are not required to implement or use all of the types or other features of ASN.1 in your programs; only use the parts that you use for your specific application.) Unlike EDN, ASN.1 has a proper byte string type, it is not limited to Unicode, it has a clearly defined canonical form (DER, which is probably the best format (and is the format used by X.509 certificates); BER is too messy), etc. DER is a binary format (and the consistent framing of different types in DER makes it easier to implement and work with than the formats that use inconsistent framing, although that also makes it less compact); I made up a text format called TER, which is intended to be converted to DER.
That's neat, but I'm much more intrigued by your Concise Encoding project[1]. I see that it only has a single Go reference implementation that hasn't been updated in 3 years. Is the project still relevant?
I'm actually having second thoughts with Concise Encoding. It's gotten very big with all the features it has, which makes it less likely to be adopted (people don't like new things).
I use ASN.1X, so I use some types that those other formats do not have. Some of the types of ASN.1 are: unordered set, ISO 2022 string, object identifier, bit string. I added some additional types into ASN.1X, such as: TRON string, rational numbers, key/value list (with any types for keys and for values (and the types of keys do not necessarily have to match); for one thing, keys do not have to be Unicode), and reference to other nodes. However, ASN.1 (and ASN.1X) does not distinguish between qNaN and sNaN. I had also made up TER, which is a text format that can be converted to DER (like how ORT can be converted to ORB, although it works differently, and is not compatible with JSON (TER somewhat resembles PostScript)).
Your extensions of JSON with comments, hexadecimal notation, optional commas, etc is useful though (my own program to convert JSON to DER does treat commas as spaces, although that is an implementation detail).
Probably, we need a formal data format, because JSON is just a notation. It does not mandate the bit width of numbers, for example, or whether ints are different from floats. Once there is such formal model, we can map it 1:1 between representations.
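A small sketch of that ambiguity, using Python's standard `json` module. The "double-based parser" is simulated by converting through `float`, which is my stand-in for any implementation that stores all numbers as IEEE 754 doubles.

```python
import json

# The notation does not say whether these are the same type; Python decides
# that one is an int and the other a float.
as_int = json.loads("1")
as_float = json.loads("1.0")

# 2**53 + 1 is valid JSON text, but it does not fit exactly in a double.
big = 2**53 + 1
text = json.dumps(big)

python_view = json.loads(text)   # arbitrary-precision int: exact
double_view = int(float(text))   # double-based parser: rounded to 2**53
```

Same bytes on the wire, two different values depending on who reads them, which is the gap a formal data model would close.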
I think JSON is too limited and has some problems, so BONJSON has mostly the same problems. There are many other formats as well, some of which add additional types beyond JSON and some don't. Also, a few programs may expect (and possibly require) that files may contain invalid UTF-8, even though it is not proper JSON (I think it would be better that they should not use JSON, due to this and other issues), so there is that too. Using normalized Unicode has its own problems, as does allowing 64-bit integers when some programs expect it and others don't. JSON and Unicode are just not good formats, in general. (There is also an issue with JSON.stringify(-0) but that is an issue with JavaScript that does not seem to be relevant with BONJSON, as far as I can tell.)
Nevertheless, I believe your claims are mostly accurate, except for a few issues with which things are allowed or not allowed, due to JavaScript and other things (although in some of these cases, the BONJSON specification allows options to control this). Sometimes rejecting certain things is helpful, but not always; for example sometimes you do want to allow mismatched surrogates, and sometimes you might want to allow null characters. (The defaults are probably reasonable, but are often the result of a bad design anyways, as I had mentioned above.) Also, the top of the specification says it is safe against many attacks, but these are a feature of the implementation, which would also be the case if you are implementing JSON or other formats (although the specification for BONJSON does specify that implementations are supposed to check for these things to make them safe).
(The issue of overlong UTF-8 encodings in IIS web servers is another security issue, which is using a different format for validation and for usage. In this case there are actually two usages though, because one of these usages is the handling of relative URLs (using the ASCII format) and the other is the handling of file names on the server (which might be using UTF-16 here; in addition to that is the internal format of the file paths into individual pieces with the internal handling of relative file paths). There are reasons to avoid and to check for overlong UTF-8 encodings, although this is a different more general issue than the character encoding.)
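As a rough illustration of the overlong encoding problem (a sketch, not a description of how IIS actually worked internally): the two bytes C0 AF are an overlong encoding of "/". A byte-level check for "/" does not see it, a lenient decoder turns it back into "/", and a strict UTF-8 decoder such as Python's rejects it. `is_valid_utf8` and `lenient_two_byte_decode` are names invented for this example.

```python
overlong_slash = b"\xc0\xaf"   # overlong 2-byte form of "/" (U+002F)
proper_slash = b"\x2f"         # the only valid UTF-8 encoding of "/"

def is_valid_utf8(data: bytes) -> bool:
    """Strict check: Python's decoder refuses overlong forms."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def lenient_two_byte_decode(data: bytes) -> str:
    """What a sloppy decoder does: combine the payload bits of a 2-byte
    sequence without checking that the result needed two bytes."""
    return chr(((data[0] & 0x1F) << 6) | (data[1] & 0x3F))

# Validation on raw bytes finds no slash...
naive_check_passes = b"/" not in overlong_slash

# ...but later usage with a lenient decoder produces one anyway.
decoded_later = lenient_two_byte_decode(overlong_slash)
```

The mismatch between the format used for validation and the format used afterwards is the actual vulnerability; rejecting overlong forms at the decoder removes one way to create that mismatch.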
Another issue is canonical forms; the canonical form of JSON can be messy, especially for numbers (I don't know what the canonical form for numbers in JSON is, but I read that apparently it is complicated).
I think DER is better. BONJSON is more compact but that also makes the framing more complicated to handle than DER (which uses consistent framing for all types). I also wrote a program to convert JSON to DER (I also made up some nonstandard types, although the conversion from JSON to DER only uses one of these nonstandard types (key/value list); the other types it needs are standard ASN.1 types). Furthermore, DER is already canonical form (and I had made up SDER and SDSER for when you do not want canonical form but also do not want the messiness of BER; SDSER does have chunking and does not require the length to be known ahead of time, so more like BONJSON in these ways). Because of the consistent framing, you can easily ignore any types that you do not use; even though there are many types you do not necessarily need all of them.
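To show what I mean by consistent framing, here is a minimal sketch of a DER reader in Python. It is deliberately incomplete: it assumes single-byte tags and definite lengths, and it does no validation. The function names are made up for this example. The point is that every element is tag, length, value, so a reader can step over types it does not understand.

```python
def read_tlv(data: bytes, pos: int = 0):
    """Read one element at pos and return (tag, value, next_pos).
    Sketch only: single-byte tags, definite lengths, no error checking."""
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first                      # short form: length in one byte
    else:
        n = first & 0x7F                    # long form: n length bytes follow
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag, data[pos:pos + length], pos + length

def walk(data: bytes):
    """Collect (tag, value) for each top-level element."""
    pos, out = 0, []
    while pos < len(data):
        tag, value, pos = read_tlv(data, pos)
        out.append((tag, value))
    return out

# INTEGER 5, UTF8String "hi", BOOLEAN TRUE, concatenated.
sample = bytes([0x02, 0x01, 0x05, 0x0C, 0x02]) + b"hi" + bytes([0x01, 0x01, 0xFF])
elements = walk(sample)

# A consumer that only cares about INTEGER (tag 0x02) ignores the rest.
integers = [int.from_bytes(v, "big", signed=True) for t, v in elements if t == 0x02]
```

The cost is the one already mentioned: every value pays for a tag and a length, so the encoding is less compact than a format with type-specific framing.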
Yup, and that's perfectly valid. I'm OK with BONJSON not fitting everyone's use case. For me, safety is by far more important than edge cases for systems that require bad data representations. Anyone who needs unsafe things can just stick with JSON (or fix the underlying problems that led to these requirements).
Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing. Falling into success rather than falling into failure.
It's not the end-all-be-all of data formats; it's just here to make the JSON pipeline suck less.
JSON implementations can be made just as safe, but the issue is that unsafe JSON implementations are still considered valid implementations (and so almost all JSON implementations are unsafe because nobody is an authority on which design is correct). Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
> Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing.
Yes, I agree (if you want to use it at all, which as I have mentioned you should consider if you should not use JSON or something related), although some of the things that you specify as not having options will make it more restrictive than JSON will be, even if those restrictions might be reasonable by default. One of these is mismatched surrogates (although matched surrogates should always be disallowed, an option to allow mismatched surrogates should be permitted (but not required)). Also, I think checking for duplicate names probably should not use normalized Unicode. Furthermore, the part that says that names MUST NOT be null seems redundant to me, since it already says that names MUST be strings (for compatibility with JSON) and null is not a string.
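Two of these points can be shown with a short Python sketch using only the standard library. The choice of NFC below is my assumption for illustration; I am not claiming it is the form the specification uses.

```python
import json
import unicodedata

# An unpaired surrogate escape is accepted by Python's JSON parser...
lone = json.loads('"\\ud800"')

# ...but the resulting string cannot be written out as valid UTF-8.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Two spellings of the same visible name: precomposed vs. combining accent.
composed = "\u00e9"
decomposed = "e\u0301"

same_raw = composed == decomposed
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))
```

So a mismatched surrogate is data that some JSON tooling produces and accepts but a UTF-8 format cannot carry, and whether the two names above count as duplicates depends entirely on whether the comparison normalizes first.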
> Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
OK, this is a valid point, although there is still the possibility of incorrect implementations (adding test cases would help with that problem, though).
Up until now I didn't care how my software was installed, but snaps REALLY don't play nice, so it's time to retire them. Canonical has lost this battle, and the sooner they accept it and move on, the sooner they can recover their reputation and put this madness behind them.
I just turn off trackpads, I'm not interested in that kind of input device, and any space dedicated to one is wasted to me. I use nibs exclusively (which essentially restricts me to Thinkpads).
My arms rest on the body; the last thing I want is for it to be a material that leaches heat out of my body or that is likely to react with my hands' sweat and oils.
There's a lot of editorializing going on. Now that the title has been restored, hopefully things calm down a bit.
Ultimately, Tesla has two problems going on here:
1. Their crash rate is 2x that of Waymo.
2. They redact a lot of key information, which complicates safety assessments of their fleet.
The redactions actually hurt Tesla, because the nature of each road incident really matters: EVERY traffic incident must be reported, regardless of fault (even if it's a speeding car from the other direction that hits another vehicle which then hits the robotaxi - yes, that's actually in one of the Waymo NHTSA incident reports). When Tesla redacts the way they've been doing, it makes it very difficult to do studies like https://www.tandfonline.com/doi/full/10.1080/15389588.2025.2... which show how much safer Waymo vehicles are compared to humans WHEN IT COMES TO ACTUAL DAMAGE DONE.
We can't get that quality of info from Tesla due to their redaction practices. All we can reliably glean is that Tesla vehicles are involved in 2x the incidents per mile compared to Waymo. https://ilovetesla.com/teslas-robotaxi-dilemma-navigating-cr...
The redactions also indicate they are hiding something and deprioritize safety below (whatever they are hiding). That is, it makes Tesla not trustworthy with something that risks life and limb.
If you only count robotaxis and not all Teslas, isn't the crash rate 20x per driven mile? I remember doing the math a few months ago and finding 20, but I might be mistaken.
America is, and always has been, an isolationist nation - behaving like an island nation even though it's not an island (although it might as well be, given it has only two neighbors of little consequence).
It was dragged into the first world war (despite strong public aversion) because J.P. Morgan Jr started lending money to Britain and France to buy American steel, thus setting in motion a cycle of investment and production protection that eventually required boots on the ground.
It was dragged into the second world war by Japan's attack on Pearl Harbor (and Germany's subsequent declaration of war, as it was obligated to do under its treaty with Japan).
It protected Europe and SE Asia in the post-war years in order to contain communism, which it feared more than anything else. Once that threat subsided, there wasn't much reason for it to continue with its overseas footprint other than inertia and protecting important trade routes.
Gulf Wars I was to protect oil prices (and because they already had the equipment for war), and Gulf Wars II was to be seen to be doing something about 9/11.
Now that Trump is in power, America is performing its "great reset" (which was going to come eventually), where it becomes isolationist again, sticking to the Americas (reinvigorating the Monroe Doctrine), and leaving everyone else to their own devices.
For the 80% of use cases, you have homogeneous build commands that are the same across projects (such as a makefile with build, clean, test, etc). This calls the real (complex) build system underneath to actually perform the action. You shouldn't need to type more than 15 keys to make it do common things (and you CERTAINLY shouldn't need to use ANY command line switches).
Then for the other 20% of (complex) use cases, you call the underlying build system directly, and have a document describing how the build system works and how to set up the dev environment (preferably with "make dev-env"). Maybe for self-bootstrapping systems like rust or go this isn't such a big deal, but for C/C++ or Python or node or Java or Mono it quickly becomes too bespoke and fiddly.
Then you include tests for those makefile level commands to make sure they actually work.
There's nothing worse than having to figure out (or remember) the magical incantation necessary to build/run some project among the 500 repos in 15 languages at a company, waiting for the repo owner to get back to you on why "./gradlew compileAndRun" and "./gradlew buildAndRun" and "./gradlew devbuild" don't work - only to have them say "Oh, you just use ./gradlew -Pjava.version=11 -Dconfig.file=config/dev-use-this-one-instead.conf -Dskipdeploy buildAndDeploy - oh and make sure ImageMagick and Pandoc are installed. They're only used by the reports generator, but buildAndDeploy will error out without them". Wastes a ton of time.
Yes. In the Gradle example, I set up all the specifics behind the well-known lifecycle tasks: check, assemble and in some cases publish.
Some projects are more complicated, specifically when you can't really use the rule of one project, one assembly. See Android with APK vs bundle.
Here you may need more specific tasks. But I try to bind CI (be it Jenkins or GitHub Actions) to only know the basic interface.
But I meant specifically the belief that build systems and the tooling around them are too complicated and unnecessary.
Ah yes. Unfortunately the complexity is necessary in modern codebases. There are usually ways to simplify, but only to a point - after that all you're doing is smearing the complexity around rather than containing it.
These online storage services like iCloud and Google Drive are, and always have been, a trap.
They feel convenient, but they will keep changing their TOS to disadvantage you further and further as time goes on.
Everything you upload is scanned into their AI to create a profile about you that they can then exploit (once again, to your disadvantage). They do it despite regulations against it (Who's to say what they're complying with, deep in their complex data centers? Who's gonna even check? And how?) This is why online services that take control of your data are such gold mines (subscription fees, analytics, profiling, etc). They get you coming and going.
And of course, the account terminations: The earthquakes and "natural disasters" of the online world that destroy lives with no consequence or care.
When your data is not in your sole possession, you own nothing.
> Assuming both parties can come up with unbiased random numbers
When you're in competition, this cannot be assumed. You'll each bias the numbers you come up with towards your preferred outcome. Even with A + B mod N, you can still bias the results when you know what your opponent is trying for.
A fairer approach would be to make a long series of randomized values. Your opponent secretly chooses a starting offset, and you pick an offset to add.
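A rough sketch of that idea in Python, under assumptions that are mine rather than part of the proposal: the series is published up front, "secretly chooses" is implemented as a salted SHA-256 commitment, and the final index wraps around the series. All function names are made up for the example, and I am not claiming this is cryptographically rigorous.

```python
import hashlib
import random
import secrets

def commit(offset: int, salt: bytes) -> str:
    """Commitment the opponent publishes before seeing your offset."""
    return hashlib.sha256(salt + str(offset).encode()).hexdigest()

def verify(commitment: str, offset: int, salt: bytes) -> bool:
    """Check the revealed offset against the earlier commitment."""
    return commitment == commit(offset, salt)

def pick(series, secret_offset: int, added_offset: int):
    """Final value: start at the secret offset, move by the added offset."""
    return series[(secret_offset + added_offset) % len(series)]

# Long published series of die rolls (fixed seed only to keep the demo repeatable).
rng = random.Random(0)
series = [rng.randrange(6) + 1 for _ in range(1000)]

salt = secrets.token_bytes(16)
secret_offset = 417                          # opponent's secret choice
commitment = commit(secret_offset, salt)     # published first
added_offset = 250                           # your choice, announced second

# Opponent reveals (secret_offset, salt); you check it before accepting.
honest = verify(commitment, secret_offset, salt)
result = pick(series, secret_offset, added_offset)
```

Without knowing the starting offset, your choice of what to add cannot steer the result toward a value you prefer, and the commitment stops the opponent from changing their offset after seeing yours.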
I've also been working in the other direction, making JSON more machine-readable:
https://github.com/kstenerud/bonjson/
It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.
No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.