
yes hack yourself. bravo :)

I guess you would have to sanitize when you save and/or load the spreadsheet



> I guess you would have to sanitize when you save and/or load the spreadsheet

Sanitizing? No chance. Either you have a dedicated expression parser, or you run it directly through eval. There is no reliable middle ground. Decades of security failures of so-called "sanitizers" show this pretty clearly.

(Even if you manage to create a perfect sanitizer today, wait a few months: new features are added to the browser, and new loopholes appear out of nowhere.)

But that may be missing the point, because if you want higher code quality, more safety, and more features, of course you need more code. This demo illustrates the other way around: if you allow for dirty hacks, you can get away with a surprisingly small amount of code.
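
(For illustration, a minimal sketch of what such an eval shortcut could look like. The cell names and the evalFormula helper are made up, not the demo's actual code.)

    // Hypothetical sketch: substitute cell references, then hand the rest to eval.
    const cells = { A1: 3, B2: 4 };
    function evalFormula(formula) {
      const expr = formula.replace(/[A-Z]+\d+/g, ref => cells[ref]);
      return eval(expr);
    }
    evalFormula("A1 * B2");   // 12
    evalFormula("alert(1)");  // in a browser, this pops an alert -- arbitrary code runs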


Blacklisting (checking that the input doesn’t contain any of a fixed set of known troublemakers) is asking for trouble, but whitelisting (checking that the input doesn’t contain anything but a fixed set of known safe constructs) should be fine.

If your whitelist allows a wide range of constructs, it isn’t much easier to check that an input is in the allowed set than to write an evaluator that is limited to that set, so it may not be much of an advantage to have a more powerful “eval” lying around.
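
(A hypothetical illustration of the difference; neither check is from the demo.)

    // Blacklisting: reject known-bad substrings -- trivially bypassed.
    const blacklisted = s => /\b(eval|Function|alert)\b/.test(s);
    blacklisted('window["al" + "ert"](1)');  // false -- the payload slips through

    // Whitelisting: accept only a known-safe shape, reject everything else.
    const whitelisted = s => /^[\d\s+\-*\/().]+$/.test(s);
    whitelisted("(1 + 2) * 3");              // true
    whitelisted('window["al" + "ert"](1)');  // false -- rejected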


Is there really no middle ground? Sanitizers fail because they try to salvage the clean part, only blacklisting some possible inputs. But what if you turn it around? Only send to eval what fits through a matcher for a very small subset of the language. The matcher can even allow invalid inputs if you know that eval will safely reject them (think unbalanced brackets). That matcher will be much easier and safer to implement than a full parser/interpreter for the same subset.
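
(A hypothetical sketch of that idea, assuming cell references have already been replaced by numbers: the matcher only checks the character set and deliberately ignores bracket balance, because eval rejects unbalanced input anyway.)

    const safeShape = /^[\d\s+\-*\/().]*$/;  // digits, whitespace, arithmetic, parens

    function evaluate(expr) {
      if (!safeShape.test(expr)) throw new Error("rejected by matcher");
      return eval(expr);  // unbalanced brackets die here with a SyntaxError
    }

    evaluate("(1 + 2) * 3");  // 9
    evaluate("(1 + 2");       // SyntaxError from eval, not from the matcher
    evaluate("alert(1)");     // Error: rejected by matcher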


> Only send to eval what fits through a matcher for a very small subset of the language

That's exactly what I meant by "dedicated expression parser".

(Not sure why you call it a "matcher", though. Please be aware that a regex-based matcher will almost certainly fail for that task. You usually want a grammar, i.e. a parser, which is more powerful, and shorter, and easier to read and to verify.)

EDIT: To those who downvoted my clarification, do you care to elaborate?


There is a difference between a recogniser, which answers the question "does this belong to the language", and a parser, which outputs a data structure. All you need here is a recogniser; then pass the string through to eval, which will do its own parsing. Recognisers are smaller than parsers.

If you relax the rules, as the gp said, you can get away with something like a regex to do the job. While regexes are bad at context-free grammars [0], if you forgo balancing brackets etc., a regex will do just fine.

All that said, with the crazy things JS lets you do [1], a recogniser for a relaxed language is likely to still let potentially dangerous code through (see the sketch below).

[0] Yes, with most regex engines you can parse CFGs, but it's not nice, and at that point you _do_ want a grammar-based parser

[1] http://www.jsfuck.com/
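
(To make [1] concrete: a couple of JSFuck-style building blocks that a relaxed character whitelist happily accepts. The regex here is made up for illustration.)

    // JSFuck builds arbitrary JavaScript out of only the six characters []()!+,
    // so even a strict-looking character whitelist can pass dangerous code.
    const relaxed = /^[\[\]()!+]*$/;
    relaxed.test("+!![]");     // true -- and eval("+!![]") is the number 1
    relaxed.test("(![]+[])");  // true -- and eval("(![]+[])") is the string "false"
    // keep combining such pieces and you eventually reach Function("...")()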


> difference between a recogniser

Please note that the term "recogniser" is very fuzzy: it could mean a regex matcher, a parser, or even a Turing-complete thing. Not very helpful for this discussion.

> a parser, which outputs a data structure

Please note that a parser is not required to output a data structure. In classic computer science, the parser of a context-free grammar usually has a minimal (boolean) output: it just either accepts or rejects the input.

If your "recognizer" is too weak (e.g. regexes), you risk not properly checking the language (see below).

If your "recognizer" is too powerful (e.g. turing complete), you risk tons of loopholes which are hard to find and hard to analyze. You probably won't be able to prove the security, and even if you do, it will probably be hard work, and even harder for others to follow and to verify.

But if your "recognizer" is a parser, you have a good chance to succeed in a safe way with minimal effort. Proving security is as simple as comparing your grammar with the ECMAScript standard.

> you can get away with something like a regex to do the job [...] with the crazy things JS lets you do a recogniser for a relaxed language is likely to still let potentially dangerous code though

That's exactly my point: Sure, you can try to build a protection wall based on regexes, but there's no reason to do that. Use a proper parser right away and don't waste your time repeating well-known anti-patterns.
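
(For the sake of illustration, here is a minimal recursive-descent recogniser for a tiny expression grammar: numbers, the four arithmetic operators, parentheses. It is a hypothetical sketch of the "proper parser" approach, not anybody's actual code; anything it accepts is a plain arithmetic expression, so eval has nothing dangerous left to do.)

    // Grammar (a strict subset of ECMAScript expressions):
    //   Expr   -> Term (("+" | "-") Term)*
    //   Term   -> Factor (("*" | "/") Factor)*
    //   Factor -> NUMBER | "(" Expr ")"
    function accepts(src) {
      const tokens = src.match(/\d+(\.\d+)?|[+\-*\/()]|\S/g) || [];
      let i = 0;
      const peek = () => tokens[i];
      const eat = t => tokens[i] === t && (i++, true);

      function factor() {
        if (/^\d/.test(peek() || "")) { i++; return true; }
        return eat("(") && expr() && eat(")");
      }
      function term() {
        if (!factor()) return false;
        while (peek() === "*" || peek() === "/") { i++; if (!factor()) return false; }
        return true;
      }
      function expr() {
        if (!term()) return false;
        while (peek() === "+" || peek() === "-") { i++; if (!term()) return false; }
        return true;
      }
      return expr() && i === tokens.length;
    }

    accepts("(1 + 2) * 3");  // true  -> safe to hand to eval
    accepts("alert(1)");     // false -> never reaches eval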


> In classic computer science, the parser of a context-free grammar usually has a minimal (boolean) output: it just either accepts or rejects the input.

All of the literature I remember from my uni days on formal grammars had a recogniser defined as something that accepts/rejects, and a parser as something that builds a data structure.

It's difficult to retrospectively find the literature, because outside of formal grammars recogniser _is_ used more loosely. But a few Wikipedia articles [1] [2] [3] and their referenced literature [4] [5] do agree with me.

> A recognizer is an algorithm which takes as input a string and either accepts or rejects it depending on whether or not the string is a sentence of the grammar. A parser is a recognizer which also outputs the set of all legal derivation trees for the string.

[1] https://en.wikipedia.org/wiki/Parsing#Computer_languages

[2] https://en.wikipedia.org/wiki/Earley_parser#Earley_recognise...

[3] https://en.wikipedia.org/wiki/CYK_algorithm#Generating_a_par...

[4] http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan...

[5] http://bat8.inria.fr/~lang/papers/harder/harder.pdf


> EDIT: To those who downvoted my clarification, do you care to elaborate?

you are probably making too much sense.


> Either you have a dedicated expression parser, or you run it directly through eval. There is no reliable middle ground.

While there is no safe middle ground, using eval directly is the worst case; it's not as if the two extremes are reliable and the greater danger lies in between.

That being said, rejecting everything that fails an expression parser is a form of sanitization.



