
For any multibyte representation, no. However, I'd say it's safe to just say "all encodings must be valid UTF-8" and require the user/service to validate that first.

There are a number of algorithms out there that can validate UTF-8 with significantly less than 1 instruction per byte. I'd imagine running validation as a separate pass ahead of the main one is significantly cheaper than trying to handle the invalid cases in the same pass.
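To make the two-pass idea concrete, here's a rough sketch in Python. The function names are made up for illustration, and Python's built-in strict decoder is only standing in for a fast validator (it is not one of the sub-instruction-per-byte algorithms). The point is just the shape: reject bad input up front, then let the second pass assume valid UTF-8 and skip all error handling.

```python
def count_code_points(data: bytes) -> int:
    """Second pass: assumes `data` is already valid UTF-8 (hypothetical helper)."""
    # In valid UTF-8, every code point contributes exactly one byte that is
    # not a continuation byte (continuation bytes have the form 0b10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def validate_then_count(data: bytes) -> int:
    """Validate first, then run the pass that has no error handling."""
    # First pass: strict decoding raises UnicodeDecodeError on invalid input
    # (stray continuation bytes, overlong forms, encoded surrogates, etc.).
    data.decode("utf-8")
    # Second pass: safe to run without any checks for malformed sequences.
    return count_code_points(data)


if __name__ == "__main__":
    print(validate_then_count("héllo 世界".encode("utf-8")))
    try:
        validate_then_count(b"\xc0\x80")  # overlong encoding, not valid UTF-8
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

The second function stays tiny precisely because it never has to ask "what if this byte is garbage?", which is the case the single-pass version would have to handle everywhere.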


