- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 27 Nov 2013 01:14:00 +0100
- To: Nico Williams <nico@cryptonector.com>
- Cc: JSON WG <json@ietf.org>, www-tag <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>
* Nico Williams wrote:
>On Tue, Nov 26, 2013 at 09:15:38PM +0100, Bjoern Hoehrmann wrote:
>> * Nico Williams wrote:
>> >We must not require encoding detection functionality in parsers. We
>> >must not forbid it either. We might need to say that encodings other
>> >than UTF-8/16/32 may not be reliably detected, therefore they are
>> >highly discouraged, even forbidden except where protocols
>> >specifically call for them.
>>
>> When I pass a fully conforming UTF-8 encoded application/json entity
>> to a fully conforming JSON parser I do not want the parser to do
>> something funny like interpreting the document as if it were
>> Windows-1252 encoded. I am amazed how many people here think a parser
>> that does that should not be considered broken.
>
>You missed the point.

"We must require encoding detection functionality in parsers. We must
forbid encoding detection functionality beyond that. We must say that
encodings other than UTF-8/16/32 are forbidden in any and all cases."
is how I would modify what you said above (with some caveats).

Note that I am talking about labeled sequences of octets,
application/json entities, not paintings on a cave wall that look
similar to JSON text in a strange font. In a labeled sequence of octets
I can tell for sure whether there are invisible characters in it if I
know the encoding.

There are two forms to consider. One is the labeled sequence of octets
that we call "application/json entity". The other is a sequence of
Unicode scalar values. That is the alphabet of the ABNF grammar in the
specification. If you have anything else, then the specification does
not apply to your situation.

>If you wanted to forbid non-Unicode, non-UTF encodings, then you'd be
>preventing such a shell, and for what reason? If you only mean that
>auto-detection of encoding should not even be mentioned, I'm fine with
>that, and I've already said so earlier.

Above I said that there are two forms to consider. Encoding detection
is what allows us to convert the "application/json entity" form into
the "sequence of Unicode scalar values" form. We need the latter form
in order to apply the ABNF grammar. Imagine you receive this:

  HTTP/1.1 200 OK
  Content-Type: application/json
  ...

  ABCD...

There would be at least two specifications that apply here, the HTTP
and the application/json specification. Would you like them to say that
you are on your own, "ABCD..." could mean anything? I would like them
to say "ABCD..." is an array with three times the integer zero, like
`[0,0,0]`. I can build robust software based on that. I cannot build
robust software based on "well, maybe it's EBCDIC? Have you tried
GB 18030? UTF-7 might be worth a try otherwise. Are you sure this
matters at all?"

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
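A minimal sketch, not part of the thread, of the two-step reading Björn
describes: detect one of the UTF-8/16/32 encodings from the first octets of an
application/json entity (here using the RFC 4627 null-byte heuristic), decode
the octets into a sequence of Unicode scalar values, and only then apply the
JSON grammar to that sequence. The function names are invented for the
example, and the sketch ignores byte-order marks and texts shorter than two
characters:

```python
import json

def detect_unicode_encoding(octets: bytes) -> str:
    """Guess UTF-8/16/32 from the NUL pattern in the first four octets
    (the RFC 4627 heuristic). Assumes the text starts with at least two
    ASCII characters, which holds for e.g. [0,0,0]."""
    b = octets[:4].ljust(4, b'\xff')            # pad so short inputs are safe
    if b[0] == 0 and b[1] == 0 and b[2] == 0:   # 00 00 00 xx
        return 'utf-32-be'
    if b[0] == 0 and b[2] == 0:                 # 00 xx 00 xx
        return 'utf-16-be'
    if b[1] == 0 and b[2] == 0 and b[3] == 0:   # xx 00 00 00
        return 'utf-32-le'
    if b[1] == 0 and b[3] == 0:                 # xx 00 xx 00
        return 'utf-16-le'
    return 'utf-8'                              # xx xx xx xx

def parse_entity(octets: bytes):
    """application/json entity (octets) -> Unicode scalar values -> JSON."""
    encoding = detect_unicode_encoding(octets)
    text = octets.decode(encoding)              # octets to scalar values
    return json.loads(text)                     # grammar applies to the text

# The same JSON text in different Unicode encodings yields the same value:
assert parse_entity(b'[0,0,0]') == [0, 0, 0]
assert parse_entity('[0,0,0]'.encode('utf-16-be')) == [0, 0, 0]
```

Under these assumptions every conforming Unicode-encoded entity maps to
exactly one sequence of scalar values, so the result of parsing is
deterministic rather than "it could mean anything".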
Received on Wednesday, 27 November 2013 00:14:28 UTC