- From: Pete Cordell <petejson@codalogic.com>
- Date: Fri, 22 Nov 2013 19:32:46 -0000
- To: "Matt Miller (mamille2)" <mamille2@cisco.com>, "JSON WG" <json@ietf.org>
- Cc: <www-tag@w3.org>, "es-discuss" <es-discuss@mozilla.org>
Further to my earlier comment, I also wondered about taking a leaf out of
cipher suites and allowing specifications that use JSON to encode their
encoding requirements along the lines of:

    JSON-8OB-16MB-32NB

where OB = Optional BOM, MB = Mandatory BOM and NB = No BOM. So the above
would mean UTF-8 is supported with or without BOMs, UTF-16 is supported but
must have a BOM, and UTF-32 is supported with NO BOM.

Another example would be:

    JSON-8OB

i.e. UTF-16 and UTF-32 are not supported.

Maybe that's going too far though!

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com

----- Original Message -----
From: "Pete Cordell" <petejson@codalogic.com>
To: "Matt Miller (mamille2)" <mamille2@cisco.com>; "JSON WG" <json@ietf.org>
Cc: <www-tag@w3.org>; "es-discuss" <es-discuss@mozilla.org>
Sent: Friday, November 22, 2013 7:28 PM
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

> ----- Original Message From: "Matt Miller (mamille2)"
>
>> There does seem to be rough consensus that using an encoding
>> other than UTF-8 can have interoperability issues. There also
>> seems to be rough consensus that the current text and table
>> in section 8.1 for detecting the encoding will be inaccurate
>> (and potentially harmful).
>>
>> That appears to mean the approach with the most consensus is
>> to remove the encoding detection entirely, leaving only:
>>
>> """"
>> JSON text SHALL be encoded in Unicode. The default encoding is
>> UTF-8.
>> """"
>
> I think we can be a little more helpful here. For example, something
> along the lines of:
>
> JSON text is a sequence of Unicode codepoints. The transfer encoding
> used to represent those characters on-the-wire is beyond the scope of
> this document. It is therefore up to the specifications that reference
> this document to specify whether JSON messages will be transferred
> using UTF-8 (recommended), UTF-16 and/or UTF-32 (discouraged), and
> whether preceding BOMs must be present, must not be present or are
> optional.
>
> If multiple encodings are permitted, implementers may choose to
> auto-detect a message's encoding by exploiting the fact that the first
> character of a JSON text must be in the ASCII character range and use
> the following table to deduce the active encoding:
>
>    00 00 -- --  UTF-32BE
>    00 xx -- --  UTF-16BE
>    xx 00 00 00  UTF-32LE
>    xx 00 00 xx  UTF-16LE
>    xx 00 xx --  UTF-16LE
>    xx xx -- --  UTF-8
>
> Pete Cordell
> Codalogic Ltd
> C++ tools for C++ programmers, http://codalogic.com
> Read & write XML in C++, http://www.xml2cpp.com
>
> _______________________________________________
> json mailing list
> json@ietf.org
> https://www.ietf.org/mailman/listinfo/json
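For illustration only, the detection table in the quoted message could be applied roughly as in the Python sketch below. This is not text proposed on the list. The names `DETECTION_TABLE` and `detect_json_encoding` are hypothetical, and the sketch assumes that "00" means a zero byte, "xx" a non-zero byte and "--" a byte that is not examined. It also assumes the input carries no BOM, and it returns None when no row matches (for example, when the input is shorter than a row requires), which is a choice made here rather than something the message specifies.

```python
from typing import Optional

# Rows copied from the table in the quoted message, in the same order.
# Assumed reading: "00" = zero byte, "xx" = non-zero byte,
# "--" = byte not examined.
DETECTION_TABLE = [
    ("00 00 -- --", "UTF-32BE"),
    ("00 xx -- --", "UTF-16BE"),
    ("xx 00 00 00", "UTF-32LE"),
    ("xx 00 00 xx", "UTF-16LE"),
    ("xx 00 xx --", "UTF-16LE"),
    ("xx xx -- --", "UTF-8"),
]


def _cell_matches(cell: str, byte: Optional[int]) -> bool:
    """Check one table cell against one input byte (None = byte missing)."""
    if cell == "--":
        return True
    if byte is None:
        # The row needs this byte, but the input is too short.
        return False
    if cell == "00":
        return byte == 0
    return byte != 0  # cell == "xx"


def detect_json_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of a BOM-less JSON text from its first four bytes.

    Returns the encoding name from the first matching table row, or None
    if no row matches (an assumption of this sketch, not of the message).
    """
    head = list(data[:4])
    padded = head + [None] * (4 - len(head))
    for pattern, encoding in DETECTION_TABLE:
        cells = pattern.split()
        if all(_cell_matches(c, b) for c, b in zip(cells, padded)):
            return encoding
    return None
```

Calling `detect_json_encoding('{"a": 1}'.encode("utf-16-le"))` walks the rows top to bottom and stops at the first match, so row order matters in the same way it does in the table. A receiver that permits BOMs would need to check for and strip one before calling this.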
Received on Friday, 22 November 2013 19:31:55 UTC