- From: Pete Cordell <petejson@codalogic.com>
- Date: Fri, 22 Nov 2013 19:32:46 -0000
- To: "Matt Miller (mamille2)" <mamille2@cisco.com>, "JSON WG" <json@ietf.org>
- Cc: <www-tag@w3.org>, "es-discuss" <es-discuss@mozilla.org>
Further to my earlier comment, I also wondered about taking a leaf out of
cipher suites and allowing specifications that use JSON to state their
encoding requirements with a label along the lines of:
JSON-8OB-16MB-32NB
where OB = Optional BOM, MB = Mandatory BOM and NB = No BOM. So the above
would mean UTF-8 is supported with or without BOMs, UTF-16 is supported, but
must have a BOM and UTF-32 is supported with NO BOM.
Another example would be:
JSON-8OB
i.e. UTF-16 and UTF-32 are not supported.
Maybe that's going too far though!
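
A minimal sketch of how such a label might be parsed, in Python. The
function name, the returned dictionary and the policy strings are
illustrative assumptions, not part of any proposal; only the label syntax
and the OB/MB/NB meanings come from the text above.

```python
import re

# BOM policy codes from the message: OB = optional BOM, MB = mandatory BOM,
# NB = no BOM. The policy strings on the right are illustrative names.
BOM_POLICIES = {"OB": "optional", "MB": "mandatory", "NB": "forbidden"}


def parse_json_encoding_label(label):
    """Parse a label such as 'JSON-8OB-16MB-32NB' into {encoding: BOM policy}.

    Encodings that do not appear in the label are treated as unsupported,
    so they are simply absent from the result.
    """
    parts = label.split("-")
    if parts[0] != "JSON" or len(parts) < 2:
        raise ValueError(f"not a JSON encoding label: {label!r}")

    supported = {}
    for part in parts[1:]:
        match = re.fullmatch(r"(8|16|32)(OB|MB|NB)", part)
        if match is None:
            raise ValueError(f"bad component {part!r} in {label!r}")
        encoding = "UTF-" + match.group(1)
        if encoding in supported:
            raise ValueError(f"{encoding} listed twice in {label!r}")
        supported[encoding] = BOM_POLICIES[match.group(2)]
    return supported
```

With this sketch, parse_json_encoding_label("JSON-8OB") yields a mapping
containing only UTF-8, which matches the reading that UTF-16 and UTF-32 are
not supported.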
Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
----- Original Message -----
From: "Pete Cordell" <petejson@codalogic.com>
To: "Matt Miller (mamille2)" <mamille2@cisco.com>; "JSON WG" <json@ietf.org>
Cc: <www-tag@w3.org>; "es-discuss" <es-discuss@mozilla.org>
Sent: Friday, November 22, 2013 7:28 PM
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between
Ecma-404 and IETF draft)
> ----- Original Message From: "Matt Miller (mamille2)"
>
>> There does seem to be rough consensus that using an encoding
>> other than UTF-8 can have interoperability issues. There also
>> seems to be rough consensus that the current text and table
>> in section 8.1 for detecting the encoding will be inaccurate
>> (and potentially harmful).
>>
>> That appears to mean the approach with the most consensus is
>> to remove the encoding detection entirely, leaving only:
>>
>> """"
>> JSON text SHALL be encoded in Unicode. The default encoding is
>> UTF-8.
>> """"
>
> I think we can be a little more helpful here. For example, something
> along the lines of:
>
> JSON text is a sequence of Unicode codepoints. The transfer encoding used
> to represent those characters on-the-wire is beyond the scope of this
> document. It is therefore up to the specifications that reference this
> document to specify whether JSON messages will be transferred using UTF-8
> (recommended), UTF-16 and/or UTF-32 (discouraged), and whether preceding
> BOMs must be present, must not be present or are optional.
>
> If multiple encodings are permitted, implementers may choose to
> auto-detect a message's encoding by exploiting the fact that the first
> character of a JSON text must be in the ASCII character range and use the
> following table to deduce the active encoding:
>
> 00 00 -- -- UTF-32BE
> 00 xx -- -- UTF-16BE
> xx 00 00 00 UTF-32LE
> xx 00 00 xx UTF-16LE
> xx 00 xx -- UTF-16LE
> xx xx -- -- UTF-8
>
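
The table above can be sketched as a small lookup in Python. This is only
an illustration under stated assumptions: "00" is read as a zero byte, "xx"
as any non-zero byte and "--" as a byte that is not examined; rows are tried
top to bottom; the input carries no BOM; and input too short to match any
row gives None. The names detect_json_encoding and DETECTION_TABLE are
hypothetical.

```python
# Rows of the table above, tried in order. "00" is a zero byte,
# "xx" is any non-zero byte, "--" is a byte that is not examined.
DETECTION_TABLE = [
    (("00", "00", "--", "--"), "UTF-32BE"),
    (("00", "xx", "--", "--"), "UTF-16BE"),
    (("xx", "00", "00", "00"), "UTF-32LE"),
    (("xx", "00", "00", "xx"), "UTF-16LE"),
    (("xx", "00", "xx", "--"), "UTF-16LE"),
    (("xx", "xx", "--", "--"), "UTF-8"),
]


def _byte_matches(data, index, kind):
    """Check one table cell against the byte at the given index."""
    if kind == "--":
        return True  # this position is not examined
    if index >= len(data):
        return False  # assumption: a required byte that is missing never matches
    is_zero = data[index] == 0
    return is_zero if kind == "00" else not is_zero


def detect_json_encoding(data):
    """Return the encoding name for the first matching row, or None.

    Assumes `data` is the start of a JSON text with no BOM, whose first
    character is in the ASCII range.
    """
    for pattern, encoding in DETECTION_TABLE:
        if all(_byte_matches(data, i, kind) for i, kind in enumerate(pattern)):
            return encoding
    return None
```

For example, detect_json_encoding('{"a":1}'.encode("utf-16-le")) matches the
"xx 00 xx --" row. A real implementation would also need to decide what to
do about BOMs, which this sketch deliberately leaves out.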
> Pete Cordell
> Codalogic Ltd
> C++ tools for C++ programmers, http://codalogic.com
> Read & write XML in C++, http://www.xml2cpp.com
Received on Friday, 22 November 2013 19:31:55 UTC