Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

From: Pete Cordell <petejson@codalogic.com> · Date: Fri, 22 Nov 2013 19:32:46 -0000

Further to my earlier comment, I also wondered about taking a leaf out of 
cipher suite naming and allowing specifications that use JSON to state their 
encoding requirements in a label along the lines of:

    JSON-8OB-16MB-32NB

where OB = Optional BOM, MB = Mandatory BOM and NB = No BOM.  So the above 
would mean UTF-8 is supported with or without a BOM, UTF-16 is supported but 
must have a BOM, and UTF-32 is supported but must not have a BOM.

Another example would be:

    JSON-8OB

i.e. UTF-16 and UTF-32 are not supported.
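
To make the label idea concrete, here is a rough Python sketch of how such a 
label might be parsed.  The function name, the policy strings and the error 
handling are all assumptions made for illustration; only the label syntax and 
the OB / MB / NB abbreviations come from the text above.

```python
import re

# Hypothetical names: the message defines only the label syntax and the
# OB / MB / NB abbreviations, not any API.
BOM_POLICIES = {"OB": "optional", "MB": "mandatory", "NB": "forbidden"}
_ENTRY = re.compile(r"(8|16|32)(OB|MB|NB)")


def parse_encoding_label(label):
    """Map a label such as 'JSON-8OB-16MB-32NB' to {encoding: BOM policy}."""
    prefix, *entries = label.split("-")
    if prefix != "JSON" or not entries:
        raise ValueError(f"not a JSON encoding label: {label!r}")
    policies = {}
    for entry in entries:
        match = _ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        encoding = "UTF-" + match.group(1)
        if encoding in policies:
            raise ValueError(f"duplicate entry for {encoding}")
        policies[encoding] = BOM_POLICIES[match.group(2)]
    # Encodings that do not appear in the label are treated as unsupported.
    return policies
```

With this sketch, parse_encoding_label("JSON-8OB") yields a mapping that 
contains only UTF-8, which matches the reading that UTF-16 and UTF-32 are not 
supported.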

Maybe that's going too far though!

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
----- Original Message ----- 
From: "Pete Cordell" <petejson@codalogic.com>
To: "Matt Miller (mamille2)" <mamille2@cisco.com>; "JSON WG" <json@ietf.org>
Cc: <www-tag@w3.org>; "es-discuss" <es-discuss@mozilla.org>
Sent: Friday, November 22, 2013 7:28 PM
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between 
Ecma-404 and IETF draft)

> ----- Original Message From: "Matt Miller (mamille2)"
>
>> There does seem to be rough consensus that using an encoding
>> other than UTF-8 can have interoperability issues.  There also
>> seems to be rough consensus that the current text and table
>> in section 8.1 for detecting the encoding will be inaccurate
>> (and potentially harmful).
>>
>> That appears to mean the approach with the most consensus is
>> to remove the encoding detection entirely, leaving only:
>>
>> """"
>>    JSON text SHALL be encoded in Unicode.  The default encoding is
>>    UTF-8.
>> """"
>
> I think we can be a little more helpful here.  For example, something 
> along the lines of:
>
>    JSON text is a sequence of Unicode codepoints.  The transfer encoding
>    used to represent those characters on-the-wire is beyond the scope of
>    this document.  It is therefore up to the specifications that reference
>    this document to specify whether JSON messages will be transferred
>    using UTF-8 (recommended), UTF-16 and/or UTF-32 (discouraged), and
>    whether preceding BOMs must be present, must not be present or are
>    optional.
>
>    If multiple encodings are permitted, implementers may choose to
>    auto-detect a message's encoding by exploiting the fact that the first
>    character of a JSON text must be in the ASCII character range and use
>    the following table to deduce the active encoding:
>
>       00 00 -- --  UTF-32BE
>       00 xx -- --  UTF-16BE
>       xx 00 00 00  UTF-32LE
>       xx 00 00 xx  UTF-16LE
>       xx 00 xx --  UTF-16LE
>       xx xx -- --  UTF-8
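
As a sanity check, the table above can be sketched in Python roughly as 
follows.  The function name is hypothetical, and the sketch assumes that no 
BOM is present and that xx means any non-zero byte while -- means "don't 
care".  Treating an input that ends before the third or fourth byte as 
UTF-16LE is an extra assumption that the table itself does not cover.

```python
def detect_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its leading bytes."""
    if len(data) < 2:
        raise ValueError("need at least two bytes to apply the table")
    b0, b1 = data[0], data[1]
    if b0 == 0:
        # 00 00 -- --  UTF-32BE
        # 00 xx -- --  UTF-16BE
        return "UTF-32BE" if b1 == 0 else "UTF-16BE"
    if b1 != 0:
        # xx xx -- --  UTF-8
        return "UTF-8"
    # xx 00 .. ..: little-endian; bytes three and four separate
    # UTF-32LE (xx 00 00 00) from UTF-16LE (xx 00 00 xx, xx 00 xx --).
    # Assumption: input shorter than four bytes is taken as UTF-16LE.
    if len(data) >= 4 and data[2] == 0 and data[3] == 0:
        return "UTF-32LE"
    return "UTF-16LE"


if __name__ == "__main__":
    text = '{"a":1}'
    for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(codec, "->", detect_encoding(text.encode(codec)))
```

A message that starts with a BOM would need to be handled before this 
function is called, since a BOM is not in the ASCII range and the table does 
not account for it.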