Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

----- Original Message -----
From: "Matt Miller (mamille2)"

> There does seem to be rough consensus that using an encoding
> other than UTF-8 can have interoperability issues.  There also
> seems to be rough consensus that the current text and table
> in section 8.1 for detecting the encoding will be inaccurate
> (and potentially harmful).
>
> That appears to mean the approach with the most consensus is
> to remove the encoding detection entirely, leaving only:
>
> """"
>    JSON text SHALL be encoded in Unicode.  The default encoding is
>    UTF-8.
> """"

I think we can be a little more helpful here.  For example, something along 
the lines of:

    JSON text is a sequence of Unicode code points.  The transfer encoding
    used to represent those characters on the wire is beyond the scope of
    this document.  It is therefore up to the specifications that reference
    this document to specify whether JSON messages will be transferred
    using UTF-8 (recommended), UTF-16 and/or UTF-32 (discouraged), and
    whether preceding BOMs must be present, must not be present or are
    optional.

    If multiple encodings are permitted, implementers may choose to
    auto-detect a message's encoding by exploiting the fact that the first
    character of a JSON text must be in the ASCII character range, and use
    the following table to deduce the active encoding (xx denotes a
    non-zero byte; -- denotes any byte, or no byte at all):

       00 00 -- --  UTF-32BE
       00 xx -- --  UTF-16BE
       xx 00 00 00  UTF-32LE
       xx 00 00 xx  UTF-16LE
       xx 00 xx --  UTF-16LE
       xx xx -- --  UTF-8
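For illustration, here is a minimal Python sketch of the auto-detection the table describes, assuming xx means a non-zero byte and -- means any byte or none. The names detect_json_encoding and decode_json_bytes are hypothetical, not taken from any draft or library. Because the proposed text lets referencing specifications permit a leading BOM, the sketch checks for one before consulting the table; that step is an addition of mine, since the table as written would classify a UTF-16LE BOM (FF FE) as UTF-8.

```python
import codecs
from typing import Optional

# Byte-order marks, longest first, so a UTF-32LE BOM (FF FE 00 00) is not
# mistaken for a UTF-16LE BOM (FF FE). BOM handling is an assumption layered
# on top of the table, which only covers BOM-less text.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def _byte(data: bytes, i: int) -> Optional[int]:
    """Return byte i of data, or None if the input is shorter than that."""
    return data[i] if i < len(data) else None


def detect_json_encoding(data: bytes) -> str:
    """Apply the table to a BOM-less JSON text (hypothetical helper).

    Assumes only that the first character is ASCII, as the message states.
    """
    b0, b1, b2, b3 = (_byte(data, i) for i in range(4))
    if b0 == 0:
        # 00 00 -- --  UTF-32BE;  00 xx -- --  UTF-16BE
        return "utf-32-be" if b1 == 0 else "utf-16-be"
    if b1 == 0:
        # xx 00 00 00  UTF-32LE
        if b2 == 0 and b3 == 0:
            return "utf-32-le"
        # xx 00 00 xx and xx 00 xx --  UTF-16LE; a two-byte text such as
        # "1" (31 00) also lands here, which the table leaves implicit.
        return "utf-16-le"
    # xx xx -- --  UTF-8 (also the fallback for empty or one-byte input)
    return "utf-8"


def decode_json_bytes(data: bytes) -> str:
    """Strip an optional BOM, otherwise auto-detect, then decode to str."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # The endian-specific codecs and plain utf-8 would decode the
            # BOM as U+FEFF, so slice it off before decoding.
            return data[len(bom):].decode(name)
    return data.decode(detect_json_encoding(data))
```

For example, decode_json_bytes('{"a": 1}'.encode("utf-16-le")) sees 7B 00 22 00, matches the "xx 00 xx --" row and returns '{"a": 1}'. Only the first character is relied upon, so a top-level string such as "\u0100" in UTF-16LE (22 00 00 01) is still caught by the "xx 00 00 xx" row.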

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com

Received on Friday, 22 November 2013 19:27:30 UTC