Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Wed, Nov 27, 2013 at 12:20:25AM +0100, Carsten Bormann wrote:
> On 27 Nov 2013, at 00:07, Nico Williams <nico@cryptonector.com> wrote:
> > Do you want to say anything about other encodings?  What would that be?
> 
> JSON is encoded in UTF-8.
> 
> There is no need to discuss JSON in other encodings, because it
> wouldn’t be JSON.

Thanks.

My opinion as to MIME contexts:

    I'm not opposed to saying that the application/json media type
    requires UTF-8.  Others have objected, and I believe the WG
    consensus to be that the application/json media type allows all of
    UTF-8/16/32.

    I believe we should settle for an interop note stating that UTF-8
    has the best interoperability, and a recommendation that UTF-8 be
    used.

My opinion as to non-MIME contexts:

    I'm not opposed to recommending that JSON texts for interchange in
    non-MIME contexts be encoded in UTF-8, and I'm not opposed to
    requiring that use of any other encoding be expressed as metadata.

    I do object to requiring that, under all circumstances (even in
    non-MIME contexts), UTF-8 must be used.

> (And no, I see no need to handle UTF-16LE, UTF-16BE, UTF-32LE or
> UTF-32BE in any special way, even if RFC 4627 was written at a time
> when it still seemed useful to pay them lip service.  But I recognize
> that there appears to be WG consensus to keep these corpses on life
> support, maybe because UTF-16 is the internal encoding of the
> programming language that gave JSON its name.)

Right, that appears to be the consensus, and more than that, it seems
extremely unlikely to change.

Assuming *that*, what are you willing to settle for?

Nico

PS: Back to my hypo...

    If my hypothetical JSON-using shell were to escape all non-ASCII
    characters in JSON string values, then encode the JSON text in
    UTF-8, then convert the result to the current locale's codeset
    (doing the reverse to parse), and the resulting texts never leak
    to other locales, why should anyone care?

    Most (but not all) non-Unicode locales use ASCII-compatible codesets,
    thus the result would be "proper" JSON texts in most cases anyway...

    As to why one might want to do that: because JSON texts are...
    *text*, i.e., editable in your favorite $EDITOR, readable with your
    favorite $PAGER, and so on.  It might be a problem if such texts
    leaked outside that locale, but we already have that problem in
    spades, and no JSON parser would be called upon to try to
    auto-detect any encodings other than UTF-8/16/32.
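    For reference, the auto-detection RFC 4627 asks for is cheap: since
    that grammar required a JSON text to start with two ASCII
    characters, the pattern of NUL bytes in the first four octets
    identifies the encoding.  A sketch of that heuristic (my own
    illustration, not normative text):

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four
    octets, per the NUL-byte heuristic of RFC 4627 section 3.  Assumes
    the text begins with two ASCII characters, as the RFC 4627 grammar
    (top-level object or array) guaranteed."""
    if len(data) < 4:
        return "utf-8"
    b = data[:4]
    if b[0] == 0 and b[1] == 0 and b[2] == 0:
        return "utf-32-be"   # 00 00 00 xx
    if b[0] == 0 and b[2] == 0:
        return "utf-16-be"   # 00 xx 00 xx
    if b[1] == 0 and b[2] == 0 and b[3] == 0:
        return "utf-32-le"   # xx 00 00 00
    if b[1] == 0 and b[3] == 0:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"
```

    Note the heuristic cannot name any locale codeset: anything that is
    not one of the five UTF patterns simply falls through to UTF-8,
    which is exactly why ASCII-compatible locale texts parse as JSON.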

Received on Wednesday, 27 November 2013 00:11:55 UTC