Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Fri, Nov 22, 2013 at 1:33 PM, Pete Cordell <petejson@codalogic.com> wrote:
> Personally I think we have to be careful not to fall into the trap of
> assuming that the only use-case for JSON will be in "to browser"
> communications.

I don't expect it to be the only use.

> I'm hoping that for the IETF's purposes we'll be looking at
> JSON's wider utility in broader areas, which may even include logging to
> files and interprocess communication, where there might be sensible reasons
> to choose something other than UTF-8.

What sensible reasons could there possibly be?

The one reason for using UTF-16 is contrived. (Your JSON consists
almost entirely of East Asian string literals with next to no JSON
syntax itself--East Asian characters take three bytes in UTF-8 but
two in UTF-16--you are bandwidth-constrained and, magically,
simultaneously so CPU-constrained that you can't use gzip.) For
UTF-32, not even contrived reasons exist.

(If you use shared-memory IPC between processes that represent
Unicode strings in something other than UTF-8, you shouldn't treat
the exchange as happening via char* plus an encoding layer and the
JSON MIME type, but via char16_t* or char32_t* with no encoding layer
involved. For example, if you JSONify data to communicate from Web
Workers to the main thread, conceptually the JSONification happens to
Unicode strings, not to bytes, so the JSON RFC doesn't get involved.)
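
A Python analogy (mine, not from the thread) of that layering:
in-memory JSONification yields a Unicode string, and an encoding only
enters the picture when the text crosses a byte-oriented boundary.

    import json

    # JSONification in memory produces a Unicode string; no encoding
    # is involved at this point.
    doc = json.dumps({"greeting": "日本語"}, ensure_ascii=False)
    assert isinstance(doc, str)      # text, not bytes

    # Only when the text is handed to a byte-oriented channel does an
    # encoding come into play.
    wire = doc.encode("utf-8")
    assert isinstance(wire, bytes)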

On Fri, Nov 22, 2013 at 6:39 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Henri Sivonen scripsit:
>
>> Even if no one or approximately no one (outside test cases) actually
>> emits JSON in UTF-32?
>
> How on earth would you know that?

There exists no situation where using UTF-32 for interchange makes
sense. I think proponents of craziness on the level of using UTF-32
for interchange should show evidence of existing crazy deployments
instead of asking future implementers to support UTF-32 just because
it isn't possible to prove that such deployments don't exist.

On Fri, Nov 22, 2013 at 9:28 PM, Pete Cordell <petejson@codalogic.com> wrote:
>       00 00 -- --  UTF-32BE
>       00 xx -- --  UTF-16BE
>       xx 00 00 00  UTF-32LE
>       xx 00 00 xx  UTF-16LE
>       xx 00 xx --  UTF-16LE
>       xx xx -- --  UTF-8

I continue to strongly disapprove of non-BOM-based sniffing rules
unless there's compelling evidence that such rules are needed in order
to interoperate with bogus existing serializers.
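
(For concreteness, here is a minimal sketch of what the quoted
byte-pattern table amounts to in code. It is mine, not Pete's; the
function name and the short-input fallback are my own assumptions.)

    # Sketch of the byte-pattern sniffing in the quoted table.
    # "xx" means any non-zero byte, "--" means "don't care".
    def sniff_json_encoding(data: bytes) -> str:
        if len(data) < 4:
            return "UTF-8"              # fallback; not part of the table
        b0, b1, b2, b3 = data[:4]
        if b0 == 0x00 and b1 == 0x00:
            return "UTF-32BE"           # 00 00 -- --
        if b0 == 0x00:
            return "UTF-16BE"           # 00 xx -- --
        if b1 == 0x00 and b2 == 0x00 and b3 == 0x00:
            return "UTF-32LE"           # xx 00 00 00
        if b1 == 0x00 and b2 == 0x00:
            return "UTF-16LE"           # xx 00 00 xx
        if b1 == 0x00:
            return "UTF-16LE"           # xx 00 xx --
        return "UTF-8"                  # xx xx -- --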

-- 
Henri Sivonen
hsivonen@hsivonen.fi
http://hsivonen.fi/

Received on Tuesday, 26 November 2013 14:10:48 UTC