
Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

From: Nico Williams <nico@cryptonector.com>
Date: Tue, 26 Nov 2013 16:00:41 -0600
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: JSON WG <json@ietf.org>, www-tag <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>
Message-ID: <20131126220036.GG21240@localhost>
On Tue, Nov 26, 2013 at 09:15:38PM +0100, Bjoern Hoehrmann wrote:
> * Nico Williams wrote:
> >We must not require encoding detection functionality in parsers.  We
> >must not forbid it either.  We might need to say that encodings other
> >than UTF-8/16/32 may not be reliably detected, therefore they are highly
> >discouraged, even forbidden except where protocols specifically call for
> >them.
> 
> When I pass a fully conforming UTF-8 encoded application/json entity to
> a fully conforming JSON parser I do not want the parser to do something
> funny like interpreting the document as if it were Windows-1252 encoded.
> I am amazed how many people here think a parser that does that should
> not be considered broken.

You missed the point.  I'm outlining what we can and should do.  We
should strongly encourage UTF-8 (require it, even, for parsers).  We
should not forbid other encodings -- at least not UTF-16 nor UTF-32 --
though we might agree to say nothing about them.
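
To make the UTF-8/16/32 case concrete, here is a minimal sketch of the
null-byte heuristic described in RFC 4627, section 3.  It assumes the
text has no BOM and begins with two ASCII characters; the names
sniff_utf and parse_json_bytes are made up for illustration.  It only
ever chooses among the UTFs and never guesses a legacy encoding.

```python
import json


def sniff_utf(data: bytes) -> str:
    """Guess the UTF from the null-byte pattern of the first four octets.

    Follows the table in RFC 4627 section 3. Assumes no BOM and that the
    first two characters of the text are ASCII.
    """
    if len(data) < 4:
        return "utf-8"          # too short to tell; assumed default
    nulls = tuple(b == 0 for b in data[:4])
    if nulls == (True, True, True, False):
        return "utf-32-be"      # 00 00 00 xx
    if nulls == (False, True, True, True):
        return "utf-32-le"      # xx 00 00 00
    if nulls == (True, False, True, False):
        return "utf-16-be"      # 00 xx 00 xx
    if nulls == (False, True, False, True):
        return "utf-16-le"      # xx 00 xx 00
    return "utf-8"              # xx xx xx xx, or any other pattern


def parse_json_bytes(data: bytes):
    # Decode strictly with the sniffed UTF, then hand the text to the parser.
    return json.loads(data.decode(sniff_utf(data)))
```

A parser that only accepts UTF-8 can skip the sniffing entirely and
decode strictly as UTF-8; either way nothing here ever treats the input
as Windows-1252 or any other non-UTF encoding.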

As to non-UTF encodings, well, think of something like the Korn Shell,
with its... very strange "compound variables", and consider something
more like Windows PowerShell.

It might be awesome to have a Unix shell that uses JSON as a [far]
superior alternative to the Korn Shell's compound variable disaster.
But you see, if you have any non-Unicode locales, how would such a shell
encode its JSON values?  Obviously: not in any UTF (except, maybe,
UTF-7).  It'd not be hard for such a shell to handle non-Unicode locales
just fine.  Not that such a shell's JSON parser should auto-detect
encodings (no way), but you know well enough that there are text documents
lying around in all sorts of encodings without the encoding metadata
being recorded anywhere.
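
A minimal sketch of how such a shell might do this without any
detection, assuming the encoding always comes from the caller or from
the current locale.  The function names are hypothetical, and the
fallback to \u escapes assumes an ASCII-compatible locale encoding.

```python
import json
import locale
from typing import Any, Optional


def dump_json_in_locale(value: Any, encoding: Optional[str] = None) -> bytes:
    """Serialize JSON in the locale's encoding (or one given explicitly)."""
    enc = encoding or locale.getpreferredencoding(False)
    try:
        # Write characters literally where the locale charset can hold them.
        return json.dumps(value, ensure_ascii=False).encode(enc)
    except UnicodeEncodeError:
        # Otherwise fall back to pure-ASCII output with \uXXXX escapes.
        return json.dumps(value, ensure_ascii=True).encode(enc)


def load_json_in_locale(data: bytes, encoding: Optional[str] = None) -> Any:
    """Parse JSON bytes using a declared encoding; nothing is sniffed."""
    enc = encoding or locale.getpreferredencoding(False)
    return json.loads(data.decode(enc))
```

The encoding is out-of-band information here, which is the point: the
parser is told what the bytes are rather than guessing.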

If you wanted to forbid non-Unicode, non-UTF encodings, then you'd be
preventing such a shell, and for what reason?  If you only mean that
auto-detection of encoding should not even be mentioned, I'm fine with
that, and I've already said so earlier.

(Of course I'd love to see non-Unicode locales disappear, but I don't
think that's in the cards.  And yes, I had a PowerShell-like Unix shell
in mind when I wrote the text you quoted.)

Nico
-- 
Received on Tuesday, 26 November 2013 22:01:06 UTC