Re: [Json] BOMs

On Thu, Nov 21, 2013 at 1:37 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * John Cowan wrote:
>>Bjoern Hoehrmann scripsit:
>>
>>> Is there any chance, by the way, to change `JSON.stringify` so it does
>>> not output strings that cannot be encoded using UTF-8? Specifically,
>>>
>>>   JSON.stringify(JSON.parse("\"\uD800\""))
>>>
>>> would need to escape the surrogate instead of emitting it literally.
>>
>>No, there isn't.  We've been down this road repeatedly.  People can and
>>do use JSON strings to encode arbitrary sequences of unsigned 16-bit integers.
>
> The output of JSON.stringify("\uD800") contains no backslash character,
> if you call `utf8_encode(JSON.stringify("\uD800"))` you get an exception
> because UTF-8 cannot encode the lone surrogate and `utf8_encode` does
> not know it could encode it as `\uD800` without loss of information. If
> `JSON.stringify` produced an escape sequence instead, there would be no
> problem passing the output to `utf8_encode`.

That's just one implementation.  We have had hundreds of e-mails on
this list about this, and well over a thousand covering several issues
like it.  I think the only area where we have rough consensus to
revisit the previous consensus is the top-level value restriction,
which has led to the whole UTF and byte-order detection sub-thread
(which we have also had before).  We're on much stronger ground to
revisit this one matter than the whole unpaired-surrogates matter, and
we're much less likely to change our consensus on the latter, because
the one proposal is about relaxing JSON to match ECMAScript's
definition, while yours is to do the opposite.
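
For anyone who needs UTF-8-safe output today, the escaping described in
the quoted text can be approximated as a post-processing step on the
serializer's output. This is only a minimal sketch: `escapeLoneSurrogates`
is a hypothetical helper, not part of ECMAScript or any library, and it
assumes its input is text produced by `JSON.stringify`, where an unpaired
surrogate can only occur inside a string literal.

```javascript
// Hypothetical helper (an assumption for illustration, not a standard API):
// rewrites every unpaired surrogate code unit in serialized JSON text as a
// \uXXXX escape, leaving well-formed surrogate pairs untouched.
function escapeLoneSurrogates(jsonText) {
  let out = "";
  for (let i = 0; i < jsonText.length; i++) {
    const unit = jsonText.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < jsonText.length) {
      const next = jsonText.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units unchanged.
        out += jsonText[i] + jsonText[i + 1];
        i++;
        continue;
      }
    }
    if (isHigh || isLow) {
      // Unpaired surrogate: emit a six-character JSON escape instead.
      out += "\\u" + unit.toString(16).padStart(4, "0");
    } else {
      out += jsonText[i];
    }
  }
  return out;
}

// The result contains only code points that UTF-8 can encode, and
// JSON.parse still recovers the original 16-bit code unit.
const text = escapeLoneSurrogates(JSON.stringify("\uD800"));
const roundTripped = JSON.parse(text);
```

Because the escape is itself valid JSON, the parsed value is unchanged, so
this does not restrict what strings can carry; it only changes how they
are written on the wire.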

Nico
--

Received on Thursday, 21 November 2013 19:58:40 UTC