Re: BOMs from Phillip Hallam-Baker on 2013-11-18 (www-tag@w3.org from November 2013)

From: Phillip Hallam-Baker <hallam@gmail.com>
Date: Mon, 18 Nov 2013 15:56:11 -0500
To: Pete Cordell <petejson@codalogic.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, "www-tag@w3.org" <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>
Message-ID: <CAMm+LwiHVc0mDrUr8yCMKt9wChV1tvybTtxSQej7eDSVq3SOnA@mail.gmail.com>

On Mon, Nov 18, 2013 at 8:36 AM, Pete Cordell <petejson@codalogic.com> wrote:

> ----- Original Message ----- From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
>
>  On 2013/11/18 20:11, Henry S. Thompson wrote:
>>
>>> Pete Cordell writes:
>>>
>>>> Given the history below, would it be sensible to accept BOMs for UTF-8
>>>> encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
>>>> and/or used in the wild for UTF-16 and UTF-32?
>>>>
>>>> Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
>>>> and MAY accept BOMs for UTF-16 and / or UTF-32"?
>>>>
>>>
>>> My sense is that you'll see more UTF-16 BOMs than anything else.
>>>
>>
>> Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
>> UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are
>> discussing.)
>>
>
> The in-memory case is not entirely irrelevant because a number of JSON
> messages will be constructed in memory and then squirted to line.
>
> I did a little experiment with Visual Studio.  It will allow me to save in
> UTF-8 with or without a BOM (like thing).  Saving in UTF-16 (Or was it
> UCS2?) is always with a BOM.  There didn't seem to be a UTF-32 option.
>
> JSON doesn't need BOMs.  However, there are cases where people might hand
> edit messages, and if they choose to save in UTF-16 they will likely have a
> BOM.
>
> Is it acceptable to tell people not to save hand edited files in UTF-16,
> suggesting UTF-8 (possibly with an encoded BOM) as an alternative?
>
> I would imagine that if someone did have a hand edited UTF-8 file on
> Windows then the allowance of a BOM would help their sanity immeasurably,
> but it's not something I have firsthand knowledge of.
>

I believe the opposite is true.

The failure of Windows to correctly process documents that lack a BOM
is a constant pain when trying to use .NET to parse XML.

The ability to compose a JSON message by wrapping another JSON message is
essential. That is, it has to be possible to write something like

printf("{\"Object\": %s}", Text);
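
To make the composition problem concrete, here is a minimal sketch in C,
assuming the inner message arrives as a NUL-terminated UTF-8 string that
may start with a BOM. The helper names skip_utf8_bom and wrap_json are
hypothetical, invented for this illustration. Without the skip step, the
three BOM bytes would land in the middle of the outer document.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return a pointer past a leading UTF-8 BOM
 * (bytes EF BB BF) if one is present, otherwise the text unchanged. */
const char *skip_utf8_bom(const char *text)
{
    if (strncmp(text, "\xEF\xBB\xBF", 3) == 0)
        return text + 3;
    return text;
}

/* Hypothetical helper: wrap an already serialized JSON value in an
 * outer object. Returns what snprintf returns, i.e. the length the
 * full result needs, so the caller can detect truncation. */
int wrap_json(char *out, size_t out_size, const char *inner)
{
    return snprintf(out, out_size, "{\"Object\": %s}", skip_utf8_bom(inner));
}
```

Called with the inner text "\xEF\xBB\xBF{\"a\": 1}", wrap_json leaves
{"Object": {"a": 1}} in the output buffer. This only handles a BOM at the
very start of the inner text.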

I use the .NET platform heavily. Please do not let Microsoft off the hook
here. The cost of doing so is having to write code to kick out spurious BOM
sequences that can occur at any point in the text, which becomes really
painful when dealing with strings where there might actually be a
reason to put the BOM in.
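
A rough sketch of that cleanup code, assuming UTF-8 input in a writable
NUL-terminated buffer. The function name strip_stray_boms is hypothetical.
It drops raw BOM bytes (EF BB BF) found outside JSON string literals and
keeps any found inside a string, since those might be intentional. That
distinction is why the filter has to track string and escape state rather
than simply deleting every occurrence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove UTF-8 BOM byte sequences that appear
 * outside JSON string literals, editing the buffer in place.
 * Returns the new length of the text. */
size_t strip_stray_boms(char *json)
{
    size_t in = 0, out = 0;
    int in_string = 0;

    while (json[in] != '\0') {
        if (in_string) {
            if (json[in] == '\\' && json[in + 1] != '\0') {
                /* Copy the backslash and the escaped character together
                 * so that an escaped quote does not end the string. */
                json[out++] = json[in++];
                json[out++] = json[in++];
                continue;
            }
            if (json[in] == '"')
                in_string = 0;
            json[out++] = json[in++];
        } else if (strncmp(json + in, "\xEF\xBB\xBF", 3) == 0) {
            in += 3;                      /* stray BOM between tokens */
        } else {
            if (json[in] == '"')
                in_string = 1;
            json[out++] = json[in++];
        }
    }
    json[out] = '\0';
    return out;
}
```

Even this small filter amounts to a partial JSON tokenizer, and it still
cannot tell whether a BOM inside a string was put there deliberately.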

The benefit of not letting them off the hook is that it might encourage
Microsoft to fix their tools so that they stop inserting spurious BOM
sequences into documents that are broken by them.

-- 
Website: http://hallambaker.com/

Received on Monday, 18 November 2013 20:56:41 UTC