- From: Pete Cordell <petejson@codalogic.com>
- Date: Mon, 18 Nov 2013 16:08:53 -0000
- To: "Tim Bray" <tbray@textuality.com>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, "John Cowan" <cowan@mercury.ccil.org>, "IETF Discussion" <ietf@ietf.org>, "JSON WG" <json@ietf.org>, "Anne van Kesteren" <annevk@annevk.nl>, <www-tag@w3.org>, "es-discuss" <es-discuss@mozilla.org>
----- Original Message From: "Tim Bray" <tbray@textuality.com>

> This feels backward, because BOMs are actually useful for UTF-16 and
> UTF-32, but essentially useless for UTF-8.

Not useless if you're trying to tell the difference between a hand-edited
Windows cp-1252 (or whatever it's called) encoded text file and a UTF-8
encoded text file. I don't think we need them for any other reason, but I
think some international Windows users would be thankful if you allowed
them for that case.

On Mon, Nov 18, 2013 at 2:05 AM, Pete Cordell <petejson@codalogic.com> wrote:

> Given the history below, would it be sensible to accept BOMs for UTF-8
> encoding, but not for UTF-16 and UTF-32? In other words, are BOMs needed
> and/or used in the wild for UTF-16 and UTF-32?
>
> Maybe the text can say something like "SHOULD accept BOMs for UTF-8, and
> MAY accept BOMs for UTF-16 and / or UTF-32"?
>
> Thanks,
>
> Pete Cordell
> Codalogic Ltd
> C++ tools for C++ programmers, http://codalogic.com
> Read & write XML in C++, http://www.xml2cpp.com
>
> ----- Original Message -----
> From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
> To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
> Cc: "John Cowan" <cowan@mercury.ccil.org>; "IETF Discussion"
> <ietf@ietf.org>; "Paul Hoffman" <paul.hoffman@vpnc.org>; "JSON WG"
> <json@ietf.org>; "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>;
> "Anne van Kesteren" <annevk@annevk.nl>; <www-tag@w3.org>; "es-discuss"
> <es-discuss@mozilla.org>
> Sent: Thursday, November 14, 2013 11:14 AM
> Subject: Re: [Json] JSON: remove gap between Ecma-404 and IETF draft
>
>> Hello Henry, others,
>>
>> On 2013/11/14 18:44, Henry S. Thompson wrote:
>>
>>> John Cowan writes:
>>>
>>>> Joe Hildebrand (jhildebr) scripsit:
>>>>
>>>>> If 404 doesn't allow [a BOM], I don't see a strong need to add it.
>>>>> Parsers can always be more forgiving of what they will parse than what
>>>>> the spec says, particularly since section 9 says "A JSON parser MAY
>>>>> accept non-JSON forms or extensions".
>>>>
>>>> It's not clear that 404 disallows it, since 404 is defined in terms of
>>>> characters, and a BOM is not a character but an out-of-band signal.
>>>
>>> I think this is a crucial observation.
>>
>> Yes, and I think it's based on the experience with XML. But while this
>> experience may be applicable to JSON, Anne's original comment about the
>> BOM and XMLHttpRequest suggests that 404 actually currently does not
>> tolerate a BOM, and that implementations (except for XMLHttpRequest) also
>> don't.
>>
>> To give some historic background, the BOM for UTF-8 wasn't in the first
>> edition of XML (http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing).
>> It only later came in because Microsoft used it for notepad to be able to
>> quickly distinguish between UTF-8 and the legacy system encoding. Because
>> many people were writing some XML by hand, and some of them were using
>> notepad, the pressure on XML to accept a BOM at the start of an UTF-8
>> file mounted, and it was included in the second edition of the XML
>> Recommendation (http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing).
>>
>> Compared to XML, JSON may be much less edited by hand, or much less
>> edited on notepad, or otherwise just have a different history from XML,
>> but we have to make sure.
>>
>> Regards, Martin.
>>
>>> I note that XML approaches this problem in what might be a useful
>>> way. The XML ABNF makes no mention of BOM, it's not part of any XML
>>> document as such. But it _is_ allowed. The relevant wording [1] is:
>>>
>>>   Entities ... may begin with the Byte Order Mark described by Annex H
>>>   of [ISO/IEC 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH
>>>   NO-BREAK SPACE character, #xFEFF). _This is an encoding signature,_
>>>   _not part of either the markup or the character data of the XML_
>>>   _document._ XML processors must be able to use this character to
>>>   differentiate between UTF-8 and UTF-16 encoded documents. [emphasis
>>>   added]
>>>
>>> ht
>>>
>>> [1] http://www.w3.org/TR/REC-xml/#charencoding
>>
>> _______________________________________________
>> json mailing list
>> json@ietf.org
>> https://www.ietf.org/mailman/listinfo/json
>
> _______________________________________________
> json mailing list
> json@ietf.org
> https://www.ietf.org/mailman/listinfo/json
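
For illustration only, here is a rough Python sketch of what a parser
following the "SHOULD accept BOMs for UTF-8, and MAY accept BOMs for UTF-16
and / or UTF-32" wording might do. The function name parse_json_bytes and
the accept_wide_boms flag are made up for this example and are not taken
from any draft or library. It also assumes that input without a BOM is
UTF-8, which is a simplification.

```python
import codecs
import json

# UTF-32 is listed before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_WIDE_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def parse_json_bytes(data: bytes, accept_wide_boms: bool = False):
    """Hypothetical helper: treat a leading BOM as an encoding signature.

    A UTF-8 BOM is always stripped (the "SHOULD accept" case). UTF-16 and
    UTF-32 BOMs are honoured only when accept_wide_boms is True (the "MAY
    accept" case); otherwise they are rejected with ValueError.
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
        return json.loads(text)

    for bom, encoding in _WIDE_BOMS:
        if data.startswith(bom):
            if not accept_wide_boms:
                raise ValueError("BOM for " + encoding + " not accepted")
            return json.loads(data[len(bom):].decode(encoding))

    # Assumption for this sketch: no BOM means the input is UTF-8.
    return json.loads(data.decode("utf-8"))
```

In this sketch the BOM is removed before the text reaches the JSON grammar,
which matches the XML approach quoted above of treating it as a signature
rather than as part of the document.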
Received on Monday, 18 November 2013 16:08:46 UTC