- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 14 Nov 2013 20:14:29 +0900
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- CC: John Cowan <cowan@mercury.ccil.org>, "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, Paul Hoffman <paul.hoffman@vpnc.org>, Anne van Kesteren <annevk@annevk.nl>, es-discuss <es-discuss@mozilla.org>, IETF Discussion <ietf@ietf.org>, "www-tag@w3.org" <www-tag@w3.org>, JSON WG <json@ietf.org>
Hello Henry, others,

On 2013/11/14 18:44, Henry S. Thompson wrote:
> John Cowan writes:
>
>> Joe Hildebrand (jhildebr) scripsit:
>>
>>> If 404 doesn't allow [a BOM], I don't see a strong need to add it.
>>> Parsers can always be more forgiving of what they will parse than what
>>> the spec says, particularly since section 9 says "A JSON parser MAY
>>> accept non-JSON forms or extensions".
>>
>> It's not clear that 404 disallows it, since 404 is defined in terms of
>> characters, and a BOM is not a character but an out-of-band signal.
>
> I think this is a crucial observation.

Yes, and I think it's based on the experience with XML. But while this
experience may be applicable to JSON, Anne's original comment about the
BOM and XMLHttpRequest suggests that 404 currently does not tolerate a
BOM, and that implementations (except for XMLHttpRequest) don't either.

To give some historical background, the BOM for UTF-8 wasn't in the
first edition of XML
(http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing). It only came
in later, because Microsoft used it in Notepad to be able to quickly
distinguish between UTF-8 and the legacy system encoding. Because many
people were writing some XML by hand, and some of them were using
Notepad, the pressure on XML to accept a BOM at the start of a UTF-8
file mounted, and it was included in the second edition of the XML
Recommendation
(http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing).

Compared to XML, JSON may be edited by hand much less, or edited in
Notepad much less, or otherwise just have a different history from XML,
but we have to make sure.

Regards,   Martin.

> I note that XML approaches
> this problem in what might be a useful way. The XML ABNF makes no
> mention of BOM, it's not part of any XML document as such. But it
> _is_ allowed. The relevant wording [1] is:
>
>   Entities ... may begin with the Byte Order Mark described by Annex H
>   of [ISO/IEC 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH
>   NO-BREAK SPACE character, #xFEFF). _This is an encoding signature,_
>   _not part of either the markup or the character data of the XML_
>   _document._ XML processors must be able to use this character to
>   differentiate between UTF-8 and UTF-16 encoded documents. [emphasis
>   added]
>
> ht
>
> [1] http://www.w3.org/TR/REC-xml/#charencoding
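
To make the "encoding signature, not part of the document" reading concrete, here is a minimal sketch of a forgiving JSON reader that drops a leading UTF-8 BOM before parsing. The function name loads_tolerating_bom is hypothetical and only for illustration; nothing here is taken from 404 or from any particular implementation, and it assumes the input is UTF-8.

```python
import codecs
import json


def loads_tolerating_bom(data: bytes):
    """Parse UTF-8 encoded JSON, ignoring a leading UTF-8 BOM if present.

    Hypothetical helper: the BOM (bytes EF BB BF) is treated as an
    out-of-band encoding signature and removed before the JSON grammar
    is applied to the remaining characters.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode("utf-8"))


# The same document with and without the signature parses identically.
with_bom = codecs.BOM_UTF8 + b'{"name": "notepad"}'
without_bom = b'{"name": "notepad"}'
assert loads_tolerating_bom(with_bom) == loads_tolerating_bom(without_bom)
```

Only a BOM at the very start is stripped; a U+FEFF elsewhere in the text is left for the parser to deal with, which matches the XML wording about entities that "may begin with" the mark.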
Received on Thursday, 14 November 2013 11:16:19 UTC