Re: [Json] JSON: remove gap between Ecma-404 and IETF draft from Henry S. Thompson on 2013-11-14 (www-tag@w3.org from November 2013)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Thu, 14 Nov 2013 09:44:00 +0000
To: John Cowan <cowan@mercury.ccil.org>
Cc: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, Paul Hoffman <paul.hoffman@vpnc.org>, Anne van Kesteren <annevk@annevk.nl>, es-discuss <es-discuss@mozilla.org>, IETF Discussion <ietf@ietf.org>, "www-tag@w3.org" <www-tag@w3.org>, JSON WG <json@ietf.org>
Message-ID: <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk>

John Cowan writes:

> Joe Hildebrand (jhildebr) scripsit:
>
>> If 404 doesn't allow [a BOM], I don't see a strong need to add it.
>> Parsers can always be more forgiving of what they will parse than what
>> the spec says, particularly since section 9 says "A JSON parser MAY
>> accept non-JSON forms or extensions".
>
> It's not clear that 404 disallows it, since 404 is defined in terms of
> characters, and a BOM is not a character but an out-of-band signal.

I think this is a crucial observation.  I note that XML approaches
this problem in what might be a useful way.  The XML EBNF makes no
mention of the BOM; it's not part of any XML document as such.  But it
_is_ allowed.  The relevant wording [1] is:

  Entities ... may begin with the Byte Order Mark described by Annex H
  of [ISO/IEC 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH
  NO-BREAK SPACE character, #xFEFF). _This is an encoding signature,_
  _not part of either the markup or the character data of the XML_
  _document._ XML processors must be able to use this character to
  differentiate between UTF-8 and UTF-16 encoded documents. [emphasis
  added]
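
As an illustration of the "encoding signature, not content" reading
applied to JSON, here is a minimal sketch in Python.  It is not taken
from Ecma-404, the IETF draft, or the XML spec; the names
split_signature and loads_tolerant are hypothetical, and the UTF-8
default for unsigned input is an assumption.  Only the UTF-8 and
UTF-16 signatures are handled, matching the XML wording above; UTF-32
is deliberately left out.

```python
import codecs
import json

# Byte-level signatures and the codec each one selects.
# Assumption: only UTF-8 and UTF-16 are considered, as in the XML text.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_signature(raw, default="utf-8"):
    """Return (encoding, payload), with any leading BOM removed.

    The BOM is consumed at the byte level, so the character-level
    grammar never sees it.
    """
    for bom, encoding in _SIGNATURES:
        if raw.startswith(bom):
            return encoding, raw[len(bom):]
    # No signature present: fall back to the assumed default encoding.
    return default, raw


def loads_tolerant(raw):
    """Parse JSON bytes, treating a leading BOM as out-of-band."""
    encoding, payload = split_signature(raw)
    return json.loads(payload.decode(encoding))
```

The point of the split is that the grammar stays defined purely over
characters, while the signature is dealt with one layer below it,
which is the division of labour the XML wording describes.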

ht

[1] http://www.w3.org/TR/REC-xml/#charencoding
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]

Received on Thursday, 14 November 2013 09:45:23 UTC