- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Tue, 19 Nov 2013 20:09:30 +0900
- To: "t.p." <daedulus@btconnect.com>
- CC: John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
On 2013/11/19 19:10, t.p. wrote:
> ----- Original Message -----
> From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
>> For UTF-8, the BOM is not a Byte Order Mark, because such a mark isn't
>> necessary at all. It may serve as a signature, but is not necessary, and
>> in some circumstances counterproductive.
>
> Martin
>
> We had a similar discussion with syslog back in 2005, the issue being
> that UTF-8 was new and different and how to tell whether it was being
> used or not, and what made it into RFC5424 was
> " If a syslog application encodes MSG in UTF-8, the string MUST start
> with the Unicode byte order mask (BOM), which for UTF-8 is ABNF
> %xEF.BB.BF. "
> which remains a MUST to this day. There are no relevant Errata.
>
> Tom Petch

This is something that seems to have made quite a lot of sense for syslog. I can understand that choice if, before 2005, syslog was used with legacy encodings (iso-8859-1, Shift_JIS and similar) and there was otherwise no easy way to label the UTF-8 strings.

But another solution (for syslog, that is) would also have been possible. As John already pointed out, UTF-8 is very easy to detect heuristically: If a byte sequence follows the UTF-8 byte pattern, it's most definitely UTF-8 and not something else. For more background, please see http://www.sw.it.aoyama.ac.jp/2012/pub/IUC11-UTF-8.pdf, where that idea came up first.

As for JSON, it doesn't have the problem of legacy encodings. JSON by definition is encoded in a Unicode encoding form, and it's easy to distinguish these because of the restrictions on character sequences in JSON. And this can be done without a BOM (or with a BOM).

What's most important now is to know what receivers actually accept. We are not in a design phase, we are just updating the definition of JSON and making sure we fix problems if there are problems, but we have to use the installed base for the main guidance, not other protocols or formats.

Regards, Martin.
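
For illustration only, the two detection ideas mentioned above can be sketched in Python. This is a minimal sketch, not what any particular syslog or JSON receiver does. The function names (strip_utf8_bom, looks_like_utf8, guess_json_encoding) are hypothetical. The zero-byte table is the one from RFC 4627 section 3, which assumes the text starts with two ASCII characters. Only the UTF-8 signature is stripped here; UTF-16 and UTF-32 BOMs are not handled.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes form a well-formed UTF-8 sequence.

    Pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


# Zero-byte pattern of the first four octets -> encoding (RFC 4627, section 3).
_ZERO_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    (False, False, False, False): "utf-8",    # xx xx xx xx
}


def guess_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding form of a JSON text, with or without a UTF-8 BOM."""
    head = strip_utf8_bom(data)[:4]
    if len(head) < 4:
        # Too short for the table; assume UTF-8 (an assumption of this sketch).
        return "utf-8"
    zeros = tuple(b == 0 for b in head)
    return _ZERO_PATTERNS.get(zeros, "utf-8")
```

For example, looks_like_utf8("Dürst".encode("utf-8")) is true, while the same name encoded as iso-8859-1 or a Japanese string encoded as Shift_JIS fails the check, because their high bytes do not follow the UTF-8 lead-byte and continuation-byte pattern.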
Received on Tuesday, 19 November 2013 11:10:54 UTC