- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Tue, 19 Nov 2013 13:32:37 +0900
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- CC: Bjoern Hoehrmann <derhoermi@gmx.net>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
Okay, here are some more tests.

http://www.sw.it.aoyama.ac.jp/2013/pub/json_tests/test1_utf8_nobom.json
http://www.sw.it.aoyama.ac.jp/2013/pub/json_tests/test2_utf8_bom.json

They are self-describing JSON files served with application/json, the
first without a BOM, and the second with a BOM. They contain some
Japanese, and a tiny bit of Spanish.

[see more below]

On 2013/11/18 21:59, Henry S. Thompson wrote:
> Bjoern Hoehrmann writes:
>
>> Perl's JSON module gives me
>>
>>   malformed JSON string, neither array, object, number, string
>>   or atom, at character offset 0 (before "\x{ef}\x{bb}\x{bf}[]")
>>
>> Python's json module gives me
>>
>>   ValueError: No JSON object could be decoded
>>
>> Go's "encoding/json" module gives me
>>
>>   invalid character 'ï' looking for beginning of value
>
> I'm curious to know what level you're invoking the parser at. As
> implied by my previous post about the Python 'requests' package, it
> handles application/json resources by stripping any initial BOM it
> finds -- you can try this with
>
> >>> import requests
> >>> r=requests.get("http://www.ltg.ed.ac.uk/ov-test/b16le.json")
> >>> r.json()

I get a 404 on this example. I can put up UTF-16 examples, too.

Regards,    Martin.

> Signatures are not part of the text of a document, as the UNICODE spec
> makes clear, so asking what happens when you pass a string beginning
> with a BOM to a parser is not really the right question in this
> context, is it?
>
> As I tried to say in an earlier post, there's a distinction which
> needs to be carefully insisted on between, on the one hand, languages
> and their parsers, where I agree signatures/BOMs have no place, and,
> on the other hand, (media-typed) resources/entities/payloads and _their_
> processing, where a discussion of BOMs/signatures _is_ appropriate
> and, often, necessary.
>
> BTW I agree that the status of the UTF-8 BOM as signature is slightly
> hazy, but again the UNICODE spec itself [1] says
>
>   "this sequence can serve as signature for UTF-8 encoded text where
>    the character set is unmarked"
>
> ht
>
> [1] http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
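The resource-versus-parser distinction drawn in the thread (treat a BOM as a signature of the application/json payload, and hand only BOM-free text to the JSON parser) can be sketched in Python. This is a minimal illustration under stated assumptions, not how `requests` or any other library actually implements it: `decode_json_payload` is a hypothetical helper, only UTF-8 and UTF-16 signatures are recognized, and a payload without a signature is assumed to be UTF-8.

```python
import codecs
import json


def decode_json_payload(body: bytes):
    """Hypothetical helper: strip a leading BOM at the payload level,
    then parse the remaining text as JSON."""
    # Signatures recognized by this sketch; UTF-32 is deliberately not handled.
    # None of these byte sequences is a prefix of another.
    signatures = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, encoding in signatures:
        if body.startswith(bom):
            # The signature is metadata about the payload, not JSON text: drop it.
            text = body[len(bom):].decode(encoding)
            break
    else:
        # No signature found: assume UTF-8 (an assumption of this sketch).
        text = body.decode("utf-8")
    # The JSON parser itself never sees a BOM.
    return json.loads(text)
```

For example, `decode_json_payload(codecs.BOM_UTF8 + b'[]')` returns `[]`, whereas passing the text with the BOM still attached, `json.loads("\ufeff[]")`, raises `json.JSONDecodeError` (a `ValueError` subclass) in Python 3, which mirrors the parser-level failures quoted above.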
Received on Tuesday, 19 November 2013 04:33:55 UTC