- From: Pete Cordell <petejson@codalogic.com>
- Date: Thu, 14 Nov 2013 12:04:16 -0000
- To: "Paul Hoffman" <paul.hoffman@vpnc.org>, "Joe Hildebrand Hildebrand" <jhildebr@cisco.com>
- Cc: <www-tag@w3.org>, "JSON WG" <json@ietf.org>
Original Message From: "Joe Hildebrand" <hildjj@cursive.net>

> On 11/13/13 2:27 PM, "Paul Hoffman" <paul.hoffman@vpnc.org> wrote:
>
>><no hat>
>>
>>On Nov 13, 2013, at 12:24 PM, Joe Hildebrand (jhildebr)
>><jhildebr@cisco.com> wrote:
>>
>>> We would also need to change section 8.1 according to the mechanism that
>>> was previously proposed:
>>>
>>>   00 00 00 xx  UTF-32BE
>>>   00 xx ?? xx  UTF-16BE
>>>   xx 00 00 00  UTF-32LE
>>>   xx 00 xx ??  UTF-16LE
>>>   xx xx ?? ??  UTF-8
>>>
>>> in order to account for strings at the top level whose first character
>>> has a codepoint greater than 127.
>>
>>A string at the top level of a JSON text still needs to start with an
>>ASCII " character, so the logic is still fine, I believe.
>
> Without top level strings, the first *two* characters of any JSON text are
> always ASCII. This:
>
>   "?" (that's U+0022 U+0100 U+0022)
>
> ...
>
> So the JSON text above would not match any of the table entries, causing
> an error.

In http://www.ietf.org/mail-archive/web/json/current/msg00565.html I
mentioned that we also need to allow for characters such as U+2c00 to be the
first character in a quoted string. This requires a pattern like:

  xx 00 00 xx  UTF-16LE

giving:

  00 00 00 xx  UTF-32BE
  00 xx 00 xx  UTF-16BE
  00 xx xx xx  UTF-16BE
  xx 00 00 00  UTF-32LE
  xx 00 00 xx  UTF-16LE
  xx 00 xx 00  UTF-16LE
  xx 00 xx xx  UTF-16LE
  xx xx xx xx  UTF-8

That can be reduced a bit if we use "--" to indicate "not-tested":

  00 00 -- --  UTF-32BE
  00 xx -- --  UTF-16BE
  xx 00 00 00  UTF-32LE
  xx 00 00 xx  UTF-16LE
  xx 00 xx --  UTF-16LE
  xx xx -- --  UTF-8

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
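
For illustration only, the reduced table in this message (the one using "--" for "not-tested") could be sketched in Python roughly as follows. The function name detect_json_encoding and the decision to raise an error on inputs shorter than four octets are assumptions made for this sketch; neither comes from the thread or from any draft text.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess a JSON text's encoding from its first four octets.

    Follows the reduced table, where "xx" is a non-zero octet and
    "--" is an octet that is not tested:

        00 00 -- --  UTF-32BE
        00 xx -- --  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 00 xx  UTF-16LE
        xx 00 xx --  UTF-16LE
        xx xx -- --  UTF-8
    """
    # Assumption for this sketch: shorter inputs are simply rejected.
    if len(data) < 4:
        raise ValueError("this sketch needs at least four octets")

    # True means "xx" (non-zero), False means "00".
    b0, b1, b2, b3 = (octet != 0 for octet in data[:4])

    if not b0:
        # 00 00 -- --  or  00 xx -- --
        return "UTF-16BE" if b1 else "UTF-32BE"
    if b1:
        # xx xx -- --
        return "UTF-8"
    if not b2 and not b3:
        # xx 00 00 00
        return "UTF-32LE"
    # xx 00 00 xx  or  xx 00 xx --
    return "UTF-16LE"


# The two problem cases from the thread: a top-level string whose first
# character is U+0100 (UTF-16BE) or U+2C00 (UTF-16LE).
print(detect_json_encoding('"\u0100"'.encode("utf-16-be")))
print(detect_json_encoding('"\u2c00"'.encode("utf-16-le")))
```

Because the first two octets select between four cases and every "xx 00" continuation is listed, each four-octet prefix matches exactly one row, so the order of the tests does not affect the result.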
Received on Friday, 15 November 2013 11:52:23 UTC