- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 17 May 2013 07:37:30 -0400
- To: Alex Milowski <alex@milowski.com>
- Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
- Message-ID: <20130517113728.GC13487@w3.org>
* Alex Milowski <alex@milowski.com> [2013-05-16 23:44-0700]
> In looking at test:
>
> prefix_with_PN_CHARS_BASE_character_boundaries.ttl [1]
>
> There are the code points u+dc00, u+db7f, and u+dfff in the last part of
> the prefix. The code points u+d800-u+dfff are not valid unicode characters.
>
> Why are these in a positive test?
>
> [1]
> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/tests-ttl/prefix_with_PN_CHARS_BASE_character_boundaries.ttl

In the prefix, I see these codepoints:

u+41 u+5a u+61 u+7a u+c0 u+d6 u+d8 u+f6 u+f8 u+2ff u+370 u+37d u+37f
u+1fff u+200c u+200d u+2070 u+218f u+2c00 u+2fef u+3001 u+d7ff u+f900
u+fdcf u+fdf0 u+fffd u+10000 u+effff

I've attached two variants of the test: one encoded in UTF-8, which is
legal Turtle and has only the codepoints above, and one encoded in
UTF-16, which uses surrogates to encode the codepoints u+10000 and
u+effff. Is it possible that your buffer got re-encoded as UTF-16 at
some point?

If this message resolves your comment, please reply with "[RESOLVED]"
in the subject.

> --
> --Alex Milowski
> "The excellence of grammar as a guide is proportional to the paucity of the
> inflexions, i.e. to the degree of analysis effected by the language
> considered."
>
> Bertrand Russell in a footnote of Principles of Mathematics

--
-ericP
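
The surrogate explanation in the reply can be checked with a short sketch. The helper below is illustrative only (the name `utf16_code_units` is an assumption, not part of the test suite or any Turtle tool); it applies the standard UTF-16 surrogate-pair arithmetic to the last two codepoints in the prefix, u+10000 and u+effff.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for one Unicode scalar value.

    Hypothetical helper written for this example.
    """
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are not scalar values and cannot be encoded on their own.
        raise ValueError("surrogate code points are not encodable")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


for cp in (0x10000, 0xEFFFF):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(cp))
    print(f"U+{cp:X} -> {units}")
# U+10000 -> D800 DC00
# U+EFFFF -> DB7F DFFF
```

The resulting code units include DC00, DB7F, and DFFF, which are the three values reported in the original comment. That is consistent with a tool displaying UTF-16 code units rather than decoded codepoints.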
Attachments
- text/turtle attachment: prefix_with_PN_CHARS_BASE_character_boundaries.ttl
- application/octet-stream attachment: prefix_with_PN_CHARS_BASE_character_boundaries.utf-16
Received on Friday, 17 May 2013 11:37:59 UTC