- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Sat, 23 Mar 2013 15:35:26 -0700
- To: "public-rdf-comments@w3.org Comments" <public-rdf-comments@w3.org>
I've been struggling with the localName_with_PN_CHARS_BASE_character_boundaries.ttl test, which tests the range of characters allowed by PN_CHARS_BASE. Within the grammar [1], this is defined as follows:

    [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6]
        | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D]
        | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF]
        | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

This explicitly includes characters beyond what is allowed by the ucschar production in RFC 3987 [2]:

    ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
            / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
            / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
            / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
            / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
            / %xD0000-DFFFD / %xE1000-EFFFD

As a result, even though my Turtle processor parses the test, it fails when I validate the output, where I check that the IRIs are also valid. My reading of the ucschar production is that a valid IRI cannot include %xEFFFE or %xEFFFF, which _are_ included in Turtle (and, I believe, in SPARQL). (Interestingly, PN_CHARS_BASE also excludes some ranges that are included in ucschar, but that is the subject of issue-190 [3].)

Since the horse has probably left the barn, I don't expect PN_CHARS_BASE to change at this point, but tests such as localName_with_PN_CHARS_BASE_character_boundaries.ttl should probably be limited to characters that produce valid IRIs according to RFC 3987, as that spec is normatively referenced.

Gregg Kellogg
gregg@greggkellogg.net

[1] http://www.w3.org/TR/turtle/#sec-grammar-grammar
[2] http://www.ietf.org/rfc/rfc3987.txt
[3] http://www.w3.org/International/track/issues/190
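To see exactly where the two productions disagree, the ranges can be compared mechanically. The Python sketch below transcribes PN_CHARS_BASE and ucschar as lists of inclusive code-point ranges and subtracts one from the other. It assumes that ASCII letters count as valid, since RFC 3987's iunreserved admits ALPHA as well as ucschar. The names `IRI_LETTERS`, `subtract`, `in_ranges`, `NOT_IN_IRI` and `non_iri_chars` are hypothetical helpers for illustration, not part of any Turtle or RDF library.

```python
# Hypothetical sketch: find PN_CHARS_BASE code points that RFC 3987 does not
# allow as letters in an IRI. Ranges are inclusive (lo, hi) pairs transcribed
# from the two productions quoted above.

PN_CHARS_BASE = [
    (0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Assumption: ASCII ALPHA (allowed through iunreserved) plus ucschar.
# Planes 1-13 each stop at xFFFD, and the plane-14 range starts at xE1000.
IRI_LETTERS = [(0x41, 0x5A), (0x61, 0x7A), (0xA0, 0xD7FF),
               (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
IRI_LETTERS += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
IRI_LETTERS.append((0xE1000, 0xEFFFD))


def subtract(ranges, removed):
    """Return the parts of `ranges` not covered by `removed`.

    Both arguments must be sorted lists of non-overlapping inclusive ranges.
    """
    result = []
    for lo, hi in ranges:
        cur = lo  # first code point of this range not yet accounted for
        for rlo, rhi in removed:
            if rhi < cur:
                continue  # removed range ends before cur
            if rlo > hi:
                break  # this and all later removed ranges start after hi
            if rlo > cur:
                result.append((cur, rlo - 1))  # uncovered gap before rlo
            cur = rhi + 1
            if cur > hi:
                break
        if cur <= hi:
            result.append((cur, hi))  # uncovered tail of the range
    return result


def in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


NOT_IN_IRI = subtract(PN_CHARS_BASE, IRI_LETTERS)


def non_iri_chars(local_name):
    """Hex code points in `local_name` allowed by PN_CHARS_BASE but not RFC 3987."""
    return [hex(ord(c)) for c in local_name if in_ranges(ord(c), NOT_IN_IRI)]


if __name__ == "__main__":
    for lo, hi in NOT_IN_IRI:
        print(f"U+{lo:04X}..U+{hi:04X}")
    print(non_iri_chars("a\U000EFFFF"))
```

If the ranges are transcribed correctly, this lists 15 ranges ending with U+EFFFE..U+EFFFF; the others are U+FFF0..U+FFFD, the last two code points of each of planes 1 through 12, and U+DFFFE..U+E0FFF. A test-suite generator could run a check like `non_iri_chars` over each local name to keep boundary tests within RFC 3987.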
Received on Saturday, 23 March 2013 22:35:58 UTC