
Re: claimed completion on "ACTION-233: Publish the consolidated test suite"

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 20 Mar 2013 15:13:23 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>, I18N folks <www-international@w3.org>
Cc: public-rdf-wg@w3.org
Message-ID: <20130320191322.GD12591@w3.org>
* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-03-20 17:36+0000]
> The TTL has U+037E but ...
> 
> PN_CHARS_BASE has a hole specifically for that
> 
> [#x0370-#x037D] | [#x037F-#x1FFF]
> 
> => not a legal char.

Yeah, I screwed that up. I should have gone the other way 'cause it's at the bottom of a range (unlike all the other unassigned chars). Attached are the same tests with s/37f/384/. Could you chop off after the "AZaz" and see if that works and do a binary search to see what it's complaining about?
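
A minimal sketch of that binary search, in case it saves some hand-chopping. The `accepts` callable is a hypothetical stand-in for whatever checker is being run (it is not any real validator's API), and the search assumes that once a prefix is rejected every longer prefix is rejected too, and that the empty string is accepted:

```python
from typing import Callable


def first_rejected_index(text: str, accepts: Callable[[str], bool]) -> int:
    """Return the index of the character whose addition first makes a
    prefix of `text` fail `accepts`, or -1 if the whole text is accepted.

    `accepts` is a hypothetical predicate wrapping the checker under test.
    """
    if accepts(text):
        return -1
    # Invariant: text[:lo] is accepted, text[:hi] is rejected.
    lo, hi = 0, len(text)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepts(text[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


if __name__ == "__main__":
    # Toy checker that rejects any string containing "X".
    print(first_rejected_index("abcXdef", lambda s: "X" not in s))
```

If the checker complains about more than one character, this finds only the first; dropping that character and re-running should find the next.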

I18N folks, could you tell me why an NFC validator is objecting to this (beautiful) IRI and if there's some validator I can use for testing?
  <http://a.example/AZazÀÖØöø˿Ͱͽ΄῾‌‍⁰↉Ⰰ⿕、ퟻ豈ﷇﷰ�𐀀>
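
In the meantime, a rough way to see which characters a normalizer would touch is to normalize them one at a time with Python's standard `unicodedata` module. This is only a sketch: `LOCAL_NAME` is my reconstruction of the local part from the range boundaries and the substitutions in this thread (an assumption, so check it against the attached file), the result depends on the Unicode version your Python ships, and a per-character check will not catch problems that only arise between adjacent characters. U+F900 is one to watch, since it is a CJK compatibility ideograph with a canonical decomposition.

```python
import unicodedata

# Reconstructed local name (assumption: matches the attached test file).
LOCAL_NAME = (
    "AZaz"
    "\u00c0\u00d6\u00d8\u00f6\u00f8\u02ff\u0370\u037d\u0384\u1ffe"
    "\u200c\u200d\u2070\u2189\u2c00\u2fd5\u3001\ud7fb"
    "\uf900\ufdc7\ufdf0\ufffd\U00010000\U000e01ef"
)


def changed_by(form: str, text: str) -> list:
    """Return the characters of `text` that are not stable under `form`
    (one of "NFC", "NFD", "NFKC", "NFKD") when normalized on their own."""
    return [ch for ch in text if unicodedata.normalize(form, ch) != ch]


if __name__ == "__main__":
    for form in ("NFC", "NFKC"):
        for ch in changed_by(form, LOCAL_NAME):
            # Print the code point and its name for each unstable character.
            print(form, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
```

Anything listed under NFC would explain a NOT_NFC warning; anything listed only under NFKC is a compatibility character, which would fit the other two warnings.
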
The goal is to test as much as possible of the valid input to <http://www.w3.org/TR/turtle/#grammar-production-PrefixedName>. In Turtle, the localName gets appended to the namespace, hence the IRI above. The

  [163s] PN_CHARS_BASE ::=    [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

production is taken from <http://www.w3.org/TR/REC-xml/#NT-NameStartChar>:

  [4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
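
For checking individual code points against that production (e.g. the U+037E hole Andy pointed out), here is a small sketch. The ranges are transcribed from PN_CHARS_BASE as quoted above; the function and constant names are mine, not from any Turtle library:

```python
# Inclusive code point ranges transcribed from PN_CHARS_BASE [163s].
PN_CHARS_BASE_RANGES = [
    (ord("A"), ord("Z")), (ord("a"), ord("z")),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x02FF),
    (0x0370, 0x037D), (0x037F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def in_pn_chars_base(code_point: int) -> bool:
    """True if `code_point` falls in one of the PN_CHARS_BASE ranges."""
    return any(lo <= code_point <= hi for lo, hi in PN_CHARS_BASE_RANGES)


if __name__ == "__main__":
    for cp in (0x037D, 0x037E, 0x037F, 0x0384):
        print("U+%04X" % cp, in_pn_chars_base(cp))
```

This only answers whether the grammar admits a character; it says nothing about whether the character is assigned or normalization-stable.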



> Removing it (Greek question mark), I then get:
> 
> WARN  [line: 2, col: 43] Bad IRI:
> <http://a.example/AZaz???????????????????????> Code: 46/NOT_NFC in
> PATH: The IRI is not in Unicode Normal Form C.
> WARN  [line: 2, col: 43] Bad IRI:
> <http://a.example/AZaz???????????????????????> Code: 47/NOT_NFKC in
> PATH: The IRI is not in Unicode Normal Form KC.
> WARN  [line: 2, col: 43] Bad IRI:
> <http://a.example/AZaz???????????????????????> Code:
> 56/COMPATIBILITY_CHARACTER in PATH: TODO
> 
> with or without the last char.
> 
> >I poked around looking for composing characters in the PN_CHARS_BASE
> >character ranges. \u02ff MODIFIER LETTER LOW LEFT ARROW seemed like it
> >could be a culprit, but fileformat.info claims it's not in a combining
> >class. Likewise \ufffd REPLACEMENT CHARACTER
> >
> >There are a bunch of yet-unassigned characters which could be confusing
> >a vigilant IRI checker. I've mapped those to the highest currently-
> >assigned characters in their respective range (per fileformat.info):
> >
> >     \u037f   37e
> >     \u1fff  1ffe
> >     \u218f  2189
> >     \u2fef  2fd5
> >     \ud7ff  d7fb
> >     \ufdcf  fdc7
> >\U000effff e01ef
> >
> >attached is a variant of
> >   localName_with_PN_CHARS_BASE_character_boundaries.{nt,ttl}
> >with the values substituted. (I pass this modified test so there
> >shouldn't be any typos in it.) If it still doesn't work, try chopping
> >off the last character 'cause it's a variation selector which ostensibly
> >is NF{,K}{C,D} valid, but may not have been when jjc wrote your checker.
> >
> >
> 

-- 
-ericP
Received on Wednesday, 20 March 2013 19:13:55 GMT
