
Re: claimed completion on "ACTION-233: Publish the consolidated test suite"

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 20 Mar 2013 09:59:30 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <20130320135924.GC9440@w3.org>
* Andy Seaborne <andy.seaborne@epimorphics.com> [2013-03-20 10:31+0000]
> 
> 
> On 20/03/13 07:12, Eric Prud'hommeaux wrote:
> >I pushed the ~100 atomic tests into the test-ttl test suite.
> >
> >huh, i thought the action was on me until i checked tracker.
> 
> Eric,
> 
> I'm getting some warnings from use of:
> 
> <http://a.example/AZaz\u00c0\u00d6\u00d8\u00f6\u00f8\u02ff\u0370\u037d\u037f\u1fff\u200c\u200d\u2070\u218f\u2c00\u2fef\u3001\ud7ff\uf900\ufdcf\ufdf0\ufffd\U00010000\U000effff>
> 
> as not being Normal Form KC and not Normal Form C (presumably
> different characters causing those two warnings).
> 
> (it's not related to the \U characters - I tried without them as well.)
> 
> I'm not clear what RDF Concepts says here.  It directly states that
> literals are NFC, but any impact on IRIs comes only indirectly, from
> "it's a legal IRI".
> 
> RFC 3987:
> [[ 3.1.  Mapping of IRIs to URIs
> 
>             c. If the IRI is in a Unicode-based character encoding (for
>                example, UTF-8 or UTF-16), do not normalize (see section
>                5.3.2.2 for details).  Apply step 2 directly to the
>                encoded Unicode character sequence.
> ]]
> 
> 5.3.2.2 says:
> [[
> To avoid false negatives and problems with
>    transcoding, IRIs SHOULD be created by using NFC.
> ]]
> 
> so it's a SHOULD in RFC 3987 on creation.
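To make the "do not normalize" rule concrete, here is a minimal Python sketch of the RFC 3987 section 3.1 step 2 mapping, assuming a well-formed IRI (where every non-ASCII character is a ucschar or iprivate, so encoding all non-ASCII characters matches the RFC's rule). The function name `iri_to_uri` is hypothetical. It shows why the NFC SHOULD matters: a precomposed and a decomposed spelling of the same visible text map to different URIs.

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical helper sketching RFC 3987 sec. 3.1 step 2: each non-ASCII
    character is UTF-8 encoded and every octet is written as %HH.
    No Unicode normalization is applied first (step 1c)."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


# U+00C0 (precomposed) vs. "A" + U+0300 COMBINING GRAVE ACCENT (decomposed):
print(iri_to_uri("http://a.example/\u00c0"))   # http://a.example/%C3%80
print(iri_to_uri("http://a.example/A\u0300"))  # http://a.example/A%CC%80
```

Because the mapping works on the characters as given, two IRIs that differ only in normalization form never compare equal after mapping, which is the "false negatives" concern in 5.3.2.2.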

I poked around looking for composing characters in the PN_CHARS_BASE
character ranges. \u02ff MODIFIER LETTER LOW LEFT ARROW seemed like it
could be a culprit, but fileformat.info claims it's not in a combining
class. Likewise \ufffd REPLACEMENT CHARACTER.
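The character-by-character hunt can be automated. A minimal sketch, assuming Python 3.9+ (for `unicodedata.is_normalized` and `list[str]` hints); `unescape_uchar` and `not_normalized` are hypothetical helper names. It expands the Turtle-style escapes in the test IRI's local name and lists the code points that change when normalized on their own. Results reflect the interpreter's bundled Unicode data, which is newer than what a 2013-era checker used.

```python
import re
import unicodedata

# Local name from the test IRI, still in Turtle \uXXXX / \UXXXXXXXX escape form
# (raw strings, so Python itself does not expand the escapes).
LOCAL = (r"AZaz\u00c0\u00d6\u00d8\u00f6\u00f8\u02ff\u0370\u037d\u037f\u1fff"
         r"\u200c\u200d\u2070\u218f\u2c00\u2fef\u3001\ud7ff\uf900\ufdcf"
         r"\ufdf0\ufffd\U00010000\U000effff")


def unescape_uchar(s: str) -> str:
    """Expand \\uXXXX and \\UXXXXXXXX escapes into the code points they name."""
    return re.sub(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})",
                  lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


def not_normalized(s: str, form: str) -> list[str]:
    """Code points of s that are not already in the given normalization form
    when checked one character at a time."""
    return [f"U+{ord(ch):04X}" for ch in s
            if not unicodedata.is_normalized(form, ch)]


chars = unescape_uchar(LOCAL)
for form in ("NFC", "NFKC"):
    print(form, not_normalized(chars, form))
```

On a recent Python the NFC list should contain only U+F900, a CJK compatibility ideograph with a canonical singleton decomposition; the NFKC list additionally flags compatibility characters such as U+2070 SUPERSCRIPT ZERO and U+FDF0. That would fit the guess that different characters trigger the two warnings, and neither \U character appears in either list.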

There are a bunch of yet-unassigned characters which could be confusing
a vigilant IRI checker. I've mapped those to the highest currently-
assigned characters in their respective range (per fileformat.info):

    \u037f      37e
    \u1fff      1ffe
    \u218f      2189
    \u2fef      2fd5
    \ud7ff      d7fb
    \ufdcf      fdc7
    \U000effff  e01ef
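The substitution above can be reproduced mechanically by walking down from each code point until one is no longer unassigned. A minimal Python sketch; `highest_assigned_at_or_below` is a hypothetical name, and "unassigned" here means general category Cn in the interpreter's own Unicode database. On a recent interpreter some rows will differ from the table, since characters such as U+037F GREEK CAPITAL LETTER YOT have been assigned since 2013.

```python
import unicodedata


def highest_assigned_at_or_below(cp: int) -> int:
    """Walk down from cp to the first code point whose general category is not
    Cn (unassigned or noncharacter) in this interpreter's Unicode data."""
    while cp >= 0 and unicodedata.category(chr(cp)) == "Cn":
        cp -= 1
    return cp


for cp in (0x037F, 0x1FFF, 0x218F, 0x2FEF, 0xD7FF, 0xFDCF, 0xEFFFF):
    print(f"{cp:x} -> {highest_assigned_at_or_below(cp):x}")
```

A linear walk is fine here: the longest gap, from U+EFFFF down to U+E01EF, is only about 65,000 lookups.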

attached is a variant of
  localName_with_PN_CHARS_BASE_character_boundaries.{nt,ttl}
with the values substituted. (I pass this modified test so there
shouldn't be any typos in it.) If it still doesn't work, try chopping
off the last character 'cause it's a variation selector which ostensibly
is NF{,K}{C,D} valid, but may not have been when jjc wrote your checker.


> 	Andy
> 
> 
> >
> >I think we should quickly change the
> >   @prefix rdft:   <http://www.w3.org/ns/rdftest#> .
> >namespace to
> >   <https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/tests-ttl/ns>
> >as it is entirely Turtle-specific (used for the following types:
> >   rdft:TestTurtleEval
> >   rdft:TestTurtlePositiveSyntax
> >   rdft:TestTurtleNegativeSyntax
> >   rdft:TestTurtleNegativeEval
> >
> >). I expect that the general notion of an RDF manifest-driven test
> >suite will some day use <http://www.w3.org/ns/rdftest#> but would look
> >like <http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#> .
> >
> >While curating the CR comments, I saw that comment 21 proposes some additional tests.
> >
> >http://www.w3.org/2011/rdf-wg/wiki/Turtle_Candidate_Recommendation_Comments#c21
> >
> 

-- 
-ericP


Received on Wednesday, 20 March 2013 14:00:00 UTC
