A question on Test 29 of Turtle Tests from Andrew Newman on 2010-10-03 (semantic-web@w3.org from October 2010)

From: Andrew Newman <andrewfnewman@gmail.com>
Date: Mon, 4 Oct 2010 07:46:57 +1000
To: SWIG <semantic-web@w3.org>
Message-ID: <AANLkTi=X8XuxhUP9x8QdF8yPz00O8J4Sa6rxT0ZHvxWp@mail.gmail.com>

Hi,

I'm trying to understand how the following file is a valid N-Triples
file (it is found in the Turtle compliance tests):
http://www.w3.org/TeamSubmission/turtle/tests/test-29.out

The question is about the object in the triple starting with
"<scheme:\u0001", which, following N-Triples escaping, is the URI
reference "scheme:" followed by Unicode character U+0001.  It seems
like this is an invalid URI.

It confused me because the manifest actually says "Escaping U+0001 to
U+007F in a URI".  It would seem that this example would be simpler as
a literal rather than a URI.

The way to parse these files seems to be to perform N-Triples escaping
and then parse the string as an (absolute) URI.  That's how I get an
invalid URI, so I must be doing something simple wrong.
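
To make the problem concrete, here is a minimal Python sketch of the
two steps described above.  The helper names (unescape_ntriples,
has_control_chars) are made up for illustration and are not from any
parser; the check only looks for raw control characters, which RFC 3986
does not allow in a URI, and is not a full URI validator.

```python
import re

# N-Triples escapes: \uXXXX, \UXXXXXXXX, and the single-character ones.
_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|\\([tnr"\\])')
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def unescape_ntriples(text):
    """Step 1: replace N-Triples escapes with the characters they denote."""
    def repl(match):
        if match.group(1):
            return chr(int(match.group(1), 16))
        if match.group(2):
            return chr(int(match.group(2), 16))
        return _SIMPLE[match.group(3)]
    return _ESCAPE.sub(repl, text)


def has_control_chars(uri):
    """Step 2 (partial, hypothetical check): raw C0 controls or DEL."""
    return any(ord(c) < 0x20 or ord(c) == 0x7F for c in uri)


raw = r"scheme:\u0001"          # text between < and > in test-29.out
uri = unescape_ntriples(raw)    # "scheme:" followed by U+0001
print(repr(uri), has_control_chars(uri))
```

With this approach the unescaped string ends in a raw U+0001, which is
why the result does not look like a valid URI.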

Maybe a solution is to go straight from N-Triples escaping to URI
escaping (\u0001 -> %01)?  Except, of course, for those characters that
are "ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period
(%2E), underscore (%5F), or tilde (%7E)" (from the RFC).
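
A rough Python sketch of that idea follows.  It is only an illustration
of the mapping, not something the Turtle or N-Triples documents
prescribe.  The function name escape_to_percent is hypothetical, only
\uXXXX escapes are handled, and encoding non-ASCII characters as UTF-8
bytes is my assumption.

```python
import re

# Unreserved characters from the RFC: ALPHA, DIGIT, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_U_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})')


def escape_to_percent(text):
    """Turn each \\uXXXX escape directly into URI percent-encoding.

    Escapes that denote unreserved characters become the character
    itself.  Everything else becomes %HH per UTF-8 byte (an assumption;
    lone surrogates are not handled and would raise an error).
    """
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return _U_ESCAPE.sub(repl, text)


print(escape_to_percent(r"scheme:\u0001"))   # scheme:%01
print(escape_to_percent(r"\u0041\u007E"))    # A~
```

Characters that appear literally in the input are left untouched; only
the escapes are rewritten.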

-Andrew

Received on Sunday, 3 October 2010 22:39:44 UTC