Re: TurtleTests/localName_with_non_leading_extras URI escape

On 17/11/15 23:29, Rob Stewart wrote:
> Hi,
>
> I have a question about "." being escaped in the
> localName_with_non_leading_extras turtle parser test case. The input
> .ttl file is:
>
> @prefix p: <http://a.example/>.
> p:a·̀ͯ‿.⁀ <http://a.example/p> <http://a.example/o> .
>
> In the expected case in the .nt file, this subject URI is translated to:
>
> <http://a.example/a\u00b7\u0300\u036f\u203f\u002e\u2040>
> <http://a.example/p> <http://a.example/o> .
>
> Why is the "." character escaped to \u002e ?
>
> I would expect the subject URI to be escaped to:
>
> <http://a.example/a\u00b7\u0300\u036f\u203f.\u2040>
>
> The input and expected output test cases are:
>
> http://www.w3.org/2013/TurtleTests/localName_with_non_leading_extras.ttl
> http://www.w3.org/2013/TurtleTests/localName_with_non_leading_extras.nt
>
> This question appears to have been asked before on this list, back in
> December 2013 by David Robillard:
>
> https://lists.w3.org/Archives/Public/public-rdf-comments/2013Dec/0115.html
>
> For this W3C RDF Turtle test case, should "." be escaped to \u002e, or
> should it be left unescaped, as David thought? I think I agree with him.
>
> David's email was:
>
> %%%%%%%%%
> Hello,
>
> Why is the "." escaped as \u002e in
>
> http://www.w3.org/2013/TurtleTests/localName_with_non_leading_extras.nt
>
> My implementation does not escape this character since, even in the old
> NTriples spec,
>
> absoluteURI ::= ( character - ( '<' | '>' | space ) )+
> character   ::= [#x20-#x7E] /* US-ASCII space to decimal 127 */
>
> Which includes ".", #x2E.  Accordingly, my implementation does not
> escape this character.  Should it?
>

My reading is that there is no requirement for it to be escaped. In fact,
there is no requirement to escape any of the characters: N-Triples is
defined using UTF-8 these days.

See section 4 on the canonical form, which says not to use UCHAR.

Or did you mean N-Triples as text/plain?  See section 6, where it says
characters outside ASCII must be escaped for use in text/plain.  Even
then, "." is ASCII, so it would not need escaping.  Using ASCII+UCHAR is
not the canonical form.
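
To illustrate why the two spellings are interchangeable, here is a minimal
Python sketch. It is not taken from any spec or test harness; the helper
names unescape_uchar and escape_non_ascii are made up for this example, and
it only handles the \uXXXX and \UXXXXXXXX forms of UCHAR. It decodes both
versions of the subject IRI and compares the results, then re-escapes only
the non-ASCII characters, as one might for an ASCII-only serialization.

```python
import re

# UCHAR as used in N-Triples: \uXXXX or \UXXXXXXXX with hex digits.
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchar(iri: str) -> str:
    """Replace every UCHAR escape with the character it denotes."""
    return _UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), iri)


def escape_non_ascii(iri: str) -> str:
    """Escape only characters outside ASCII; '.' and other ASCII stay as-is."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            out.append(f"\\U{cp:08x}")
    return "".join(out)


# Subject IRI as written in the test suite's .nt file ("." as \u002e).
test_form = r"http://a.example/a\u00b7\u0300\u036f\u203f\u002e\u2040"
# Subject IRI as Rob and David expected it ("." left alone).
expected_form = r"http://a.example/a\u00b7\u0300\u036f\u203f.\u2040"

# Both spellings decode to the same IRI.
same_iri = unescape_uchar(test_form) == unescape_uchar(expected_form)
print(same_iri)

# Re-escaping only non-ASCII characters yields the form without \u002e.
print(escape_non_ascii(unescape_uchar(test_form)))
```

A test harness that compares parsed triples rather than the raw text of the
.nt file would therefore treat either spelling as a pass.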

 Andy

Received on Friday, 20 November 2015 16:23:26 UTC