Re: RDF last call comment: escaping of URI references in N-triples

On Thu, 08 May 2003 15:28:15 -0400
Martin Duerst <duerst@w3.org> wrote:

> 
> Dear RDF specialists,
> 
> [This is currently a personal comment. I'll ask the I18N WG to
> look at it on their teleconf next week.]
> 
> Emmanuel and me just discovered a problem in the RDF spec,
> in the definition of N-triples at
> http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding

The 23 January 2003 last call WD, 
  http://www.w3.org/TR/2003/WD-rdf-testcases-20030123/#sec-uri-encoding

> 
> This says:
> 
>  >>>>
> Disallowed characters are represented in UTF-8 and then encoded using the 
> %HH format, where HH is the byte value expressed using hexadecimal notation.
> 
> Characters above the US-ASCII range are made available by the \u or \U 
> escapes as described in section Strings for ranges [#x80-#xFFFF] and 
> [#x10000-#x10FFFF] respectively.
>  >>>>
>
> So if I have <http://example.org/鈴木>, ...

an IRI

> ... what's the correct representation
> of this in N-triples? Is it <http://example.org/\u9234\u6728> ? Or is
> it <http://example.org/%E9%88%B4%E6%9C%A8> ? The spec currently seems
> to allow both, but this clearly is going against the purpose of N-triples
> for testing purposes.

RDF uses its own definition of RDF URI References, which should have
been linked from here instead of the IRI definition in the
ongoing Charmod draft work.  It might be easier to say less, so that
whatever characters your RDF URI reference contains, this document
just tells you how to encode it.

I think that removing the first paragraph in the quote above should
be sufficient, along with adding a reference to the RDF Concepts WD
definitions.  Would that help?

> In line with the IRI spec, which says that conversion to URIs should
> be done as late as possible, I strongly suggest to only use the
> http://example.org/\u9234\u6728 form. This should also allow to
> streamline the description of escaping, which should become the
> same for Strings and for URIs. There should also be a statement
> saying that the escaping is needed for N-triples, but not for N3, ...

I won't be discussing N3 in this document; it doesn't define that
still-changing research language.  For example, I recall that N3
changed from ASCII to UTF-8 after N-Triples was designed.

>  ...and there should be some I18N component in the example to show
> the points.  ...

This can be fixed by adding to the test file
  http://www.w3.org/2000/10/rdf-tests/rdfcore/ntriples/test.nt
some encoded RDF URI examples such as <http://example.org/\u9234\u6728>
I will do this if that will address this sufficiently.
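
To make the two candidate forms concrete, here is a minimal Python
sketch, not taken from the spec or from any existing tool.  The
function name ntriples_escape is hypothetical, and it only handles
characters above US-ASCII (the \u and \U ranges quoted above); it does
not deal with ASCII characters that need escaping.  The %HH form is
produced with the standard library's urllib.parse.quote for comparison.

```python
from urllib.parse import quote


def ntriples_escape(text: str) -> str:
    """Escape characters above US-ASCII as \\uHHHH or \\UHHHHHHHH.

    Hypothetical helper for illustration only; ASCII characters are
    passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # US-ASCII: leave as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # range #x80-#xFFFF
        else:
            out.append("\\U%08X" % cp)    # range #x10000-#x10FFFF
    return "".join(out)


uri = "http://example.org/\u9234\u6728"

# \u escape form: the reference stays an IRI-like string until late.
print(ntriples_escape(uri))

# %HH form: UTF-8 bytes, percent-encoded (":" and "/" kept literal).
print(quote(uri, safe=":/"))
```

The first print gives the \u9234\u6728 form and the second gives the
%E9%88%B4%E6%9C%A8 form from Martin's message, which shows that the
same reference can end up written two ways.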

> ... Using the \u form is also robust to potential
> changes in the IRI spec.  Some care is needed about characters in
> the ASCII range that are not allowed in URIs.

I really don't want to give the details of URIs or IRIs - people can
look up those specs if they want to know that; it shouldn't be
duplicated here.

> The RDF validator currently does a third thing, namely it does
> not use any escaping at all:
> 
>  >>>>>>>>
> The original RDF/XML document
> 
> 1: <?xml version="1.0"?>
> 2: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> 3:   xmlns:dc="http://purl.org/dc/elements/1.1/">
> 4:   <rdf:Description rdf:about="http://www.w3.org/鈴木">
> 5:     <dc:title>鈴木太郎</dc:title>
> 6:   </rdf:Description>
> 7: </rdf:RDF>
> 8:
> 
> Triples of the Data Model in N-Triples Format (Sub, Pred, Obj)
> 
> <http://www.w3.org/鈴木> <http://purl.org/dc/elements/1.1/title> 
> "\u9234\u6728\u592A\u90CE" .
>  >>>>>>>>

That's illegal N-Triples, which is 7-bit US-ASCII; no character above
126 is allowed.  Which reminds me, I made a validator for N-Triples:

  Redland N-Triples Validator
  http://www.redland.opensource.ac.uk/ntriples/
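
As a rough illustration of the character-range rule only (this is an
assumed sketch, not how the validator above works, and it does not
check the N-Triples grammar), a few lines of Python can flag the
offending positions.  The function name non_ascii_positions is
hypothetical.

```python
def non_ascii_positions(line: str) -> list[int]:
    """Return the indexes of characters with code points above 126."""
    return [i for i, ch in enumerate(line) if ord(ch) > 126]


# The triple from the RDF validator output: raw characters in the
# subject, \u escapes (literal backslashes) in the object string.
bad = ("<http://www.w3.org/\u9234\u6728> "
       "<http://purl.org/dc/elements/1.1/title> "
       + r'"\u9234\u6728\u592A\u90CE" .')

print(non_ascii_positions(bad))
```

Only the two raw characters in the subject are reported; the escaped
literal is plain ASCII and passes this check.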

Dave

Received on Tuesday, 13 May 2003 05:24:38 UTC