RDF last call comment: escaping of URI references in N-triples

Dear RDF specialists,

[This is currently a personal comment. I'll ask the I18N WG to
look at it on their teleconf next week.]

Emmanuel and I just discovered a problem in the RDF spec,
in the definition of N-triples at
http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding

This says:

 >>>>
Disallowed characters are represented in UTF-8 and then encoded using the 
%HH format, where HH is the byte value expressed using hexadecimal notation.

Characters above the US-ASCII range are made available by the \u or \U 
escapes as described in section Strings for ranges [#x80-#xFFFF] and 
[#x10000-#x10FFFF] respectively.
 >>>>

So if I have <http://example.org/鈴木>, what's the correct representation
of this in N-triples? Is it <http://example.org/\u9234\u6728>? Or is
it <http://example.org/%E9%88%B4%E6%9C%A8>? The spec currently seems
to allow both, which clearly defeats the purpose of N-triples as a
format for test cases.
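
To make the two candidates concrete, here is a minimal Python sketch that
derives both from the same IRI. The function names are my own, and the set
of characters left unescaped in percent_escape is an assumption, not
something taken from the spec.

```python
from urllib.parse import quote


def ntriples_u_escape(text: str) -> str:
    """Escape every non-ASCII character as \\uXXXX or \\UXXXXXXXX."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # US-ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # range #x80-#xFFFF
        else:
            out.append("\\U%08X" % cp)    # range #x10000-#x10FFFF
    return "".join(out)


def percent_escape(text: str) -> str:
    """Encode as UTF-8, then %HH-escape each byte outside the safe set.

    The safe set below (URI punctuation plus '%') is an assumption made
    for this sketch; quote() never escapes ASCII letters, digits or "_.-~".
    """
    return quote(text, safe=":/?#[]@!$&'()*+,;=%")


iri = "http://example.org/鈴木"
print(ntriples_u_escape(iri))   # the \u form
print(percent_escape(iri))      # the %HH form
```

Both outputs identify the same resource, which is exactly why allowing
both makes comparing test results unreliable.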

In line with the IRI spec, which says that conversion to URIs should
be done as late as possible, I strongly suggest using only the
http://example.org/\u9234\u6728 form. This should also make it
possible to streamline the description of escaping, which should
become the same for Strings and for URIs. There should also be a
statement saying that the escaping is needed for N-triples, but not
for N3, and the example should include some I18N component to
illustrate these points. Using the \u form is also robust against
potential changes in the IRI spec. Some care is needed with characters
in the ASCII range that are not allowed in URIs.
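
As an illustration of "as late as possible", a consumer could keep the
\u form in the N-triples file, undo the escapes to recover the IRI, and
percent-encode only when an actual URI is needed. The following is just a
sketch under assumptions: the function names are hypothetical, the regular
expression accepts hex digits in either case, it ignores other backslash
escapes, and the safe set passed to quote() is my own choice.

```python
import re
from urllib.parse import quote

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_ntriples(text: str) -> str:
    """Turn \\u and \\U escapes back into the characters they stand for."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(replace, text)


def iri_to_uri(iri: str) -> str:
    """Last step only: UTF-8 encode and %HH-escape non-ASCII characters.

    The safe set is an assumption for this sketch, not taken from any spec.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


escaped = r"http://example.org/\u9234\u6728"   # as stored in N-triples
iri = unescape_ntriples(escaped)                # IRI with the real characters
uri = iri_to_uri(iri)                           # converted only when required
print(iri)
print(uri)
```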


The RDF validator currently does a third thing: it does not escape
the URI reference at all, although it does escape the literal:

 >>>>>>>>
The original RDF/XML document

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/鈴木">
    <dc:title>鈴木太郎</dc:title>
  </rdf:Description>
</rdf:RDF>

Triples of the Data Model in N-Triples Format (Sub, Pred, Obj)

<http://www.w3.org/鈴木> <http://purl.org/dc/elements/1.1/title> "\u9234\u6728\u592A\u90CE" .
 >>>>>>>>



Regards,     Martin.

Received on Thursday, 8 May 2003 15:28:25 UTC