- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Tue, 13 May 2003 10:23:23 +0100
- To: Martin Duerst <duerst@w3.org>
- Cc: www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, emmanuel@w3.org, www-rdf-validator@w3.org
On Thu, 08 May 2003 15:28:15 -0400
Martin Duerst <duerst@w3.org> wrote:

>
> Dear RDF specialists,
>
> [This is currently a personal comment. I'll ask the I18N WG to
> look at it on their teleconf next week.]
>
> Emmanuel and me just discovered a problem in the RDF spec,
> in the definition of N-triples at
> http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding

The 23 January 2003 last call WD,
http://www.w3.org/TR/2003/WD-rdf-testcases-20030123/#sec-uri-encoding

>
> This says:
>
> >>>>
> Disallowed characters are represented in UTF-8 and then encoded using the
> %HH format, where HH is the byte value expressed using hexadecimal notation.
>
> Characters above the US-ASCII range are made available by the \u or \U
> escapes as described in section Strings for ranges [#x80-#xFFFF] and
> [#x10000-#x10FFFF] respectively.
> >>>>
>
> So if I have <http://example.org/鈴木>, ... an IRI
> ... what's the correct representation
> of this in N-triples? Is it <http://example.org/\u9234\u6728> ? Or is
> it <http://example.org/%E9%88%B4%E6%9C%A8> ? The spec currently seems
> to allow both, but this clearly is going against the purpose of N-triples
> for testing purposes.

RDF uses its own definition of RDF URI References, which should have
been linked from here rather than to the IRI definition in the ongoing
Charmod draft work.

It might be easier to say less so that whatever characters your RDF URI
reference contains, this document just tells you how to encode it.

I think that removing the first paragraph in the quote above should be
sufficient, along with adding a reference to the RDF Concepts WD
definitions.

Would that help?

> In line with the IRI spec, which says that conversion to URIs should
> be done as late as possible, I strongly suggest to only use the
> http://example.org/\u9234\u6728 form. This should also allow to
> streamline the description of escaping, which should become the
> same for Strings and for URIs.
> There should also be a statement
> saying that the escaping is needed for N-triples, but not for N3, ...

I won't be discussing N3 in this document; it doesn't define that
changing research language. For example, I recall that N3 changed
from ASCII to UTF-8 since N-Triples was designed.

> ...and there should be some I18N component in the example to show
> the points. ...

This can be fixed by adding to the test file
  http://www.w3.org/2000/10/rdf-tests/rdfcore/ntriples/test.nt
some encoded RDF URI examples such as
  <http://example.org/\u9234\u6728>

I will do this if that will address this sufficiently.

> ... Using the \u form is also robust to potential
> changes in the IRI spec. Some care is needed about characters in
> the ASCII range that are not allowed in URIs.

I really don't want to give the detail of URIs or IRIs - people can look
up those specs if they want to know that; it shouldn't be duplicated here.

> The RDF validator currently does a third thing, namely it does
> not use any escaping at all:
>
> >>>>>>>>
> The original RDF/XML document
>
> 1: <?xml version="1.0"?>
> 2: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> 3:   xmlns:dc="http://purl.org/dc/elements/1.1/">
> 4:   <rdf:Description rdf:about="http://www.w3.org/鈴木">
> 5:     <dc:title>鈴木太郎</dc:title>
> 6:   </rdf:Description>
> 7: </rdf:RDF>
> 8:
>
> Triples of the Data Model in N-Triples Format (Sub, Pred, Obj)
>
> <http://www.w3.org/鈴木> <http://purl.org/dc/elements/1.1/title>
> "\u9234\u6728\u592A\u90CE" .
> >>>>>>>>

That's illegal N-Triples (7 bit US-ASCII); no character above 126 is
allowed.

Which reminds me, I made one for N-Triples:
  Redland N-Triples Validator
  http://www.redland.opensource.ac.uk/ntriples/

Dave
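
P.S. To make the two candidate encodings discussed above concrete, here is
a minimal Python sketch. The function names ntriples_escape and
percent_encode_utf8 are made up for illustration and are not taken from
any spec or tool; the escaping rules are my reading of the quoted text
(\u for [#x80-#xFFFF], \U for [#x10000-#x10FFFF]), not a complete
N-Triples serializer.

```python
from urllib.parse import quote


def ntriples_escape(text: str) -> str:
    """Hypothetical helper: keep printable US-ASCII, escape the rest
    as \\uXXXX (up to #xFFFF) or \\UXXXXXXXX (above #xFFFF)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x5C:
            # Assumption: a literal backslash is doubled, as in Strings.
            parts.append("\\\\")
        elif 0x20 <= cp <= 0x7E:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)


def percent_encode_utf8(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then %HH-encode the bytes
    that are not plain URI characters (':' and '/' are left alone)."""
    return quote(text, safe=":/")


# The Python escapes below produce the two real characters U+9234 U+6728.
iri = "http://example.org/\u9234\u6728"

print(ntriples_escape(iri))      # the \u form
print(percent_encode_utf8(iri))  # the %HH form
```

Both outputs are pure 7 bit US-ASCII, so either would pass a character-range
check; the question in the thread is only which of the two the spec should
require.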
Received on Tuesday, 13 May 2003 05:24:38 UTC