
Re: #rdfms-difference-between-ID-and-about (non-ASCII characters in IDs)

From: Dan Connolly <connolly@w3.org>
Date: Thu, 14 Jun 2001 17:46:49 -0500
Message-ID: <3B293ED9.2F309122@w3.org>
To: guha@alpiri.com, Brian McBride <bwm@hplb.hpl.hp.com>, Sergey Melnik <melnik@db.stanford.edu>, rdf core <w3c-rdfcore-wg@w3.org>
Dan Connolly wrote:
> 
> "R.V.Guha" wrote:
> >
> > Sorry for being slow, but on reconstruction, how does
> > one disambiguate between
> >
> > <rdf:Description rdf:ID="#foo"/>
> >
> > and
> >
> > <rdf:Description rdf:about="##foo"/>
> 
> One doesn't; #foo isn't an XML ID (they start
> with letter), nor is ##foo a URI reference (no URI
> reference has two #'s in it).

Now that I think about it... that reminds me of a
nasty hairball that I thought we could avoid for a while;
but it comes up if we're getting serious about
saying that rdf:ID="foo" is the same as rdf:about="#foo":
XML IDs can use non-ASCII characters, but URI references
cannot.

I put a test case at
http://www.w3.org/2000/10/rdf-tests/rdfms-difference-between-ID-and-about/nonASCIIid.rdf
http://www.w3.org/2000/10/rdf-tests/rdfms-difference-between-ID-and-about/nonASCIIid.nt

Here's a copy:

==============
<!--  non-ascii characters in IDs...
      cf http://www.w3.org/TR/charmod/#sec-URIs
         http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs
 -->
<rdf:RDF
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        >
    <rdf:Description rdf:ID="D&#xFC;rst">
      <rdf:value>abc</rdf:value>
    </rdf:Description>
    <rdf:Description rdf:about="#D%C3%BCrst">
      <rdf:value>abc</rdf:value>
    </rdf:Description>
</rdf:RDF>
==============

cwm gets it wrong, but the right answer, I suggest, has just one triple:

=========
<http://www.w3.org/2000/10/rdf-tests/rdfms-difference-between-ID-and-about/nonASCIIid.rdf#D%C3%BCrst>    
<http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "abc" .
=========

This follows, indirectly, from:

[[[
Note: Although non-ASCII characters in URIs are not allowed by [URI], [XML]
     specifies a convention to avoid unnecessary incompatibilities in extended
     URI syntax. Implementors of RDF are encouraged to avoid further
     incompatibility and use the XML convention for system identifiers. Namely,
     that a non-ASCII character in a URI be represented in UTF-8 as one or more
     bytes, and then these bytes be escaped with the URI escaping mechanism
     (i.e., by converting each byte to %HH, where HH is the hexadecimal notation
     of the byte value).
]]]

-- Resource Description Framework (RDF) Model and Syntax Specification
http://www.w3.org/TR/REC-rdf-syntax/
Wed, 24 Feb 1999 14:45:07 GMT
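
The escaping convention quoted above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not what
cwm or any other parser does: the helper name escape_non_ascii and the
BASE constant are hypothetical, and only non-ASCII characters are escaped,
as the note describes.

```python
from urllib.parse import urljoin

# Assumed base URI: the location of the test case document.
BASE = ("http://www.w3.org/2000/10/rdf-tests/"
        "rdfms-difference-between-ID-and-about/nonASCIIid.rdf")


def escape_non_ascii(text: str) -> str:
    """Encode text as UTF-8, then write each non-ASCII byte as %HH."""
    return "".join(
        chr(byte) if byte < 0x80 else "%{:02X}".format(byte)
        for byte in text.encode("utf-8")
    )


# rdf:ID="D&#xFC;rst" read as the fragment "#" + escaped ID ...
from_id = urljoin(BASE, "#" + escape_non_ascii("D\u00fcrst"))
# ... and rdf:about="#D%C3%BCrst" resolved against the same base.
from_about = urljoin(BASE, "#D%C3%BCrst")

print(from_id)
print(from_id == from_about)
```

Under this reading both descriptions name the same resource, which is why
the suggested answer has a single triple rather than two.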

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Thursday, 14 June 2001 18:47:56 EDT
