RE: xsd:anyURI, rdf URIs, information resources

This message merely traces the reasoning path through the RDF specs to understand
which parts of the specs compel <http://neurocommons.org/page/Main%5FPage> to be viewed as a different URI reference from <http://neurocommons.org/page/Main_Page>.  Since I went to the trouble of tracing this, I figured I might as well document it.  You can safely ignore it if you aren't interested in the analysis.  The analysis does confirm that they are viewed as different URI references in RDF, BTW.

The RDF/XML spec defines an RDF/XML Document as:
http://www.w3.org/TR/rdf-syntax-grammar/#dfn-rdf-xml-document
[[
An RDF/XML Document(link) is an RDF Document written in the recommended XML transfer syntax for RDF as defined in this document.
]]

and defines an RDF Document as:
http://www.w3.org/TR/rdf-syntax-grammar/#dfn-rdf-document
[[
An RDF Document(link) is a serialization of an RDF Graph  into a concrete syntax.
]]

And the term "RDF Graph" is a link to this definition:
http://www.w3.org/TR/rdf-concepts/#dfn-rdf-graph
[[
An RDF graph is a set of RDF triples.
]]

And the term "RDF triple" is defined as:
http://www.w3.org/TR/rdf-concepts/#dfn-rdf-triple
[[
An RDF triple(link) contains three components:

    * the subject(link), which is an RDF URI reference or a blank node
    * the predicate(link), which is an RDF URI reference
    * the object(link), which is an RDF URI reference, a literal or a blank node
]]

And the term "RDF URI reference" is defined (in part) as:
http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference
[[
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:

    * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
    * and would produce a valid URI character sequence (per RFC2396 [URI], sections 2.1) representing an absolute URI with optional fragment identifier when subjected to the encoding described below.

The encoding consists of:

   1. encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence of octet values.
   2. %-escaping octets that do not correspond to permitted US-ASCII characters.

The disallowed octets that must be %-escaped include all those that do not correspond to US-ASCII characters, and the excluded characters listed in Section 2.4 of [URI], except for the number sign (#), percent sign (%), and the square bracket characters re-allowed in [RFC-2732].
]]
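
To make the two encoding steps concrete, here is a minimal Python sketch of how I read them.  The names EXCLUDED, has_control_chars and rdf_uriref_encode are my own and do not come from any spec or library, and the set of excluded characters is my reading of RFC2396 section 2.4.3 (space, delims and unwise) minus the four characters that the RDF definition re-allows.  The sketch only performs the encoding; it does not check that the result is a valid absolute URI.

```python
# Assumption: the US-ASCII characters excluded by RFC 2396 section 2.4.3
# (space, delims, unwise), minus '#', '%', '[' and ']', which the RDF
# Concepts definition re-allows.
EXCLUDED = set(' <>"{}|\\^`')


def has_control_chars(s):
    """True if s contains a character in #x00-#x1F or #x7F-#x9F."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in s)


def rdf_uriref_encode(s):
    """Apply the two-step encoding quoted above to a Unicode string."""
    if has_control_chars(s):
        raise ValueError("control characters are not allowed")
    out = []
    # Step 1: encode the Unicode string as UTF-8, giving octets.
    for octet in s.encode("utf-8"):
        ch = chr(octet)
        # Step 2: %-escape non-ASCII octets and excluded ASCII characters.
        if octet >= 0x80 or ch in EXCLUDED:
            out.append("%{:02X}".format(octet))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(rdf_uriref_encode("http://neurocommons.org/page/Main%5FPage"))
    print(rdf_uriref_encode("http://example.org/a b"))
```

Note that an existing percent sign passes through untouched, so a string that already contains "%5F" is not altered by the encoding.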

So if I'm understanding the above rule correctly and we consider whether the Unicode string "http://neurocommons.org/page/Main%5FPage" conforms to this definition of an RDF URI reference, we observe:
 - it does not contain any control characters;
 - it contains only US-ASCII characters;
 - it does not contain any of the excluded characters listed in Section 2.4 of RFC2396, other than the percent sign, which is re-allowed;
 - therefore none of the corresponding octets needs to be %-escaped; and
 - therefore, it would produce a valid URI character sequence per RFC2396 when subjected to the encoding described above.

Therefore, the Unicode string "http://neurocommons.org/page/Main%5FPage" *does* conform to the definition of an RDF URI reference.  Similarly, the Unicode string "http://neurocommons.org/page/Main_Page" would conform, and per
http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
[[
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
]]
they are different RDF URI references.
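
As a small illustration of the difference, here is a Python sketch.  The function name rdf_uriref_equal is hypothetical, and I am assuming that Python's built-in string equality (which compares code points with no normalization) is an adequate stand-in for the character-by-character comparison in RDF Concepts.  urllib.parse.unquote is used only to show that the two strings coincide once the %-escape is decoded, which is not something RDF does.

```python
from urllib.parse import unquote

A = "http://neurocommons.org/page/Main%5FPage"
B = "http://neurocommons.org/page/Main_Page"


def rdf_uriref_equal(x, y):
    """Character-by-character comparison of two Unicode strings."""
    return x == y


if __name__ == "__main__":
    # As RDF URI references, the two strings differ.
    print(rdf_uriref_equal(A, B))   # False
    # After decoding %5F to "_", the strings are identical.
    print(unquote(A) == B)          # True
```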

I notice that the RDF Concepts document section 6.4 also warns about this issue:
http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
[[
Note: Because of the risk of confusion between RDF URI references that would be equivalent if derefenced, the use of %-escaped characters in RDF URI references is strongly discouraged. See also the URI equivalence issue of the Technical Architecture Group [TAG].
]]
and the TAG's URI equivalence issue is here:
http://www.w3.org/2001/tag/issues.html#URIEquivalence-15



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.

Received on Thursday, 3 July 2008 19:43:15 UTC