[ALL] xml:base and URIs with hashes

Just putting this on the record ...

I've just discovered a technical detail wrt the xml:base attribute in RDF/XML docs - if you put a hash at the end of the base URI, it is ignored when relative URIs are resolved against that base.

I.e. the following RDF/XML:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF 
  xml:base="http://example.com/foo#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  
  <rdf:Description rdf:about="">
    <rdfs:label>this</rdfs:label>
  </rdf:Description>
 
</rdf:RDF>

... gives the following triple:

<http://example.com/foo> rdfs:label "this".

It doesn't matter if you put more than one hash at the end of the base URI, they are all ignored. I.e. the following RDF/XML:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF 
  xml:base="http://example.com/foo####"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  
  <rdf:Description rdf:about="">
    <rdfs:label>this</rdfs:label>
  </rdf:Description>
 
</rdf:RDF>
	
... gives the same triple as above:

<http://example.com/foo> rdfs:label "this".
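
For what it's worth, this matches my reading of the generic resolution algorithm in RFC 3986 (section 5.2), where the fragment of the base URI is never used. Here is a minimal Python sketch of that behaviour, assuming only the standard library; the helper name resolve_against_base is made up for illustration and is not taken from any RDF toolkit.

```python
from urllib.parse import urldefrag, urljoin


def resolve_against_base(base: str, reference: str) -> str:
    """Resolve a relative reference, ignoring any fragment on the base URI."""
    # Drop everything from the first '#' onwards in the base URI.
    base_without_fragment = urldefrag(base).url
    if reference == "":
        # urljoin() hands back the base unchanged for an empty reference,
        # so the rdf:about="" case is handled explicitly here.
        return base_without_fragment
    return urljoin(base_without_fragment, reference)


# One trailing hash, several trailing hashes, and a fragment-only reference.
print(resolve_against_base("http://example.com/foo#", ""))
print(resolve_against_base("http://example.com/foo####", ""))
print(resolve_against_base("http://example.com/foo#", "#bar"))
```

The first two calls should both print http://example.com/foo, i.e. the subject of the triple above.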

Note however that the absolute URI <http://example.com/spong> is treated by RDF parsers as different from the absolute URI <http://example.com/spong#> (which spec can verify this?). I.e. the following RDF/XML:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <rdf:Description rdf:about="http://example.com/spong#">
    <rdfs:label>spong#</rdfs:label>
  </rdf:Description>
    
  <rdf:Description rdf:about="http://example.com/spong">
    <rdfs:label>spong</rdfs:label>
  </rdf:Description>

</rdf:RDF>

... gives the following triples:

<http://example.com/spong#> rdfs:label "spong#".
<http://example.com/spong> rdfs:label "spong".
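
As far as I can tell, the relevant rule is in RDF Concepts and Abstract Syntax, which says two RDF URI references are equal only if they compare equal character by character. A small sketch of what that means for the two URIs above, modelling the graph as a plain Python set of tuples (an assumption for illustration, not how any particular parser stores triples):

```python
from urllib.parse import urldefrag

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

with_hash = "http://example.com/spong#"
without_hash = "http://example.com/spong"

# Toy graph: a set of (subject, predicate, object) tuples.
graph = {
    (with_hash, RDFS_LABEL, "spong#"),
    (without_hash, RDFS_LABEL, "spong"),
}

# Node identity is plain string equality, so there are two distinct subjects.
subjects = {s for s, _, _ in graph}
print(len(subjects))

# Only after stripping the (empty) fragment do the two strings coincide.
same_document = urldefrag(with_hash).url == urldefrag(without_hash).url
print(same_document)
```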

Interestingly, the URI <http://example.com/spong##> raises an error when using the W3C RDF validation service (running ARP):

Error: {W107} Bad URI <http://elsewhere.org/spong##>: Fragment contains invalid character:#

However, Sesame 1.2.3 doesn't raise an error, and creates a new URI resource in the graph.

I remember vaguely Jeremy Carroll saying that hashes are actually allowed in fragment ids ... is this right or wrong?
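
My own reading of the RFC 3986 grammar is that a fragment is a sequence of pchar, "/" and "?" characters, and "#" is not among them, which would be consistent with the ARP error. A rough sketch of that check follows; the regular expression is my transcription of the grammar and the function name is hypothetical, so treat it as an approximation rather than a reference validator.

```python
import re

# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def fragment_is_valid(uri: str) -> bool:
    """Check the part after the first '#' against the fragment grammar."""
    # Everything after the first '#' is the fragment (empty if there is none).
    _, _, fragment = uri.partition("#")
    return _FRAGMENT.fullmatch(fragment) is not None


print(fragment_is_valid("http://example.com/spong#"))   # empty fragment
print(fragment_is_valid("http://example.com/spong##"))  # fragment is "#"
```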

Finally, this has implications for RDFS/OWL vocabularies/ontologies that use a hash namespace, because it means we have to be careful not to confuse the 'vocabulary URI' (a.k.a. the 'ontology URI', i.e. the URI that identifies the vocabulary/ontology) and the 'namespace URI' (i.e. the actual URI you append the local name of each term to). E.g. 

http://example.com/vocab - vocabulary URI.
http://example.com/vocab# - namespace URI. 

For RDFS/OWL vocabularies/ontologies that use a slash namespace, the vocabulary URI and the namespace URI are the same, e.g.

http://example.com/anothervocab/ - vocabulary URI.
http://example.com/anothervocab/ - namespace URI.
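
To make the distinction concrete, here is a small sketch of how the two URIs relate under the convention described above. The function names and the local name "Concept" are invented for the example.

```python
def vocabulary_uri(namespace_uri: str) -> str:
    """Derive the vocabulary URI from a namespace URI."""
    # Hash namespace: the vocabulary URI is the namespace URI minus the '#'.
    if namespace_uri.endswith("#"):
        return namespace_uri[:-1]
    # Slash namespace: the two URIs are identical.
    return namespace_uri


def term_uri(namespace_uri: str, local_name: str) -> str:
    """Term URIs are always built from the namespace URI, not the vocabulary URI."""
    return namespace_uri + local_name


print(vocabulary_uri("http://example.com/vocab#"))
print(vocabulary_uri("http://example.com/anothervocab/"))
print(term_uri("http://example.com/vocab#", "Concept"))
```

Building a term URI from the vocabulary URI of a hash namespace by mistake would give http://example.com/vocabConcept, which is the kind of confusion to watch out for.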

Cheers,

Al.

---
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email:        a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

Received on Friday, 20 January 2006 18:57:57 UTC