When URI-Refs attack...

I just noticed something about URI-Refs, and I thought I'd share it
here, because it seems to affect a few RDF/XML parsers and the wording
of a few specs.  It is an edge-case, but it just bit me, so...

RFC 3986 describes how to convert a URI-Ref to a URI by resolving it
against a base URI.  This algorithm always performs the
"remove_dot_segments" step on the reference's path, even if the
reference has a scheme or an absolute path.

This means that RDF such as:

  <rdf:Description rdf:about="http://example.com/whatever/../test">
    <rdf:value>val</rdf:value>
  </rdf:Description>

should parse to:

  <http://example.com/test>    rdf:value    "val"    .

if using the RFC 3986 rules.  I think.
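
As a rough illustration, here is a minimal Python sketch of
remove_dot_segments as described in RFC 3986 section 5.2.4, applied to
the example above.  The helper name normalize_absolute and the use of
urllib.parse for splitting are assumptions made for illustration; this
is not taken from any particular RDF parser.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4; 'path' acts as the input buffer."""
    output = []  # output buffer, one path segment per entry
    while path:
        if path.startswith("../"):        # step A: drop leading "../"
            path = path[3:]
        elif path.startswith("./"):       # step A: drop leading "./"
            path = path[2:]
        elif path.startswith("/./"):      # step B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":                # step B: "/." becomes "/"
            path = "/"
        elif path.startswith("/../"):     # step C: "/../" becomes "/",
            path = path[3:]               # and the last output segment goes
            if output:
                output.pop()
        elif path == "/..":               # step C: same, at end of input
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # step D: lone "." or ".."
            path = ""
        else:                             # step E: move first segment over
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def normalize_absolute(uri_ref: str) -> str:
    """Hypothetical helper: remove dot-segments from a reference with a scheme."""
    parts = urlsplit(uri_ref)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(normalize_absolute("http://example.com/whatever/../test"))
# prints: http://example.com/test
```

This only covers the case where the reference carries its own scheme,
which is the branch of the RFC 3986 algorithm in question here.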

...but RDF/XML is based on RFC 2396, and so is xml:base.  There, the
resolution algorithm doesn't seem to apply remove_dot_segments to
references that have a scheme or an absolute path.  Also, xml:base
talks about resolving "relative references", and it isn't clear
whether the word "relative" is being descriptive or defining a subset
of references.
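
To make the RFC 2396 reading concrete, here is a small Python sketch
of which references would reach the dot-segment step under the
ordering in RFC 2396 section 5.2.  The function name is hypothetical,
and the step numbers in the comments reflect one reading of that
section, not an authoritative statement of it.

```python
from urllib.parse import urlsplit


def rfc2396_reaches_dot_removal(ref: str) -> bool:
    """Return True if 'ref' would get as far as step 6 (merge and dot removal)."""
    parts = urlsplit(ref)
    if not parts.scheme and not parts.netloc and not parts.path and not parts.query:
        return False   # step 2: reference to the current document
    if parts.scheme:
        return False   # step 3: has a scheme, treated as an absolute URI
    if parts.netloc:
        return False   # step 4: has an authority, path is kept as written
    if parts.path.startswith("/"):
        return False   # step 5: absolute path, path is kept as written
    return True        # step 6: relative path, merged and dot-segments removed


for ref in ("http://example.com/whatever/../test",
            "/whatever/../test",
            "whatever/../test"):
    print(ref, rfc2396_reaches_dot_removal(ref))
```

Under this reading only the last reference has its dot-segments
processed, whereas the RFC 3986 algorithm would process all three.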

Interestingly, this means that a new-style URI-Ref can't refer to a URI
that contains dot-segments.  URIs and absolute URI-Refs look alike,
but they are different data types with different semantics: what
differs is whether remove_dot_segments needs to be applied.  It is the
spec of the place where you put the string that determines what to do
with dot-segments, rather than what the construct looks like.

It seems that some RDF parsers process dot-segments, and some don't.


Anyway, this actually has some practical significance.  In order to
handle xml:base in Atom/RDF for GRDDL, I need to be able to resolve
URI-Refs against each other.  As XSLT 1.0 has painful string handling,
my current algorithm outputs URI-Refs that may be absolute and may
contain dot-segments, expecting them to be resolved as in RFC 3986.
It is looking like this probably isn't going to work well, and I'm
going to have to implement remove_dot_segments in the XSLT...

-- 
Dave

Received on Monday, 21 January 2008 19:38:52 UTC