Re: URI Comparisons: RFC 2616 vs. RDF from Dave Reynolds on 2011-01-17 (public-lod@w3.org from January 2011)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Mon, 17 Jan 2011 18:07:56 +0000
To: nathan@webr3.org
Cc: public-lod@w3.org
Message-ID: <1295287676.2974.182.camel@dave-desktop>
On Mon, 2011-01-17 at 16:52 +0000, Nathan wrote: 
> Dave Reynolds wrote:
> > On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 
> >> Dear all:
> >>
> >> RFC 2616 [1, section 3.2.3] says that
> >>
> >> "When comparing two URIs to decide if they match or not, a client   
> >> SHOULD use a case-sensitive octet-by-octet comparison of the entire
> >>     URIs, with these exceptions:
> >>
> >>        - A port that is empty or not given is equivalent to the default
> >>          port for that URI-reference;
> >>        - Comparisons of host names MUST be case-insensitive;
> >>        - Comparisons of scheme names MUST be case-insensitive;
> >>        - An empty abs_path is equivalent to an abs_path of "/".
> >>
> >>     Characters other than those in the "reserved" and "unsafe" sets (see
> >>     RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
> >>
> >>     For example, the following three URIs are equivalent:
> >>
> >>        http://abc.com:80/~smith/home.html
> >>        http://ABC.com/%7Esmith/home.html
> >>        http://ABC.com:/%7esmith/home.html
> >> "
> >>
> >> Does this also hold for identifying RDF resources
> >>
> >> a) in theory and
> > 
> > No. RDF Concepts defines equality of RDF URI References [1] as simply
> > character-by-character equality of the %-encoded UTF-8 Unicode strings.
> > 
> > Note the final Note in that section:
> > 
> > """
> > Note: Because of the risk of confusion between RDF URI references that
> > would be equivalent if derefenced, the use of %-escaped characters in
> > RDF URI references is strongly discouraged. 
> > """
> > 
> > which explicitly calls out the difference between URI equivalence
> > (dereference to the same resource) and RDF URI Reference equality.
> 
> I'd suggest that it's a little more complex than that, and that this may 
> be an issue to clear up in the next RDF WG (it's on the charter I believe).

I beg to differ.

The charter does state: 

"Clarify the usage of IRI references for RDF resources, e.g., per SPARQL
Query §1.2.4."

However, I was under the impression that was simply removing the small
difference between "RDF URI References" and the IRI spec (which they had
been written to anticipate). Specifically, I thought the only substantive
issue there was the treatment of spaces, and many RDF processors already
take the conservative position on that anyway.

Replacing encoded string equality by dereference-equivalence would be a
pretty big change to RDF and I hadn't realized that was being
considered.
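
To make the size of that change concrete, here is a rough Python sketch
contrasting the two notions of "same URI" on the three RFC 2616 examples
quoted above. It is an illustration only, not text from either spec: the
helper name rfc2616_key is made up, the RFC 3986 "unreserved" set stands in
for "characters other than reserved and unsafe", and userinfo and IPv6
hosts are ignored.

```python
import re
import string
from urllib.parse import urlsplit

# Assumption: default ports for the two schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}
# Assumption: RFC 3986 "unreserved" characters approximate the set that
# RFC 2616 treats as equivalent to their %HH encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _decode_unreserved(path):
    """Decode %HH escapes of unreserved characters; uppercase the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def rfc2616_key(uri):
    """Hypothetical helper: a key that is equal for RFC 2616-equivalent URIs."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                     # scheme is case-insensitive
    host, _, port_text = parts.netloc.lower().partition(":")  # so is the host
    port = int(port_text) if port_text else DEFAULT_PORTS.get(scheme)
    path = _decode_unreserved(parts.path or "/")      # empty abs_path means "/"
    return (scheme, host, port, path, parts.query)


uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]

# RFC 2616 style: all three collapse to a single key.
http_keys = {rfc2616_key(u) for u in uris}
# RDF URI Reference equality: plain string comparison, so three distinct names.
rdf_names = set(uris)
```

Under these assumptions http_keys holds one entry while rdf_names holds
three, which is exactly the gap between the two definitions.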

Could one of the nominated chairs or a W3C rep clarify this?

> For example:
> 
>     When a URI uses components of the generic syntax, the component
>     syntax equivalence rules always apply; namely, that the scheme and
>     host are case-insensitive and therefore should be normalized to
>     lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
>     equivalent to <http://www.example.com/>.
> 
> - http://tools.ietf.org/html/rfc3986#section-6.2.2.1

Sure, but the later RDF-related specs such as GRDDL and RIF clarify how
that applies in RDF. For example, in RIF [1] we said:

"Neither Syntax-Based Normalization nor Scheme-Based Normalization
(described in Sections 6.2.2 and 6.2.3 of RFC-3986) are performed."

A form of words that, I think, we lifted verbatim from GRDDL, which in
turn had chosen them to clarify how the original RDF URI References spec
should be interpreted in the light of the updated URI/IRI RFCs.
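
As a small illustration of what "not performed" means in practice, the
Python sketch below applies only the case normalization from RFC 3986
section 6.2.2.1 to the example pair quoted above. The function name
case_normalize is hypothetical, and lowercasing the whole authority is a
simplification (it would also lowercase any userinfo).

```python
from urllib.parse import urlsplit, urlunsplit


def case_normalize(iri):
    """Hypothetical helper: lowercase scheme and authority, nothing else."""
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),   # simplification: treats userinfo like the host
        parts.path,
        parts.query,
        parts.fragment,
    ))


a = "HTTP://www.EXAMPLE.com/"
b = "http://www.example.com/"

# Without normalization (the RIF/GRDDL reading) these are two distinct names.
names_as_given = {a, b}
# With syntax-based case normalization they would collapse into one.
names_normalized = {case_normalize(a), case_normalize(b)}
```

Here names_as_given has two members and names_normalized has one, so any
store that keys resources on the IRI string would see different data
depending on which rule applies.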

Changing RDF to require syntax or scheme based normalization would
require changing at least RIF and GRDDL as well. If that was really on
the cards I would have expected it to have been more broadly publicized.

Dave

[1] http://www.w3.org/TR/2010/PR-rif-dtb-20100511/#Relative_IRIs
Received on Monday, 17 January 2011 18:08:43 UTC