
Re: URI Comparisons: RFC 2616 vs. RDF

From: Nathan <nathan@webr3.org>
Date: Mon, 17 Jan 2011 16:52:32 +0000
Message-ID: <4D3473D0.30702@webr3.org>
To: Dave Reynolds <dave.e.reynolds@gmail.com>, Sandro Hawke <sandro@w3.org>
CC: Martin Hepp <martin.hepp@ebusiness-unibw.org>, public-lod@w3.org
Dave Reynolds wrote:
> On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote: 
>> Dear all:
>> RFC 2616 [1, section 3.2.3] says that
>> "When comparing two URIs to decide if they match or not, a client   
>> SHOULD use a case-sensitive octet-by-octet comparison of the entire
>>     URIs, with these exceptions:
>>        - A port that is empty or not given is equivalent to the default
>>          port for that URI-reference;
>>        - Comparisons of host names MUST be case-insensitive;
>>        - Comparisons of scheme names MUST be case-insensitive;
>>        - An empty abs_path is equivalent to an abs_path of "/".
>>     Characters other than those in the "reserved" and "unsafe" sets (see
>>     RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>>     For example, the following three URIs are equivalent:
>>        http://abc.com:80/~smith/home.html
>>        http://ABC.com/%7Esmith/home.html
>>        http://ABC.com:/%7esmith/home.html
>> "
>> Does this also hold for identifying RDF resources
>> a) in theory and
> No. RDF Concepts defines equality of RDF URI References [1] as simply
> character-by-character equality of the %-encoded UTF-8 Unicode strings.
> Note the final Note in that section:
> """
> Note: Because of the risk of confusion between RDF URI references that
> would be equivalent if dereferenced, the use of %-escaped characters in
> RDF URI references is strongly discouraged. 
> """
> which explicitly calls out the difference between URI equivalence
> (dereference to the same resource) and RDF URI Reference equality.

I'd suggest that it's a little more complex than that, and that this may 
be an issue to clear up in the next RDF WG (it's on the charter I believe).

For example:

    When a URI uses components of the generic syntax, the component
    syntax equivalence rules always apply; namely, that the scheme and
    host are case-insensitive and therefore should be normalized to
    lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
    equivalent to <http://www.example.com/>.

- http://tools.ietf.org/html/rfc3986#section-

However, that applies only to URIs which use the generic syntax (as most 
of the URIs we ever touch do).
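
To make the case-normalization rule concrete, here is a rough sketch in 
Python. The helper name normalize_case is my own invention, not anything 
from RFC 3986 or an RDF spec, and it assumes a plain host[:port] authority 
(no special handling for IPv6 literals or IDNs). It lowercases only the 
scheme and host and leaves path, query and fragment alone:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Hypothetical helper: lowercase the scheme and host, nothing else."""
    parts = urlsplit(uri)
    # Keep any userinfo as-is; only the host[:port] part is case-insensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


a = "HTTP://www.EXAMPLE.com/"
b = "http://www.example.com/"

# Plain string comparison (RDF-style) treats them as different.
print(a == b)
# After case normalization of scheme and host they compare equal.
print(normalize_case(a) == normalize_case(b))
```

The path is deliberately left untouched, since only the scheme and host 
are case-insensitive under the generic syntax.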

It would be great if a normalized IRI form, with specific normalization 
rules, could be drafted up as part of the next WG charter - after all, 
IRIs are a pretty pivotal part of the sem web setup, and it would be 
relatively easy to clear up these issues.
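
As an illustration of the gap such rules would close, here is a rough 
Python sketch comparing the three example URIs from the RFC 2616 quote 
above, first as plain strings (roughly what RDF does) and then via a 
comparison key. The function rfc2616_key and the DEFAULT_PORTS table are 
hypothetical names of mine, and the unreserved set is the RFC 3986 one, 
used here as an approximation of "other than reserved and unsafe". It 
assumes a plain host[:port] authority:

```python
import re
from urllib.parse import urlsplit

# Assumption: RFC 3986 unreserved characters stand in for the RFC 2616 wording.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
# Assumption: only these two schemes are considered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def _decode_unreserved(text: str) -> str:
    """Decode %XX only when it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Keep other escapes, but make the hex digits case-consistent.
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def rfc2616_key(uri: str):
    """Hypothetical comparison key following the quoted RFC 2616 exceptions."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    hostport = parts.netloc.rpartition("@")[2]
    host, _, port = hostport.partition(":")
    # An empty or missing port is equivalent to the scheme's default port.
    port_number = int(port) if port else DEFAULT_PORTS.get(scheme)
    # An empty abs_path is equivalent to "/".
    path = _decode_unreserved(parts.path) or "/"
    return (scheme, host.lower(), port_number, path, parts.query)


uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]

# Three distinct strings, so three distinct names under string equality.
print(len(set(uris)))
# One shared key under the RFC 2616 style comparison.
print(len({rfc2616_key(u) for u in uris}))
```

A normalized-IRI definition would let everyone agree on one such key 
rather than each toolkit picking its own.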


Received on Monday, 17 January 2011 16:53:21 UTC
