Re: URI Comparisons: RFC 2616 vs. RDF from Nathan on 2011-01-17 (public-lod@w3.org from January 2011)

From: Nathan <nathan@webr3.org>
Date: Mon, 17 Jan 2011 17:09:55 +0000
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
CC: Kingsley Idehen <kidehen@openlinksw.com>, public-lod@w3.org, Sandro Hawke <sandro@w3.org>
Message-ID: <4D3477E3.9040801@webr3.org>

I had better be a bit more specific; comments in-line below.

Nathan wrote:
> Kingsley Idehen wrote:
>> On 1/17/11 10:51 AM, Martin Hepp wrote:
>>> Dear all:
>>>
>>> RFC 2616 [1, section 3.2.3] says that
>>>
>>> "When comparing two URIs to decide if they match or not, a client  
>>> SHOULD use a case-sensitive octet-by-octet comparison of the entire
>>>    URIs, with these exceptions:
>>>
>>>       - A port that is empty or not given is equivalent to the default
>>>         port for that URI-reference;
>>>       - Comparisons of host names MUST be case-insensitive;
>>>       - Comparisons of scheme names MUST be case-insensitive;
>>>       - An empty abs_path is equivalent to an abs_path of "/".
>>>
>>>    Characters other than those in the "reserved" and "unsafe" sets (see
>>>    RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>>>
>>>    For example, the following three URIs are equivalent:
>>>
>>>       http://abc.com:80/~smith/home.html
>>>       http://ABC.com/%7Esmith/home.html
>>>       http://ABC.com:/%7esmith/home.html

As per the percent-encoding rules and the set of unreserved characters 
[1], URI producers should not percent-encode octets that correspond to 
unreserved characters (see [1]), and when such octets are found in a URI 
they should be decoded. This includes %7E, which decodes to "~". The hex 
digits of a percent-encoding are also case-insensitive, so %7e and %7E 
are equivalent. Thus you should not produce URIs like the last two above, 
and when you find them you should fix the error, to produce:

    http://abc.com:80/~smith/home.html
    http://ABC.com/~smith/home.html
    http://ABC.com:/~smith/home.html
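
For illustration, here is a minimal Python sketch of that step. The name 
decode_unreserved is made up for this example and is not taken from the 
RFC or from any library; the sketch only decodes triplets that stand for 
unreserved characters and upper-cases the hex digits of the rest:

```python
import re

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Hypothetical helper: decode %XX triplets that encode unreserved
    characters and upper-case the hex digits of all other triplets."""
    def fix(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                      # e.g. %7E or %7e becomes "~"
        return "%" + match.group(1).upper()  # e.g. %2f becomes %2F
    return _PCT_TRIPLET.sub(fix, uri)


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
):
    print(decode_unreserved(uri))
```

Reserved characters such as "/" are deliberately left encoded, since 
decoding them could change what the URI means.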

The above URIs all use the generic syntax, so the generic syntax 
equivalence rules always apply [2]: the scheme and host are 
case-insensitive and are normalized to lowercase. Normalizing by these 
rules would produce:

    http://abc.com:80/~smith/home.html
    http://abc.com/~smith/home.html
    http://abc.com:/~smith/home.html
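
A rough Python sketch of the case normalization, using only 
urllib.parse from the standard library. The function name 
lowercase_scheme_and_host is an assumption made for this example; it 
leaves any userinfo, the path, the query and the fragment untouched:

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_scheme_and_host(uri: str) -> str:
    """Hypothetical helper: lower-case the scheme and the host of a URI
    that uses the generic syntax, leaving everything else as it is."""
    parts = urlsplit(uri)
    # Keep any "user:password@" prefix as-is; only the host[:port] part
    # is case-insensitive (the port is digits, so lower() is harmless).
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/~smith/home.html",
    "http://ABC.com:/~smith/home.html",
):
    print(lowercase_scheme_and_host(uri))
```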

Finally, scheme-specific normalization rules [3] can be applied. For the 
http scheme, an empty port, a missing port and the default port 80 are 
all equivalent (for the purposes of both naming and dereferencing, since 
it is the specification for URIs with that scheme which defines this), 
which allows you to normalize to:

    http://abc.com/~smith/home.html
    http://abc.com/~smith/home.html
    http://abc.com/~smith/home.html
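
The port step could be sketched in Python roughly as follows. The 
function name drop_default_port and the DEFAULT_PORTS table are 
assumptions made for this example, and only the http and https defaults 
are listed:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of default ports for this sketch; extend as needed.
DEFAULT_PORTS = {"http": "80", "https": "443"}


def drop_default_port(uri: str) -> str:
    """Hypothetical helper: remove an empty port, or a port equal to the
    scheme's default, from the authority component of a URI."""
    parts = urlsplit(uri)
    host, colon, port = parts.netloc.rpartition(":")
    if colon and (port == "" or port == DEFAULT_PORTS.get(parts.scheme)):
        netloc = host          # "abc.com:" and "abc.com:80" become "abc.com"
    else:
        netloc = parts.netloc  # no port, or a non-default port: keep as-is
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://abc.com/~smith/home.html",
    "http://abc.com:/~smith/home.html",
):
    print(drop_default_port(uri))
```

A non-default port such as :8080 is kept, because it identifies a 
different endpoint.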

[1] http://tools.ietf.org/html/rfc3986#section-2.3
[2] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
[3] http://tools.ietf.org/html/rfc3986#section-6.2.3

Hope that helps refine my previous comments,

>>> Does this also hold for identifying RDF resources
>>
>> Yes, where an RDF resource is a Data Container at an Address (URL). 
>> Thus, equivalent results for de-referencing a URL en route to 
>> accessing data.
>>
>> No, when "resource" also implies an Entity (Data Item or Data Object) 
>> that is assigned a Name via URI.
> 
> Logically, yes on both counts, we should/could be normalizing these URIs 
> as we consume and publish using the syntax based normalization rules [1] 
> which apply to all URI/IRIs with the generic syntax (such as the 
> examples above)
> 
> Any client consuming data, or server publishing data, can use the 
> normalization rules, so it stands to reason that it's pretty important 
> that we all do it to avoid false negatives.
> 
> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2
> 
> Best,
> 
> Nathan
>

Received on Monday, 17 January 2011 17:11:45 UTC