- From: Nathan <nathan@webr3.org>
- Date: Mon, 17 Jan 2011 17:09:55 +0000
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- CC: Kingsley Idehen <kidehen@openlinksw.com>, public-lod@w3.org, Sandro Hawke <sandro@w3.org>
I should be a bit more specific; comments in-line below.
Nathan wrote:
> Kingsley Idehen wrote:
>> On 1/17/11 10:51 AM, Martin Hepp wrote:
>>> Dear all:
>>>
>>> RFC 2616 [1, section 3.2.3] says that
>>>
>>> "When comparing two URIs to decide if they match or not, a client
>>> SHOULD use a case-sensitive octet-by-octet comparison of the entire
>>> URIs, with these exceptions:
>>>
>>> - A port that is empty or not given is equivalent to the default
>>> port for that URI-reference;
>>> - Comparisons of host names MUST be case-insensitive;
>>> - Comparisons of scheme names MUST be case-insensitive;
>>> - An empty abs_path is equivalent to an abs_path of "/".
>>>
>>> Characters other than those in the "reserved" and "unsafe" sets (see
>>> RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>>>
>>> For example, the following three URIs are equivalent:
>>>
>>> http://abc.com:80/~smith/home.html
>>> http://ABC.com/%7Esmith/home.html
>>> http://ABC.com:/%7esmith/home.html
Under the percent-encoding rules and the set of unreserved characters
[2], URI producers should not create percent-encoded octets for
unreserved characters, and such octets should be decoded when found in
a URI. This includes %7E, which encodes "~". The hex digits of a
percent-encoding are also case-insensitive [1], so %7e and %7E are
equivalent. You should therefore not produce URIs like the second and
third above, and when you find them you should fix the error, to produce:
http://abc.com:80/~smith/home.html
http://ABC.com/~smith/home.html
http://ABC.com:/~smith/home.html
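
To make that first step concrete, here is a minimal sketch in Python, assuming we only want to decode percent-encoded unreserved characters and uppercase the hex digits of whatever triplets remain. The names decode_unreserved, UNRESERVED and PCT_TRIPLET are mine for illustration, not from any spec or library.

```python
import re

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# A percent-encoded octet: "%" followed by two hex digits of either case.
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Decode triplets that stand for unreserved characters.

    All other triplets are kept, with their hex digits uppercased.
    (Hypothetical helper, for illustration only.)
    """
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return PCT_TRIPLET.sub(fix, uri)


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
):
    print(decode_unreserved(uri))
```

Run over the three original URIs, this should print the three URIs listed just above. Reserved characters such as %2F are deliberately left encoded, since decoding them could change the meaning of the URI.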
The above URIs all use the generic syntax, so the generic syntax-based
equivalence rules always apply [1]: the scheme and host are
case-insensitive and are normalized to lowercase. Normalization under
these rules would produce:
http://abc.com:80/~smith/home.html
http://abc.com/~smith/home.html
http://abc.com:/~smith/home.html
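
As a rough sketch of this case normalization step, assuming Python's standard urllib.parse is acceptable for splitting the URI (lowercase_scheme_and_host is a name I made up for the example):

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_scheme_and_host(uri):
    """Lowercase the scheme and host; leave everything else untouched.

    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(uri)
    # Keep any userinfo as-is: it is case-sensitive, unlike the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/~smith/home.html",
    "http://ABC.com:/~smith/home.html",
):
    print(lowercase_scheme_and_host(uri))
```

The path, query and fragment are passed through unchanged because they are case-sensitive under the generic syntax; only the scheme and host are safe to lowercase.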
Finally, scheme-specific normalization rules can be applied [3]. For
the http scheme, an empty port and the default port 80 are equivalent
to no port at all (for the purposes of naming and dereferencing; this
comes from the specification for URIs with that scheme), which allows
you to normalize to:
http://abc.com/~smith/home.html
http://abc.com/~smith/home.html
http://abc.com/~smith/home.html
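
The scheme-based step could be sketched like this, again with urllib.parse. The DEFAULT_PORTS table and the drop_default_port name are assumptions for the example; a real normalizer would need an entry for each scheme it cares about.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative table only: default ports for the schemes handled here.
DEFAULT_PORTS = {"http": "80", "https": "443"}


def drop_default_port(uri):
    """Remove an empty port, or one equal to the scheme's default.

    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.rpartition(":")
    # Only strip when a ":" delimiter is present and the port is empty
    # or matches the default; anything else is left exactly as found.
    if colon and (port == "" or port == DEFAULT_PORTS.get(parts.scheme)):
        hostport = host
    return urlunsplit(
        (parts.scheme, userinfo + at + hostport, parts.path, parts.query, parts.fragment)
    )


for uri in (
    "http://abc.com:80/~smith/home.html",
    "http://abc.com/~smith/home.html",
    "http://abc.com:/~smith/home.html",
):
    print(drop_default_port(uri))
```

All three inputs should come out as the same string, so after the three steps a plain string comparison is enough to tell that the URIs are equivalent. A non-default port such as 8080 is kept, since it names a different authority.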
[1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
[2] http://tools.ietf.org/html/rfc3986#section-2.3
[3] http://tools.ietf.org/html/rfc3986#section-6.2.3
Hope that helps refine my previous comments,
>>> Does this also hold for identifying RDF resources
>>
>> Yes, where an RDF resource is a Data Container at an Address (URL).
>> Thus, equivalent results for de-referencing a URL en route to
>> accessing data.
>>
>> No, when "resource" also implies an Entity (Data Item or Data Object)
>> that is assigned a Name via URI.
>
> Logically, yes on both counts, we should/could be normalizing these URIs
> as we consume and publish using the syntax based normalization rules [1]
> which apply to all URI/IRIs with the generic syntax (such as the
> examples above)
>
> Any client consuming data, or server publishing data, can use the
> normalization rules, so it stands to reason that it's pretty important
> that we all do it to avoid false negatives.
>
> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2
>
> Best,
>
> Nathan
>
Received on Monday, 17 January 2011 17:11:45 UTC