- From: Nathan <nathan@webr3.org>
- Date: Mon, 17 Jan 2011 17:09:55 +0000
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- CC: Kingsley Idehen <kidehen@openlinksw.com>, public-lod@w3.org, Sandro Hawke <sandro@w3.org>
Better be a bit more specific, in-line:

Nathan wrote:
> Kingsley Idehen wrote:
>> On 1/17/11 10:51 AM, Martin Hepp wrote:
>>> Dear all:
>>>
>>> RFC 2616 [1, section 3.2.3] says that
>>>
>>> "When comparing two URIs to decide if they match or not, a client
>>> SHOULD use a case-sensitive octet-by-octet comparison of the entire
>>> URIs, with these exceptions:
>>>
>>> - A port that is empty or not given is equivalent to the default
>>> port for that URI-reference;
>>> - Comparisons of host names MUST be case-insensitive;
>>> - Comparisons of scheme names MUST be case-insensitive;
>>> - An empty abs_path is equivalent to an abs_path of "/".
>>>
>>> Characters other than those in the "reserved" and "unsafe" sets (see
>>> RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>>>
>>> For example, the following three URIs are equivalent:
>>>
>>> http://abc.com:80/~smith/home.html
>>> http://ABC.com/%7Esmith/home.html
>>> http://ABC.com:/%7esmith/home.html

As per the percent-encoding rules and the set of unreserved characters [1], percent-encoded octets in certain ranges (see [1]) should not be created by URI producers, and when found in a URI they should be decoded. This includes %7E. Percent-encoding is also case-insensitive, so %7e and %7E are equivalent. Thus you should not produce URIs like this, and when found you should fix the error, to produce:

http://abc.com:80/~smith/home.html
http://ABC.com/~smith/home.html
http://ABC.com:/~smith/home.html

The above URIs all use the generic syntax, so the generic component syntax equivalence rules always apply [2]; normalization under these rules would produce:

http://abc.com:80/~smith/home.html
http://abc.com/~smith/home.html
http://abc.com:/~smith/home.html

Then finally, scheme-specific normalization rules [3] can be applied, which treat all the port values as being equivalent (for the purpose of naming and dereferencing; it's the specification for URIs with that scheme), which allows you to normalize to:

http://abc.com/~smith/home.html
http://abc.com/~smith/home.html
http://abc.com/~smith/home.html

[1] http://tools.ietf.org/html/rfc3986#section-2.3
[2] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
[3] http://tools.ietf.org/html/rfc3986#section-6.2.3

Hope that helps refine my previous comments,

>>> Does this also hold for identifying RDF resources
>>
>> Yes, where an RDF resource is a Data Container at an Address (URL).
>> Thus, equivalent results for de-referencing a URL en route to
>> accessing data.
>>
>> No, when "resource" also implies an Entity (Data Item or Data Object)
>> that is assigned a Name via URI.
>
> Logically, yes on both counts, we should/could be normalizing these URIs
> as we consume and publish using the syntax based normalization rules [1]
> which apply to all URI/IRIs with the generic syntax (such as the
> examples above)
>
> Any client consuming data, or server publishing data, can use the
> normalization rules, so it stands to reason that it's pretty important
> that we all do it to avoid false negatives.
>
> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2
>
> Best,
>
> Nathan
>
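
For anyone who wants to try the three steps from this message (decode percent-encoded unreserved characters, lowercase scheme and host, drop an empty or default port), here is a minimal sketch in Python using only the standard library. The names normalize, UNRESERVED and DEFAULT_PORTS are made up for this illustration and do not come from any RFC or library. It also assumes the host contains no percent-encoding and skips path-segment ("." and "..") normalization, so treat it as a rough illustration rather than a complete RFC 3986 normalizer.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Assumption for this sketch: only these two schemes get a default port.
DEFAULT_PORTS = {"http": "80", "https": "443"}

PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_triplet(match):
    """Decode a triplet if it encodes an unreserved character,
    otherwise keep it encoded but uppercase its hex digits."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def normalize(uri):
    """Apply the three normalization steps to an absolute URI string."""
    scheme, netloc, path, query, fragment = urlsplit(uri)

    # Step 1: percent-encoding normalization of path and query.
    path = PCT_TRIPLET.sub(_fix_triplet, path)
    query = PCT_TRIPLET.sub(_fix_triplet, query)

    # Step 2: case normalization of scheme and host.
    scheme = scheme.lower()
    userinfo, at, hostport = netloc.rpartition("@")
    if ":" in hostport and not hostport.endswith("]"):
        host, _, port = hostport.rpartition(":")
    else:
        # No port given (or a bare IPv6 literal such as "[::1]").
        host, port = hostport, ""
    host = host.lower()

    # Step 3: scheme-based normalization. An empty or default port
    # is dropped, and an empty path becomes "/" when there is a host.
    if port == DEFAULT_PORTS.get(scheme):
        port = ""
    if netloc and not path:
        path = "/"

    netloc = userinfo + at + host + (":" + port if port else "")
    return urlunsplit((scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    for uri in (
        "http://abc.com:80/~smith/home.html",
        "http://ABC.com/%7Esmith/home.html",
        "http://ABC.com:/%7esmith/home.html",
    ):
        print(normalize(uri))
```

With this sketch, all three example URIs from Martin's mail come out as http://abc.com/~smith/home.html, so a plain string comparison after normalization treats them as the same name.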
Received on Monday, 17 January 2011 17:11:45 UTC