- From: Renaud Delbru <renaud.delbru@deri.org>
- Date: Mon, 17 Jan 2011 17:10:18 +0000
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- CC: public-lod@w3.org, Kingsley Idehen <kidehen@openlinksw.com>, dave.e.reynolds@gmail.com
Hi,

I am particularly interested in this issue, because I am currently struggling with the same problem within the Sindice project.

Given Dave's answer as well, what would be the best practices for handling URIs correctly within an (RDF) system? Should the system implement URI normalisation based on the RFC 2616 exceptions:

- A port that is empty or not given is equivalent to the default port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".

and should it also take care of decoding all percent-encoded characters?

However, percent-encoded characters lead to some cases that are tricky to handle. For example, some URIs [1] have an encoded space at the end of the string. Once it is decoded, certain systems/applications could automatically trim it. Also, some URIs [2] are 'recursively' encoded and need multiple decoding passes before the right URI is obtained.

[1] http://geo.linkeddata.es/resource/Pozo/Moro%2C%20Pou%2047%20o%20del%20
[2] http://sioc-project.org/sioc/user/1%2523user

Any opinions on how to handle URIs correctly are welcome. It would be useful to have a "best practices" document for correctly handling URIs in an RDF system.

Best,
--
Renaud Delbru

On 17/01/11 15:51, Martin Hepp wrote:
> Dear all:
>
> RFC 2616 [1, section 3.2.3] says that
>
> "When comparing two URIs to decide if they match or not, a client
> SHOULD use a case-sensitive octet-by-octet comparison of the entire
> URIs, with these exceptions:
>
> - A port that is empty or not given is equivalent to the default
> port for that URI-reference;
> - Comparisons of host names MUST be case-insensitive;
> - Comparisons of scheme names MUST be case-insensitive;
> - An empty abs_path is equivalent to an abs_path of "/".
>
> Characters other than those in the "reserved" and "unsafe" sets (see
> RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>
> For example, the following three URIs are equivalent:
>
> http://abc.com:80/~smith/home.html
> http://ABC.com/%7Esmith/home.html
> http://ABC.com:/%7esmith/home.html
> "
>
> Does this also hold for identifying RDF resources
>
> a) in theory and
> b) in practice (e.g. in popular triplestores)?
>
> I did not test it yet, but I assume that not all implementations would
> treat
>
> http://purl.org/NET/c4dm/event.owl#Event
> HTTP://purl.org/NET/c4dm/event.owl#Event
> http://PURL.org/NET/c4dm/event.owl#Event
> http://purl.org:80/NET/c4dm/event.owl#Event
>
> as the same class.
>
> Any facts or opinions?
>
> Best
>
> Martin
>
>
> [1] http://www.ietf.org/rfc/rfc2616.txt
>
> --------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail: hepp@ebusiness-unibw.org
> phone: +49-(0)89-6004-4217
> fax: +49-(0)89-6004-4620
> www: http://www.unibw.de/ebusiness/ (group)
> http://www.heppnetz.de/ (personal)
> skype: mfhepp
> twitter: mfhepp
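
To illustrate the normalisation discussed in this thread, here is a minimal Python sketch of the four RFC 2616 exceptions plus a conservative treatment of percent-encoding. It is not taken from Sindice or from any triplestore. The names normalise_uri, DEFAULT_PORTS and UNRESERVED are hypothetical, the set of characters that is safe to decode is assumed to be the RFC 3986 "unreserved" set, and userinfo and IPv6 hosts are ignored for brevity.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit, unquote

# Assumption: only these two schemes are handled; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: decode only RFC 3986 "unreserved" characters, so that
# escapes such as %20, %2C or %25 are never turned into literal characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalise_escapes(text):
    """Decode escapes of unreserved characters, upper-case the rest (one pass)."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return _PCT.sub(repl, text)


def normalise_uri(uri):
    """Return a comparison key for an http(s) URI (sketch, no userinfo/IPv6)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()          # scheme is case-insensitive
    host = (parts.hostname or "").lower()  # host is case-insensitive
    port = parts.port                      # None when empty or not given
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = "%s:%d" % (host, port)      # keep only non-default ports
    path = _normalise_escapes(parts.path) or "/"  # empty abs_path equals "/"
    query = _normalise_escapes(parts.query)
    return urlunsplit((scheme, host, path, query, parts.fragment))


if __name__ == "__main__":
    for candidate in (
        "http://purl.org/NET/c4dm/event.owl#Event",
        "HTTP://purl.org/NET/c4dm/event.owl#Event",
        "http://PURL.org/NET/c4dm/event.owl#Event",
        "http://purl.org:80/NET/c4dm/event.owl#Event",
    ):
        print(normalise_uri(candidate))

    # The 'recursive' encoding case: each full decoding pass changes the string.
    once = unquote("1%2523user")
    twice = unquote(once)
    print(once, twice)
```

Under these assumptions the four purl.org variants map to the same key, and the three RFC 2616 example URIs do as well. The two tricky URIs [1] and [2] are left unchanged, because an encoded space, comma or percent sign is never decoded. That avoids both the trailing-space trimming and the question of how many decoding passes to run, at the cost of not treating a doubly encoded URI as equal to its decoded form.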
Received on Monday, 17 January 2011 17:10:52 UTC