- From: Renaud Delbru <renaud.delbru@deri.org>
- Date: Mon, 17 Jan 2011 17:10:18 +0000
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- CC: public-lod@w3.org, Kingsley Idehen <kidehen@openlinksw.com>, dave.e.reynolds@gmail.com
Hi,
I am particularly interested in this issue, because I am currently
struggling with this problem within the Sindice project.
Given Dave's answer as well, what would be the best practices for
correctly handling URIs within an (RDF) system?
Should the system implement URI normalisation based on the RFC 2616
exceptions:
- A port that is empty or not given is equivalent to the default
port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
and should it also take care of decoding all percent-encoded characters?
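To make the question concrete, here is a minimal Python sketch of such a normalisation step. It is only an illustration under stated assumptions, not a recommendation: the function name normalise_uri, the default-port table and the choice to decode only the RFC 3986 "unreserved" characters are mine and do not come from RFC 2616 or from any existing system.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes need a default-port rule here.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: "safe to decode" is taken to mean the RFC 3986 unreserved set.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalise_escapes(text):
    """Decode escapes of unreserved characters; upper-case the hex of the rest."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return _ESCAPE.sub(replace, text)


def normalise_uri(uri):
    """Hypothetical helper applying the four RFC 2616 comparison exceptions."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()          # scheme names are case-insensitive
    host = parts.hostname or ""            # .hostname is already lower-cased
    if ":" in host:                        # restore brackets of IPv6 literals
        host = "[" + host + "]"
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + host
    port = parts.port                      # None when empty or not given
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(port)          # keep only non-default ports
    path = _normalise_escapes(parts.path)
    if netloc and not path:
        path = "/"                         # empty abs_path is equivalent to "/"
    query = _normalise_escapes(parts.query)
    return urlunsplit((scheme, netloc, path, query, parts.fragment))
```

With this sketch, the three example URIs from RFC 2616 all map to http://abc.com/~smith/home.html, while escapes such as %20, %2C and %25 are left encoded, because decoding them could change what the URI identifies.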
However, when dealing with percent-encoded characters, some cases become
tricky to handle. For example, some URIs [1] have an encoded space at the
end of the string. Once it is decoded, certain systems/applications could
automatically trim it. Also, some URIs [2] are 'recursively' encoded
and need multiple decoding passes to reach the intended form.
[1] http://geo.linkeddata.es/resource/Pozo/Moro%2C%20Pou%2047%20o%20del%20
[2] http://sioc-project.org/sioc/user/1%2523user
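The two cases can be reproduced with Python's standard urllib.parse.unquote. The snippet below is only a sketch to show the effect; the helper decoding_chain and its max_passes parameter are hypothetical names introduced for this illustration.

```python
from urllib.parse import unquote

URI_TRAILING_SPACE = (
    "http://geo.linkeddata.es/resource/Pozo/Moro%2C%20Pou%2047%20o%20del%20"
)
URI_DOUBLE_ENCODED = "http://sioc-project.org/sioc/user/1%2523user"


def decoding_chain(uri, max_passes=5):
    """Return the URI followed by each successive decoding until nothing changes."""
    chain = [uri]
    for _ in range(max_passes):
        decoded = unquote(chain[-1])
        if decoded == chain[-1]:
            break
        chain.append(decoded)
    return chain


# Case [1]: the decoded form ends with a space, so any later trim()/strip()
# silently produces a different string.
decoded = unquote(URI_TRAILING_SPACE)
print(repr(decoded[-8:]))            # ' o del '
print(decoded.strip() == decoded)    # False

# Case [2]: each pass yields a different string, and the last one turns
# the escaped '#' into a fragment delimiter.
for step in decoding_chain(URI_DOUBLE_ENCODED):
    print(step)
```

The second loop prints three distinct strings, ending in 1%2523user, 1%23user and 1#user. Which of them is "the right one" depends on what the publisher intended, which a consumer cannot infer from the string alone.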
Any opinions on how to correctly handle URIs are welcome. It would be
useful to have a "best practices" document for correctly handling
URIs in an RDF system.
Best,
--
Renaud Delbru
On 17/01/11 15:51, Martin Hepp wrote:
> Dear all:
>
> RFC 2616 [1, section 3.2.3] says that
>
> "When comparing two URIs to decide if they match or not, a client
> SHOULD use a case-sensitive octet-by-octet comparison of the entire
> URIs, with these exceptions:
>
> - A port that is empty or not given is equivalent to the default
> port for that URI-reference;
> - Comparisons of host names MUST be case-insensitive;
> - Comparisons of scheme names MUST be case-insensitive;
> - An empty abs_path is equivalent to an abs_path of "/".
>
> Characters other than those in the "reserved" and "unsafe" sets (see
> RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>
> For example, the following three URIs are equivalent:
>
> http://abc.com:80/~smith/home.html
> http://ABC.com/%7Esmith/home.html
> http://ABC.com:/%7esmith/home.html
> "
>
> Does this also hold for identifying RDF resources
>
> a) in theory and
> b) in practice (e.g. in popular triplestores)?
>
> I did not test it yet, but I assume that not all implementations would
> treat
>
> http://purl.org/NET/c4dm/event.owl#Event
> HTTP://purl.org/NET/c4dm/event.owl#Event
> http://PURL.org/NET/c4dm/event.owl#Event
> http://purl.org:80/NET/c4dm/event.owl#Event
>
> as the same class.
>
> Any facts or opinions?
>
> Best
>
> Martin
>
>
> [1] http://www.ietf.org/rfc/rfc2616.txt
>
> --------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail: hepp@ebusiness-unibw.org
> phone: +49-(0)89-6004-4217
> fax: +49-(0)89-6004-4620
> www: http://www.unibw.de/ebusiness/ (group)
> http://www.heppnetz.de/ (personal)
> skype: mfhepp
> twitter: mfhepp
>
>
Received on Monday, 17 January 2011 17:10:52 UTC