RE: URI Comparisons: RFC 2616 vs. RDF

From: Nuno Bettencourt <nuno.bett@gmail.com>
Date: Mon, 17 Jan 2011 17:27:51 -0000
To: <nathan@webr3.org>, "'Dave Reynolds'" <dave.e.reynolds@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
Cc: "'Martin Hepp'" <martin.hepp@ebusiness-unibw.org>, <public-lod@w3.org>
Message-ID: <007f01cbb66b$df731c10$9e595430$@gmail.com>
Hi, 

Although this strays a little from the main point, since we're discussing URI comparison in terms of RDF, I would like to ask for some help.

I have a question about URLs when it comes to RDF URI comparison. Is there any RFC that establishes whether

http://abc.com:80/~smith/home.html
https://abc.com:80/~smith/home.html
or even
ftp://abc.com:80/~smith/home.html
 
should or should not be considered the same resource?
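
To make the question concrete, here is a minimal Python sketch, assuming the RFC 2616 section 3.2.3 rules quoted below (case-insensitive scheme and host, default port, empty path, %-encoding of unreserved characters). The names normalize, DEFAULT_PORTS and UNRESERVED are my own illustrative choices, not taken from any spec or library, and the authority parsing is deliberately simplified (no userinfo, no IPv6 literals). It only compares identifiers as strings; it cannot say whether two URIs denote the same resource.

```python
import re
from urllib.parse import urlsplit

# Illustrative table of default ports (an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}
UNRESERVED = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz0123456789-._~")


def _fix_escape(match):
    """Decode %XX if it encodes an unreserved character, else uppercase the hex."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def normalize(uri):
    """Apply the RFC 2616 3.2.3 style equivalence rules to one URI string."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # Simplified authority split: host[:port] only.
    host, _, port = parts.netloc.partition(":")
    host = host.lower()
    # An empty port or the scheme's default port is dropped.
    if port == "" or port == str(DEFAULT_PORTS.get(scheme)):
        authority = host
    else:
        authority = host + ":" + port
    # An empty path is treated as "/".
    path = re.sub(r"%([0-9A-Fa-f]{2})", _fix_escape, parts.path) or "/"
    result = scheme + "://" + authority + path
    if parts.query:
        result += "?" + parts.query
    if parts.fragment:
        result += "#" + parts.fragment
    return result


rfc_examples = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]
scheme_variants = [
    "http://abc.com:80/~smith/home.html",
    "https://abc.com:80/~smith/home.html",
    "ftp://abc.com:80/~smith/home.html",
]

# Plain string comparison versus comparison after normalization.
print(len(set(rfc_examples)), len({normalize(u) for u in rfc_examples}))
print(len(set(scheme_variants)), len({normalize(u) for u in scheme_variants}))
```

Under these assumptions the three RFC 2616 examples collapse to one string after normalization, while my three URIs stay distinct either way, because the scheme is part of the comparison and port 80 is the default only for http.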

Best regards,

Nuno Bettencourt

> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> Behalf Of Nathan
> Sent: Monday, 17 January 2011 16:53
> To: Dave Reynolds; Sandro Hawke
> Cc: Martin Hepp; public-lod@w3.org
> Subject: Re: URI Comparisons: RFC 2616 vs. RDF
> 
> Dave Reynolds wrote:
> > On Mon, 2011-01-17 at 16:51 +0100, Martin Hepp wrote:
> >> Dear all:
> >>
> >> RFC 2616 [1, section 3.2.3] says that
> >>
> >> "When comparing two URIs to decide if they match or not, a client
> >> SHOULD use a case-sensitive octet-by-octet comparison of the entire
> >>     URIs, with these exceptions:
> >>
> >>        - A port that is empty or not given is equivalent to the default
> >>          port for that URI-reference;
> >>        - Comparisons of host names MUST be case-insensitive;
> >>        - Comparisons of scheme names MUST be case-insensitive;
> >>        - An empty abs_path is equivalent to an abs_path of "/".
> >>
> >>     Characters other than those in the "reserved" and "unsafe" sets (see
> >>     RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
> >>
> >>     For example, the following three URIs are equivalent:
> >>
> >>        http://abc.com:80/~smith/home.html
> >>        http://ABC.com/%7Esmith/home.html
> >>        http://ABC.com:/%7esmith/home.html
> >> "
> >>
> >> Does this also hold for identifying RDF resources
> >>
> >> a) in theory and
> >
> > No. RDF Concepts defines equality of RDF URI References [1] as simply
> > character-by-character equality of the %-encoded UTF-8 Unicode strings.
> >
> > Note the final Note in that section:
> >
> > """
> > Note: Because of the risk of confusion between RDF URI references that
> > would be equivalent if dereferenced, the use of %-escaped characters in
> > RDF URI references is strongly discouraged.
> > """
> >
> > which explicitly calls out the difference between URI equivalence
> > (dereference to the same resource) and RDF URI Reference equality.
> 
> I'd suggest that it's a little more complex than that, and that this may be an
> issue to clear up in the next RDF WG (it's on the charter I believe).
> 
> For example:
> 
>     When a URI uses components of the generic syntax, the component
>     syntax equivalence rules always apply; namely, that the scheme and
>     host are case-insensitive and therefore should be normalized to
>     lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
>     equivalent to <http://www.example.com/>.
> 
> - http://tools.ietf.org/html/rfc3986#section-6.2.2.1
> 
> However, that's only for URIs which use the generic syntax (which most URIs
> we ever touch do use).
> 
> It would be great if a normalized-IRI with specific normalization rules could be
> drafted up as part of the next WG charter - after all they are a pretty pivotal
> part of the sem web setup, and it would be relatively easy to clear up these
> issues.
> 
> Best,
> 
> Nathan
Received on Monday, 17 January 2011 17:35:57 UTC
