
Re: URI Reference creation

From: Mike Brown <mike@skew.org>
Date: Fri, 27 Jul 2007 16:33:29 -0600 (MDT)
Message-Id: <200707272233.l6RMXUNA041019@chilled.skew.org>
To: Sebastian Pipping <webmaster@hartwork.org>
CC: uri@w3.org

Sebastian Pipping wrote:
> == Reference creation ==
> For the inverse process of reference resolution,
> e.g. making
> 
>    abc
> 
> out of
> 
>    URI:  http://example.com/some/abc
>    BASE: http://example.com/some/
> 
> is there any algorithm description or pseudo-code
> available? I didn't find any in RFC 3986.

For an idea of how it can be done in Python, see the Relativize function in 
http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup

Test cases are in 
http://cvs.4suite.org/viewcvs/4Suite/test/Lib/test_uri.py?view=markup
Scroll down to "Test cases for Relativize()". The first two values in each 
tuple are the first two arguments to Relativize(); the last two values are the 
expected results when the third argument is False or True, respectively.

I'd be interested to know if anyone else has implemented this kind of thing.
Perhaps we could keep track of them at http://esw.w3.org/topic/UriTesting.

> I plan to implement this and want to be sure
> to do it right. Is it just (1) normalize both URIs
> and (2) cut off common prefix?

Pretty much, but you need to think in terms of path segments, not just the 
whole URI string; otherwise you'll be tripped up by query and fragment 
components.
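
To illustrate why a plain string prefix is not enough, here is a small Python 3
sketch using only the standard library. The URIs are made-up examples, and
os.path.commonprefix is used purely as a convenient character-wise prefix
function; nothing here is taken from 4Suite.

```python
import os.path
from urllib.parse import urljoin, urlsplit

# Made-up example URIs; note the "/" inside the base's query component.
base = "http://example.com/a/bc?q=x/y"
target = "http://example.com/a/bd"

# Naive approach: cut off the longest common *string* prefix.
prefix = os.path.commonprefix([base, target])
naive = target[len(prefix):]

# Resolving the naive result against the base does not give the target back,
# because the cut landed in the middle of a path segment.
roundtrip = urljoin(base, naive)
print(repr(naive), roundtrip == target)

# Splitting into components first keeps the query out of the way and lets
# the comparison work on whole path segments.
base_segments = urlsplit(base).path.split("/")
target_segments = urlsplit(target).path.split("/")
print(base_segments, target_segments)
```

Comparing the two segment lists element by element stops after "a", so the
reference to emit is the whole last segment "bd", not "d".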

Also consider this:

To get from /a to /b/c/d, you only need "b/c/d", which fits your 'common 
prefix' rule above. But to get from /b/c/d to /a, you need "/a", which doesn't 
fit the rule: the two paths share nothing beyond the leading slash.
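
Both directions can be checked with Python 3's urllib.parse.urljoin, which
performs reference resolution. The host name below is a placeholder added for
illustration only.

```python
from urllib.parse import urljoin

# From /a to /b/c/d: the relative path "b/c/d" is enough.
forward = urljoin("http://example.com/a", "b/c/d")

# From /b/c/d to /a: an absolute-path reference works.
backward = urljoin("http://example.com/b/c/d", "/a")

# A bare "a" does not: it replaces only the last segment of the base path.
wrong = urljoin("http://example.com/b/c/d", "a")

print(forward)
print(backward)
print(wrong)
```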


The way we did it, we give up if the scheme or authority components differ, 
which may not be what you want (we want to produce a path component only). We 
also give up if either the target URI or base URI has no path component, or if 
one path is relative and the other is absolute.

We also do some special-casing to make sure the algorithm isn't tripped up by
empty path segments.
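
The rules above might be sketched roughly as follows in Python 3. This is not
the 4Suite code: the function name relativize, its two-argument signature and
its None return value are assumptions made for illustration. It also differs
in two ways: it requires both paths to be absolute (stricter than the rule
above), and it climbs with ".." segments rather than emitting an absolute
path. Both inputs are assumed to be normalized already.

```python
from urllib.parse import urlsplit


def relativize(target, base):
    """Return a relative reference from base to target, or None on give-up.

    Hypothetical helper for illustration; not the 4Suite implementation.
    """
    t, b = urlsplit(target), urlsplit(base)

    # Give up if scheme or authority differ (inputs assumed normalized).
    if (t.scheme, t.netloc) != (b.scheme, b.netloc):
        return None
    # Give up if either path is empty. Simplification: also require both
    # paths to be absolute.
    if not (t.path.startswith("/") and b.path.startswith("/")):
        return None

    # Work on path segments. The part of the base path after its last "/"
    # is discarded during resolution, so only its "directory" part counts.
    t_segs = t.path.split("/")
    b_dir = b.path.split("/")[:-1]
    t_dir, t_last = t_segs[:-1], t_segs[-1]

    # Count the leading directory segments the two paths share.
    common = 0
    while (common < min(len(b_dir), len(t_dir))
           and b_dir[common] == t_dir[common]):
        common += 1

    # Climb out of the unshared base directories, then descend to the target.
    rel = [".."] * (len(b_dir) - common) + t_dir[common:] + [t_last]

    # Special cases: a leading empty segment would read as an absolute path
    # (or an empty reference), and a colon in the first segment would read
    # as a scheme. Prefixing "./" avoids both.
    if rel[0] == "" or ":" in rel[0]:
        rel.insert(0, ".")

    # Carry over the target's query and fragment unchanged.
    ref = "/".join(rel)
    if t.query:
        ref += "?" + t.query
    if t.fragment:
        ref += "#" + t.fragment
    return ref


print(relativize("http://example.com/some/abc", "http://example.com/some/"))
print(relativize("http://example.com/b/c/d", "http://example.com/a"))
print(relativize("http://example.com/a", "http://example.com/b/c/d"))
```

The three calls print abc, b/c/d and ../../a. One known gap: urlsplit does not
distinguish an empty query ("?") from a missing one, so that difference is
lost in this sketch.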


Due credit: This functionality was added to 4Suite by John L. Clark. It's one 
of the few parts of our URI library that wasn't written by me or Uche Ogbuji.
If you base your code on it, just mention in comments that it's based on code
from 4Suite XML 1.0.

If you have ideas on how to improve it, post to the mailing list at 
4suite-dev@lists.fourthought.com.

> == Error in RFC 3986 section 6.2.3? ==
> Section 6.2.3. Scheme-Based Normalization reads
> 
>   "(e.g., "mailto:Joe@Example.COM" is equivalent to
>    "mailto:Joe@example.com", even though the generic
>    syntax considers the path component to be case-
>    sensitive)" .
> 
> Isn't ".COM" case-insensitive since it is part of the
> host not path?

No, the entire email address is the 'path' component of the URI. The 
'authority' component (of which 'host' is a subcomponent) does not exist in 
mailto URIs.
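
A generic URI parser shows the same split. As a quick check with Python 3's
urllib.parse.urlsplit (used here only as an example of a generic parser):

```python
from urllib.parse import urlsplit

parts = urlsplit("mailto:Joe@Example.COM")

# No "//" follows the scheme, so there is no authority component at all;
# the whole address ends up in the path.
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```

The authority (netloc) comes back empty and the path is the full address, so
any case-insensitivity of the domain part comes from the mailto scheme's own
rules, not from the generic syntax.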

Mike
Received on Friday, 27 July 2007 22:33:53 GMT
