- From: Mike Brown <mike@skew.org>
- Date: Fri, 27 Jul 2007 16:33:29 -0600 (MDT)
- To: Sebastian Pipping <webmaster@hartwork.org>
- CC: uri@w3.org
Sebastian Pipping wrote:
> == Reference creation ==
> For the inverse process of reference resolution,
> e.g. making
>
>   abc
>
> out of
>
>   URI:  http://example.com/some/abc
>   BASE: http://example.com/some/
>
> is there any algorithm description or pseudo-code
> available? I didn't find any in RFC 3986.

For an idea of how it can be done in Python, see the Relativize function in http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup

Test cases are in http://cvs.4suite.org/viewcvs/4Suite/test/Lib/test_uri.py?view=markup. Scroll down to where it says "Test cases for Relativize()". The first 2 values in each tuple are the first 2 arguments to Relativize(), and the last 2 values are the expected results when the 3rd argument is False or True, respectively.

I'd be interested to know if anyone else has implemented this kind of thing. Perhaps we could keep track of them at http://esw.w3.org/topic/UriTesting.

> I plan to implement this and want to be sure
> to do it right. Is it just (1) normalize both URIs
> and (2) cut off common prefix?

Pretty much, but you need to think in terms of path segments, not just the whole URI string; otherwise you'll be tripped up by query and fragment components. Also consider this: to get from /a to /b/c/d, you only need b/c/d, which fits your 'common prefix' rule above. But to get from /b/c/d to /a, you need "/a", which doesn't fit the rule.

In our implementation, we give up if the scheme or authority components differ, which may not be what you want (we want to produce a path component only). We also give up if either the target URI or the base URI has no path component, or if one path is relative and the other is absolute. We also do some special-casing to make sure the algorithm isn't tripped up by empty path segments.

Due credit: This functionality was added to 4Suite by John L. Clark. It's one of the few parts of our URI library that wasn't written by me or Uche Ogbuji.
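The segment-based approach described above can be sketched in Python with the standard library's urllib.parse. This is not 4Suite's Relativize(); the name relativize, returning None when it gives up, and the extra requirement that both paths be absolute are assumptions made for this sketch. Both inputs are assumed to be normalized already (in particular, free of "." and ".." segments). It reproduces the two cases above: from /a to /b/c/d it yields b/c/d, and from /b/c/d to /a it yields /a rather than the equally valid ../../a.

```python
from urllib.parse import urljoin, urlsplit


def relativize(target, base):
    """Sketch: return a relative reference that resolves against `base` to
    `target`, or None when the sketch gives up. Assumes both URIs are already
    normalized (no "." or ".." segments). Not 4Suite's Relativize()."""
    t, b = urlsplit(target), urlsplit(base)

    # Give up if scheme or authority differ; only path-based references are built.
    if t.scheme != b.scheme or t.netloc != b.netloc:
        return None
    # Give up unless both paths exist and are absolute (stricter than 4Suite).
    if not (t.path.startswith("/") and b.path.startswith("/")):
        return None

    t_segs = t.path.split("/")             # "/some/abc" -> ["", "some", "abc"]
    base_dir = b.path.split("/")[:-1]      # the base's last segment never counts

    # Count leading directory segments shared by both paths. The target's
    # last segment is excluded so that it is always emitted.
    n = 0
    while n < len(base_dir) and n < len(t_segs) - 1 and base_dir[n] == t_segs[n]:
        n += 1
    ups = len(base_dir) - n                # how many ".." steps are needed

    if ups > 0 and n == 1 and not t.path.startswith("//"):
        # Only the root is shared: use an absolute-path reference ("/a")
        # instead of a chain of ".." segments ("../../a").
        rel = t.path
    else:
        rel_segs = [".."] * ups + t_segs[n:]
        rel = "/".join(rel_segs)
        # Prefix "./" when the bare path would be misread: "" is a same-document
        # reference, a leading "/" (from an empty segment) makes it absolute,
        # and a ":" in the first segment looks like a scheme.
        if rel == "" or rel.startswith("/") or ":" in rel_segs[0]:
            rel = "./" + rel

    # Query and fragment come from the target only.
    if t.query:
        rel += "?" + t.query
    if t.fragment:
        rel += "#" + t.fragment
    return rel


base = "http://example.com/some/"
target = "http://example.com/some/abc"
ref = relativize(target, base)
print(ref, urljoin(base, ref) == target)   # abc True
```

The "./" prefix is where the empty-segment and colon special cases live: without it, /a//b relative to /a/x would come out as "/b", and /a/b:c would come out as "b:c", which parses as a URI with scheme "b". Resolving the result again with urljoin and comparing it to the target is a cheap sanity check for inputs without empty segments.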
If you base your code on 4Suite's implementation, just mention in comments that it's based on code from 4Suite XML 1.0. If you have ideas on how to improve it, post to the mailing list at 4suite-dev@lists.fourthought.com.

> == Error in RFC 3986 section 6.2.3? ==
> Section 6.2.3. Scheme-Based Normalization reads
>
> "(e.g., "mailto:Joe@Example.COM" is equivalent to
> "mailto:Joe@example.com", even though the generic
> syntax considers the path component to be case-sensitive)".
>
> Isn't ".COM" case-insensitive since it is part of the
> host not path?

No, the entire email address is the 'path' component of the URI. The 'authority' component (of which 'host' is a subcomponent) does not exist in mailto URIs.

Mike
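The point about mailto URIs can be seen with Python's urllib.parse.urlsplit: it reports no authority (netloc) for a mailto URI and puts the whole address in the path, while the same characters after "//" in an http URI are split into userinfo and host. The normalize_mailto function is a hypothetical scheme-based normalizer written for this illustration only; it assumes a single address and simply lowercases the part after the last "@".

```python
from urllib.parse import urlsplit, urlunsplit

# In a mailto URI there is no authority: the whole address is the path.
mail = urlsplit("mailto:Joe@Example.COM")
print(repr(mail.netloc), mail.path, mail.hostname)   # '' Joe@Example.COM None

# The same characters after "//" in an http URI are parsed as userinfo@host,
# and the host part is case-insensitive (hostname comes back lowercased).
web = urlsplit("http://Joe@Example.COM/")
print(web.netloc, web.hostname)                      # Joe@Example.COM example.com


def normalize_mailto(uri):
    """Hypothetical scheme-based normalizer for a single-address mailto URI:
    lowercase the domain after the last '@' and leave the local part alone."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI: %r" % uri)
    local, at, domain = parts.path.rpartition("@")
    if not at:
        return uri  # no '@' in the path, so nothing to lowercase
    return urlunsplit(parts._replace(path=local + at + domain.lower()))


print(normalize_mailto("mailto:Joe@Example.COM"))    # mailto:Joe@example.com
```

A generic-syntax normalizer that only lowercases the host would leave "mailto:Joe@Example.COM" untouched, because there is no host to lowercase; that is why the RFC lists this equivalence under scheme-based normalization.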
Received on Friday, 27 July 2007 22:33:53 UTC