- From: Adam Barth <ietf@adambarth.com>
- Date: Mon, 20 Jun 2011 01:03:30 -0700
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Chris Weber <chris@lookout.net>, Boris Zbarsky <bzbarsky@mit.edu>, public-iri@w3.org
On Mon, Jun 20, 2011 at 12:37 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-06-20 09:21, Adam Barth wrote:
>> I wouldn't worry about file URLs for a while. They're vastly more
>> complex than all the other kinds of URLs put together. If we could
>> get interoperability for even just http URLs, I'd be happy.
>
> +1
>
> So, what *is* the set of interop problems here?
>
> 1) Extracting them from a/@href and friends (whitespace treatment)
>
> 2) Handling invalid ASCII characters (SP, "\", "<", ">"...)
>
> 3) Handling non-ASCII characters in query component
>
> 4) Handling non-ASCII characters in authority components
>
> 5) Handling non-ASCII characters everywhere else
>
> Anything else?

The page

https://raw.github.com/abarth/url-spec/master/tests/gurl-results/by-browser.txt

lists a bunch of inputs for which browsers provide different outputs.
In the interest of simplifying the problem, I'd ignore the
IP-address-related issues for now, as those are also somewhat
complicated.

From an information-theoretic point of view, all the information in
these tables needs to be included in the spec:

http://code.google.com/p/google-url/source/browse/trunk/src/url_canon_etc.cc#84
http://code.google.com/p/google-url/source/browse/trunk/src/url_canon_host.cc#78
http://code.google.com/p/google-url/source/browse/trunk/src/url_canon_path.cc#77
http://code.google.com/p/google-url/source/browse/trunk/src/url_canon_internal.cc#133

Different browsers have slightly different tables because they've each
tried to reverse engineer each other and didn't get it right because
there isn't a decent spec. (Note: I don't particularly care whether
the tables in the spec match GURL. I just care that we end up with
the same tables in every implementation.)
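To make the "tables" concrete: a canonicalizer of this kind is commonly driven by a per-component lookup table that says, for each byte, whether to pass it through or percent-escape it. The sketch below is only an illustration in Python. The table entries and the names `PASS`, `ESCAPE`, `build_table`, `QUERY_TABLE`, and `canonicalize_bytes` are invented here and are not copied from GURL or from any browser.

```python
# Hypothetical per-byte actions; real implementations use richer flag sets.
PASS, ESCAPE = "pass", "escape"


def build_table(unescaped):
    """Build a 256-entry table: listed ASCII characters pass, all else escapes."""
    table = [ESCAPE] * 256
    for ch in unescaped:
        table[ord(ch)] = PASS
    return table


# Illustrative entries only (an assumption). The real entries are exactly
# what differs between browsers and what a spec would need to pin down.
QUERY_TABLE = build_table(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-._~!$&'()*+,;=:@/?"
)


def canonicalize_bytes(data, table):
    """Copy bytes marked PASS; percent-escape every other byte as %XX."""
    out = []
    for b in data:
        if table[b] == PASS:
            out.append(chr(b))
        else:
            out.append("%{:02X}".format(b))
    return "".join(out)
```

With this made-up table, `canonicalize_bytes(b"a b<c>", QUERY_TABLE)` gives `a%20b%3Cc%3E`. Two implementations agree on every input only if their tables agree on every one of the 256 entries, per component.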
Even just trivial things need to be cleaned up, like:

http://ExAmple.CoM/
http://www.example.com/##asdf

There's also some ugly stuff like (in JavaScript string-literal
notation):

http://www.example.com/?q=\ud800\ud800

getting transformed to

http://www.example.com/?q=%26%2355296%3B%26%2355296%3B

that needs to be explained.

If we can get interop on all the http-scheme test cases in
<http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/>, I'd be
very happy.

Adam
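One reading of the `\ud800\ud800` case above: each lone surrogate cannot be encoded, so it is replaced by an HTML-style decimal character reference (`&#55296;`, since 0xD800 is 55296), and the `&`, `#`, and `;` of that reference are then percent-encoded. Here is a minimal Python sketch of that reading. The function name `escape_query` is hypothetical, and the behaviour is inferred from the single example above rather than taken from any browser's source.

```python
from urllib.parse import quote


def escape_query(query):
    """Percent-encode a query string, turning lone surrogates into escaped NCRs."""
    out = []
    for ch in query:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Unencodable lone surrogate: substitute "&#<decimal>;" and then
            # percent-encode the reference itself (assumed behaviour).
            out.append(quote("&#{};".format(cp), safe=""))
        else:
            # Keep query delimiters; everything else follows quote() defaults.
            out.append(quote(ch, safe="=&"))
    return "".join(out)
```

Under these assumptions, `escape_query("\ud800\ud800")` yields `%26%2355296%3B%26%2355296%3B`, which matches the output quoted in the message.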
Received on Monday, 20 June 2011 08:04:29 UTC