- From: Erik van der Poel <erikv@google.com>
- Date: Fri, 9 Apr 2010 09:23:23 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Ted Hardie <ted.ietf@gmail.com>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <LMM@acm.org>, Julian Reschke <julian.reschke@gmx.de>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
I asked a co-worker to test Safari 4 on Mac. It returns Unicode in
pathname, but it returns the wire format in hostname and search.

Also, Firefox, Chrome and Safari include the initial slash (/) in
pathname, while IE and Opera omit it.

Erik

On Fri, Apr 9, 2010 at 6:53 AM, Erik van der Poel <erikv@google.com> wrote:
> I think we need to move the relative resolution from Issue 2 to Issue
> 1 because the major browsers return the resolved path in the DOM
> pathname.
>
> The browsers convert HTML documents into Unicode. Some of the browsers
> then return that Unicode in DOM APIs, while others return the "wire"
> format, depending on the URL component. The following results are from
> a test case with <a href="...">.
>
> IE8 returns Unicode in all of the major DOM APIs (hostname, pathname, search).
>
> Firefox 3.6 and Opera 10 return Unicode in hostname, but they return
> the wire format in pathname and search (%-encoded UTF-8 and %-encoded
> original encoding, respectively).
>
> Chrome 4 returns the wire format in hostname, pathname and search. The
> wire format for hostname is Punycode.
>
> If we decide that the spec should say that hostname, pathname and
> search must return Unicode, then Issue 2 would be for specifying the
> wire format (Punycode in host, %-encoded UTF-8 (or original) in path,
> and %-encoded original in query).
>
> Erik
>
> On Fri, Apr 9, 2010 at 2:10 AM, Ian Hickson <ian@hixie.ch> wrote:
>> On Fri, 9 Apr 2010, "Martin J. Dürst" wrote:
>>> >
>>> > Issue 1:
>>> > ========================================================================
>>> > Update the IRI specification to define an algorithm with the following
>>> > characteristics:
>>>
>>> In order to make it easier to understand this for people who are not deeply
>>> involved in the HTML5 effort, I'd like to confirm that this is the algorithm
>>> that HTML5 uses to split a URI/IRI into various components, each of which is
>>> then accessible via a (JavaScript) DOM API function. So I guess the title of
>>> our issue should be something like:
>>> "Ensure that the IRI spec defines how to split an IRI into components in a way
>>> that's referenceable by the HTML5 spec" or some such.
>>
>> Right.
>>
>>
>>> > Exactly what this algorithm must do is a matter that will need careful
>>> > research, reverse-engineering existing UAs.
>>>
>>> My understanding was that a lot of this research had already been done,
>>> and that we would basically try to match whatever was in the HTML5 spec
>>> before Dan Connolly and Michael Sperberg-McQueen extracted it into a
>>> separate draft. Of course, we should always be open to new information
>>> coming up, but your sentence above sounds much more like we have to
>>> start anew. Can you clarify?
>>
>> Since the text was written, so many problems have been shown to exist with
>> the existing text that frankly I think it would be significantly less work
>> to just start over and reverse-engineer the algorithms from scratch than
>> to try to first attempt to match what HTML5 used to say and then verify it
>> for correctness.
>>
>> (Personally, if the working groups were to decide that HTML5 is where
>> these algorithms should be, I'd probably just throw out the old text and
>> start again from nothing, working closely with the relevant engineers at
>> the various major browser vendors to check what they consider important
>> and what they don't, trying to reconcile the various behaviours with each
>> other, with legacy content requirements, and with the intent of the URI and
>> IRI specs. I almost certainly wouldn't start from the old algorithms.)
>>
>>
>>> > Issue 2:
>>> > ========================================================================
>>> > Update the IRI specification to define an algorithm with the following
>>> > characteristics:
>>>
>>> Again to clarify here, if I understand correctly, the HTML5 spec needs
>>> such an algorithm to resolve relative references with respect to a base
>>> URI
>>
>> Right. This algorithm is used for resolving URLs relative to a base URL,
>> and also to convert URLs into a more canonical (if not always valid) form.
>>
>>
>>> (my wild guess is that B is the base, and A is the relative URI below,
>>> can you confirm)?
>>
>> Right. Of course, A need not be relative, it could be itself an absolute
>> URL, or it could be something unparseable.
>>
>> HTH,
>> --
>> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
>> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
>> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
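
As a rough illustration of the "wire format" described in this message (Punycode in host, %-encoded UTF-8 in path, %-encoded original encoding in query), here is a minimal Python sketch. The function name to_wire, the sample URL, and the choice of ISO-8859-1 as the document encoding are assumptions made for illustration only. Python's built-in "idna" codec implements IDNA 2003, so its output may not match what any particular browser does.

```python
from urllib.parse import urlsplit, quote


def to_wire(url: str, doc_encoding: str = "utf-8") -> dict:
    """Hypothetical helper: convert the Unicode form of a URL's components
    into the wire format discussed in the thread."""
    parts = urlsplit(url)
    # Host: Punycode (ASCII-compatible encoding), via Python's IDNA 2003 codec.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    # Path: %-encoded UTF-8, keeping "/" as a separator.
    path = quote(parts.path, safe="/", encoding="utf-8")
    # Query: %-encoded bytes in the document's original encoding (assumed).
    query = quote(parts.query, safe="=&", encoding=doc_encoding)
    return {
        "hostname": host,
        "pathname": path,
        # Including the leading "?" here is an illustrative choice.
        "search": "?" + query if query else "",
    }


result = to_wire("http://bücher.example/straße?q=é", doc_encoding="iso-8859-1")
print(result)
```

The Unicode form that IE8 reportedly returns would correspond to the unconverted components of the same URL; the sketch covers only the conversion in one direction and does not attempt relative resolution.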
Received on Friday, 9 April 2010 16:23:54 UTC