- From: Adam Barth <ietf@adambarth.com>
- Date: Sat, 4 Sep 2010 19:05:51 -0700
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: public-iri@w3.org, Peter Saint-Andre <stpeter@stpeter.im>
On Sat, Sep 4, 2010 at 6:19 PM, Maciej Stachowiak <mjs@apple.com> wrote:
> On Sep 3, 2010, at 9:21 PM, Adam Barth wrote:
>> At the URL below, you can find a snapshot of the document. I believe
>> this document accurately describes how browsers parse "hierarchal"
>> URLs, such as those with the http, https, and ftp schemes:
>>
>> http://github.com/abarth/url-spec/raw/830fe35e0db8db30b5bd43a24a802ab3f4eec8b6/drafts/url.txt
>>
>> If you believe the document is inaccurate, your feedback will be more
>> influential if you provide an example URL and an example browser which
>> you believe behaves differently than what the document describes.
>> Also helpful are pointers to test suites that I can run on various
>> browsers to learn about their behavior.
>
> It's hard to tell if the document is inaccurate by that standard because:
>
> A) "parse" of an arbitrary string is not an observable facet of the Web platform; the only "parse" operation that's actually exposed is the DOM API on the Location object and <a> elements, which implicitly operates on a string that has already been resolved and canonicalized. This algorithm seems to be for an invisible parse step that happens before URLs are resolved+canonicalized (given that it handles some invalid inputs that already would have been cleaned up by the resolve+canonicalize operation). The parse operation that is actually exposed only operates on an already-resolved URL.
>
> As a result, I don't see how to make tests that would determine if browsers match the behavior of the algorithm.

In some cases, it's clear because we can be reasonably sure the browser is treating the string as an absolute URL. For example, the question of how to treat slash characters between "http:" and "example.com/" makes sense even without understanding how to resolve relative URLs. I'll look at resolving relative URLs next, though, since that's important for moving forward.

> B) In cases where browsers have different behavior, there's no documentation of why one or the other was chosen.
>
> Perhaps it would be more fruitful to review once there is enough here to relate to an observable behavior of the Web platform.

Thanks for taking a look at this early draft. I'll send another message to this list once I've gotten the relative URL resolving in reasonable shape.

On Sat, Sep 4, 2010 at 6:24 PM, Maciej Stachowiak <mjs@apple.com> wrote:
> On Sep 4, 2010, at 1:36 PM, Adam Barth wrote:
>>> This could be meant as a test for
>>> relative references, but then the next step is:
>>>
>>> Consume characters up to, but not including, the first ":"
>>> character. These characters are the /scheme/.
>
> To the extent that I can relate this spec to browser behaviors, I think this step is wrong. Browsers look for a ":" that occurs before any character that can't appear in a scheme under any circumstances, and that includes "#" and "/" for example. If a ":" isn't found before hitting a non-scheme character, the URL is invalid.

Thanks. This is a helpful piece of information.

>>> This would leave, say, "#:" as absolute reference with a scheme of
>>> "#", as it contains a colon and "#" is the part before the first ":"
>>> (similarly, ":" would be one with the empty string as scheme).
>>
>> We have not yet defined how to resolve relative URLs. The parsing
>> definition, at least so far, is a definition of how to parse absolute
>> URLs. If you were asked to regard the string "#:" as an absolute URL,
>> it seems like treating "#" as the scheme would be one reasonable
>> interpretation. I haven't thought through canonicalization yet, but I
>> suspect testing will reveal that "#" is not a valid character for a
>> scheme.
>
> It's hard to tell if this makes sense without understanding what browser behavior would reflect this.

Yeah, it's unclear to me whether the handling of this particular case is observable.

Adam
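
For illustration only, here is a minimal Python sketch contrasting the two scheme-extraction rules discussed in this message: the draft's "consume up to the first colon" step, and the browser behavior Maciej describes, where the colon must appear before any character that cannot occur in a scheme. The function names are hypothetical, and the set of allowed scheme characters is an assumption borrowed from the RFC 3986 scheme grammar (letters, digits, "+", "-", "."); the thread itself only says that "#" and "/" are excluded. Neither function is taken from the draft or from any browser.

```python
import string

# Assumption: scheme characters as in the RFC 3986 grammar.
# The thread only states that "#" and "/" cannot appear in a scheme.
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def scheme_first_colon(url):
    """Draft step as quoted: everything before the first ":" is the scheme.

    Returns None when the string contains no ":" at all.
    """
    head, sep, _ = url.partition(":")
    return head if sep else None


def scheme_before_non_scheme_char(url):
    """Behavior described in the reply: scan left to right and accept the
    ":" only if no non-scheme character has been seen before it.

    Returns None when a non-scheme character (or the end of the string)
    is reached first. Whether an empty scheme should also be rejected is
    not settled in the thread, so it is returned as "".
    """
    for i, ch in enumerate(url):
        if ch == ":":
            return url[:i]
        if ch not in SCHEME_CHARS:
            return None
    return None


if __name__ == "__main__":
    for candidate in ["http://example.com/", "#:", "/a:b", ":"]:
        print(repr(candidate),
              repr(scheme_first_colon(candidate)),
              repr(scheme_before_non_scheme_char(candidate)))
```

Under these assumptions the two rules agree on "http://example.com/" but differ on "#:", which the first rule reads as having the scheme "#" and the second rejects.
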
Received on Sunday, 5 September 2010 02:06:51 UTC