Re: Progress on URL spec

On Sat, Sep 4, 2010 at 4:28 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Adam Barth wrote:
>>I've started by trying to separate the concerns of parsing absolute
>>URLs and resolving relative URLs.  We might come to find that such a
>>distinction is foolish, but it seems plausible at this time.
>
> I don't think there is anything plausible about defining how to parse
> an absolute reference that contains no colon and thus isn't absolute,
> much like it is not plausible to define that the scheme in "#:" is "#".

Plausible?  I don't understand what you mean by that term.

>>As for the parsing definition in RFC 3986 Appendix B, is this the
>>regular expression that you're referring to?
>>
>>      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>>
>>This doesn't appear to get even simple examples correct.  For example,
>>that regular expression doesn't produce a match for the following
>>string, but browsers do, in fact, behave as if this string represents
>>a particular URL:
>>
>>http:///example.com/
>
> That's a perfectly valid reference per the generic syntax and it has a
> scheme of 'http', undefined query and fragment parts, an empty authority
> and a path of '/example.com/' as mandated by RFC 3986 and as the regular
> expression matches [1].

Unfortunately, Firefox, Chrome, and Safari interpret that string as if
it were a URL with an authority of "example.com".
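
For reference, here is a minimal Python sketch of how the Appendix B
expression splits that string.  The helper name split_reference is
hypothetical (it is not part of RFC 3986 or of any library), and the
group numbering follows the one given in Appendix B.  This shows only
what the regular expression yields, not what any browser does.

```python
import re

# Regular expression from RFC 3986, Appendix B, copied verbatim.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Split a URI reference into its five generic components.

    Hypothetical helper for illustration.  A component that is absent
    is reported as None; a component that is present but empty is
    reported as the empty string.
    """
    m = URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_reference("http:///example.com/"))
print(split_reference("#:"))
```

Under this expression the first string has an empty authority and a
path of "/example.com/", and the second has no scheme at all, only a
fragment of ":".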

> Neither IE6 nor Opera will treat the string as
> if the third slash had been omitted; if any browser does, that is a bug.

Rather, I'd say that there's an interoperability problem to solve,
which is the motivation for this work.  Now, how to resolve the
difference in behavior is an interesting question.  What matters in
resolving this question, at least to browser vendors, is what existing
content on the web expects browsers to do.  That's a question we can
answer with data, not with opinion.  Do you have data to support which
behavior, if implemented by a browser, would result in greater
compatibility with existing web content?

> That's one reason for my remark about the correctness of your algorithm.

Thanks.  If you have further examples of interesting input strings,
I'd appreciate them.  Blanket statements about "plausibility" are not
appreciated.

> [1] As the specification notes, the expression matches all strings

Great.  That's an important first step in defining behavior
unambiguously, which, itself, is an important step in producing
interoperable implementations.

Adam

Received on Sunday, 5 September 2010 00:02:40 UTC