
Re: Progress on URL spec

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 04 Sep 2010 21:42:34 +0200
To: Adam Barth <ietf@adambarth.com>
Cc: public-iri@w3.org, Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <cu25865l1qrp46d11ff0bbppt5o9rsvfn0@hive.bjoern.hoehrmann.de>

* Adam Barth wrote:
>Peter is encouraging me to coordinate my URL work with this working
>group.  I'm a bit skeptical, but I'm willing to give it a try.
>Currently, the document I'm editing is available on github.  If
>coordination with this working group seems to be going well, I'll move
>it to an Internet-Draft.

>The way browsers process URLs is largely constrained by compatibility
>with existing web content.  You might find some of the things they do
>gross and disgusting, but editorializing about the relative merits of
>that behavior is not particularly helpful at this time.

Editorializing your thoughts on this working group and other people
editorializing is perhaps not the best approach if your goal is less
editorializing -- as most people find it difficult to resist trolls.

>If you believe the document is inaccurate, your feedback will be more
>influential if you provide an example URL and an example browser which
>you believe behaves differently than what the document describes.

The document does not describe behavior that could be observed through
black box testing, so what you ask is not possible. You should define
the testing methodology so reviewers would have a reference, and more
importantly, what exactly the input to your algorithm is and how it is
obtained. For instance, the first step in your algorithm is:

  Consume all leading and trailing control characters.

That does not work for the values of attributes in HTML documents as
they may contain strings that represent relative resource identifiers.
So perhaps you are assuming absolute identifiers? The next steps are:

  If the remaining string does not contain a ":" character:
    -> The URL is invalid.
    -> Abort these steps.
  
Well that would make no sense if you assume an absolute identifier:
they contain a colon by definition. This could be meant as a test for
relative references, but then the next step is:

  Consume characters up to, but not including, the first ":"
  character. These characters are the /scheme/.

This would leave, say, "#:" as an absolute reference with a scheme of
"#", as it contains a colon and "#" is the part before the first ":"
(similarly, ":" would be one with the empty string as scheme).
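
To make the point concrete, here is a minimal sketch of a literal reading of the three steps quoted above. The function name and the exact set of "control characters" (assumed here to be the C0 controls plus DEL) are illustrative assumptions, not something the document under review defines in the quoted text.

```python
# Assumption: "control characters" means U+0000-U+001F and U+007F.
CONTROLS = "".join(chr(c) for c in range(0x20)) + "\x7f"


def naive_scheme_split(s):
    """Apply the three quoted steps literally.

    Returns (scheme, rest) or None if the input is declared invalid.
    """
    # Step 1: consume all leading and trailing control characters.
    s = s.strip(CONTROLS)
    # Step 2: no ":" anywhere means the URL is invalid.
    if ":" not in s:
        return None
    # Step 3: everything before the first ":" is the scheme.
    scheme, _, rest = s.partition(":")
    return scheme, rest


print(naive_scheme_split("#:"))       # scheme is "#"
print(naive_scheme_split(":"))        # scheme is the empty string
print(naive_scheme_split("foo/bar"))  # a relative reference is rejected
```

Under these assumptions a relative reference such as "foo/bar" is rejected outright, while "#:" is accepted with "#" as its scheme.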

>At this point, I'm not accepting editorial feedback on this document.
>There's a mountain of editorial work to do, but I'd like to get the
>nuts and bolts down first.  In particular, discussion of whether to
>present the requirements in terms of an algorithm or a set of
>declarative rules is not particularly helpful at this time.

I can understand that you do not wish to receive feedback for saying
"Replace backslashes by slashes, split into components as defined in
RFC 3986 Appendix B, and if the authority contains more than one '@'
treat all but the last one as if they had been percent-encoded" in
more than a hundred lines of prose algorithms that don't appear to be
particularly correct.
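
As a rough sketch of that one-sentence version, assuming Python: the regular expression is the one given in RFC 3986 Appendix B, while the function name, the tuple it returns, and the choice of "%40" as the encoding for the extra '@' characters are illustrative assumptions. It makes no claim to match what any browser actually does.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent come back as None.
    """
    # Replace backslashes by slashes.
    ref = ref.replace("\\", "/")
    # Every part of the pattern is optional, so it always matches.
    m = APPENDIX_B.match(ref)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    # With more than one "@", treat all but the last as percent-encoded.
    if authority is not None and authority.count("@") > 1:
        userinfo, _, host = authority.rpartition("@")
        authority = userinfo.replace("@", "%40") + "@" + host
    return scheme, authority, path, query, fragment


print(split_reference("http:\\\\a@b@example.com\\p?q#f"))
print(split_reference("#:"))
```

With this split, "#:" has no scheme at all: it is a relative reference whose fragment is ":".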
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Saturday, 4 September 2010 19:43:16 GMT
