Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012) from Roy T. Fielding on 2012-10-24 (uri@w3.org from October 2012)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 24 Oct 2012 12:10:59 -0700
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: Mark Nottingham <mnot@mnot.net>, Ian Hickson <ian@hixie.ch>, Tim Bray <tbray@textuality.com>, Jan Algermissen <jan.algermissen@nordsc.com>, Julian Reschke <julian.reschke@gmx.de>, Noah Mendelsohn <nrm@arcanedomain.com>, URI <uri@w3.org>, IETF Discussion <ietf@ietf.org>
Message-Id: <097534B5-8135-4A20-8275-934A5547ACDB@gbiv.com>

On Oct 24, 2012, at 3:39 AM, Brian E Carpenter wrote:
> On 23/10/2012 00:32, Mark Nottingham wrote:
> ...
>> The underlying point that people seem to be making is that there's legitimate need for URIs to be a separate concept from "strings that will become URIs." By collapsing them into one thing, you're doing those folks a disservice. Browser implementers may not care, but it's pretty obvious that lots of other people do.
> 
> Thanks for bringing this point out. It was explained to me in 1993 by TBL and
> Robert Cailliau that URLs (the only term used then, I think)

As a historical footnote, the term URL was created by the same
BOF that created the Uniform Resource Identifiers working group
at the IETF meeting in July 1992.

The early Web protocol specs had used the term "network address".

The term "Document Identifiers" came from Brewster Kahle and was
later used in a call for proposals by the Coalition for Networked
Information's Architectures & Standards Working Group, which in
turn led TimBL to propose Web addresses as Universal Document
Identifiers for a BOF at IETF 24 (Cambridge, MA).  Somewhere
in that BOF discussion, the URI working group was proposed and
TimBL's proposal was renamed Uniform Resource Locators
to distinguish it from other ideas for URNs
[see IETF 24 proceedings, p.184, and the following link].

 ftp://ftp.ietf.org/ietf/92jul/udi-minutes-92jul.txt

TimBL had originally specified that addresses in HREF could be
provided in full or partial form.  The IETF removed the partial
form, leading to all sorts of bad decisions regarding syntax,
and so I revived it in 1994 as Relative URLs [RFC1808].  That
spec is the only one that came close to defining what Anne
is trying to do here -- a single parsing standard for
potentially relative references. 

It is easy to claim that the merging of syntax specs that
created RFC2396 lost some value when the parsing standard was
replaced by a non-normative appendix.  However, it was discussed
extensively at the time, including with the browser developers,
and there was simply nothing common enough to make standard.
The best I could do for 2396 and 3986 was to include a
regular expression that accepts all strings and parses them
into the component parts.
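
For reference, the regular expression in question appears in RFC 3986,
Appendix B. A minimal Python sketch of how it splits a reference follows;
the function name split_reference and the dictionary keys are illustrative
choices, not anything defined by the RFC.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any input string and never rejects a reference.
# re.DOTALL is an added precaution so "." also covers newlines.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


def split_reference(ref):
    """Split a (possibly relative) reference into its five components.

    Components that are absent come back as None; the path is always a
    string, possibly empty. No validation or repair is performed.
    """
    m = URI_REFERENCE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_reference("../g?x y"))
```

Note that the second call is split without complaint even though it
contains a raw space: the expression only separates components and leaves
any decision about validity to the caller.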

I have absolutely no problem with writing a proposed standard
for parsing references, particularly if browser developers are
willing to adhere to one.  However, it is not a redefinition of
URLs, nor does it make sense for error-correcting transformations
(like pct-encoding embedded spaces) to be "the standard" for
parsing when there are plenty of applications that string
parse references for the sake of generating invalid test cases
(e.g., the example attributed to curl).
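
To illustrate the distinction, here is a small Python sketch that keeps
the error-correcting step separate from parsing, so that each context can
decide whether to apply it. The names percent_encode_spaces and
prepare_reference, and the repair flag, are hypothetical and exist only
for this example.

```python
def percent_encode_spaces(ref: str) -> str:
    """Error-correcting step: replace each literal space with %20."""
    return ref.replace(" ", "%20")


def prepare_reference(ref: str, repair: bool) -> str:
    """Return the reference as a given context would hand it to a parser.

    repair=True models a lenient context that fixes up user input;
    repair=False models a tool that must keep invalid input untouched,
    for instance to generate invalid test cases on purpose.
    """
    return percent_encode_spaces(ref) if repair else ref


raw = "http://example.com/a b"
print(prepare_reference(raw, repair=True))   # http://example.com/a%20b
print(prepare_reference(raw, repair=False))  # http://example.com/a b
```

Under this arrangement the repair is a documented pre-processing choice
of the calling context rather than a property of the parser itself.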

It is not non-interoperable behavior to parse input data
differently depending on the context in which it is entered.
What matters is that the context be properly documented to
indicate what pre/post-processing is applied, just as we
expect a browser's combined search/location dialog bars to
be documented as not merely URL-entry forms (or be banned
due to the privacy leakage of incremental search results).

....Roy

Received on Wednesday, 24 October 2012 19:11:26 UTC