Fwd: URL parsing in HTML5

This e-mail from Peter Saint-Andre to the public-iri mailing list may be of 
interest to the TAG.

Noah

-------- Original Message --------
Subject: URL parsing in HTML5
Resent-Date: Fri, 04 Nov 2011 04:22:35 +0000
Resent-From: public-iri@w3.org
Date: Thu, 03 Nov 2011 21:21:50 -0700
From: Peter Saint-Andre <stpeter@stpeter.im>
To: public-iri@w3.org <public-iri@w3.org>, public-html-comments@w3.org
CC: Sam Ruby <rubys@intertwingly.net>,  "Paul Cotton 
(pcotton@microsoft.com)" <pcotton@microsoft.com>, Ian Hickson 
<ian@hixie.ch>, "Michael(tm) Smith" <mike@w3.org>,  Adam Barth 
<ietf@adambarth.com>, Edward O'Connor <ted@oconnor.cx>

After chatting during TPAC 2011 with Addison, Larry, Richard, Ian, Mike,
Ted, Julian (etc.), I'd like to share some thoughts about a possible
compromise / resolution regarding Issue 56 in the HTML WG:

http://www.w3.org/html/wg/tracker/issues/56

Some observations and opinions:

1. It is unlikely that existing browsers will change their current URL
parsing behavior. (I am not judging whether that behavior is good or bad.)

2. Documentation of that behavior is out of scope for the revisions to
RFC 3987, and outside the charter of the IRI WG, because it's a matter
of URI [pre-]processing (RFC 3986) and not IRI processing (RFC 3987).

3. It is unlikely that RFC 3986 will ever be modified to recommend the
current behavior, and simply impossible before HTML5 is advanced at the
W3C (even if such modifications were desirable).

4. As far as I can see, the current behavior is in fact out of scope for
RFC 3986 and any future possible revisions to RFC 3986 because:

    (a) it is mostly or completely a matter of pre-processing strings
    that look like URIs/URLs/"web-addresses" -- we could call these
    "candidate strings" or "proto-URLs" or some such to disambiguate them
    from URIs

    (b) this pre-processing behavior is applied only in the web context
    by browsers and software applications that want to be consistent
    with browsers

    (c) because of (b), there is no great danger that this behavior will
    "leak" into processing of URIs in general (mailto:, sip:, tel:,
    URNs, and so on)

5. There is no need for the work of documenting the current URL
parsing behavior to happen at the IETF, given that it's out of scope for
the IRI WG. Although this work could be done as an individual (non-WG)
I-D at the IETF, I think it could more easily be done at the W3C, either
as part of the HTML specification or as a separate document (the latter
might be preferable so that it can be reviewed in a more focused manner
and referenced more easily by other W3C specifications, but naturally I
would leave such decisions up to folks at the W3C). [The IRI WG is still
responsible for rfc3987bis, but that's off-topic for this email message.]

If folks can agree on the foregoing points, then I think it would be
productive to work on proposed revisions to the current text (or at
least what I believe is the current text):

http://www.w3.org/TR/html5/Overview.html#parsing-urls

I would be happy to make concrete suggestions during that revision
process if someone from the W3C could point to the preferred venue or
process (e.g., wiki page or bugzilla comments).

I look forward to discussing this further tomorrow morning during the
HTML WG session:

http://lists.w3.org/Archives/Public/public-html/2011Nov/0013.html

Peter

--
Peter Saint-Andre
https://stpeter.im/

Received on Friday, 4 November 2011 04:53:04 UTC