Change proposal for ISSUE-56 from Adam Barth on 2010-06-16 (public-html@w3.org from June 2010)

From: Adam Barth <w3c@adambarth.com>
Date: Tue, 15 Jun 2010 22:40:45 -0700
To: HTML WG <public-html@w3.org>
Message-ID: <AANLkTil1FZiIxArMv2WjUcFOaQcuYATkTM8yxNhgLG4E@mail.gmail.com>
(Apologies if I've missed the deadline for submitting a Change
Proposal for this issue.  Roy only recently explained this issue to
me.)

== Summary ==

There is no need to align "URL" processing in HTML documents with the
IRI specifications because HTML documents do not contain IRIs (or URIs
for that matter).  We should restore the removed text that explained
how to translate input strings contained in text/html documents into
URIs.

== Rationale ==

ISSUE-56 was raised in error by Michael(tm) Smith based on a
misunderstanding of Roy's messages to the working group.  Roy said
that "pretending to define a new URL standard as part of HTML5 is not
acceptable ... HTML will never define the identifiers for the Web.
That would be a fundamental violation of the Web architecture."  Based
on my current understanding of the web architecture and of how a
sequence of characters in a text/html document becomes a URI, he is
correct.  However, that does not imply that we ought to remove the
"URL" processing requirements from the HTML5 specification.

In a recent message to the IRI working group [1], Roy writes:

[[

RFC 3986 defines how to parse URIs (for recipients) and provides
many rules for scheme-specific specs to define how to generate URIs
of a given scheme (for producers) within the overall constraint of
matching the URI syntax (the formal ABNF).

[...]

Please understand that browsers almost never parse URI or IRI or
anything in between.  Browsers have input strings that contain one
or more references, usually in the document encoding, and so there
is a sequence of context-specific and charset-specific and
media-type-specific processing that occurs before you even get to
the individual URI-reference or IRI-reference that are defined by
3986/3987.

Some people have proposed that most of that pre-processing be added
to the IRIbis spec, but I have seen no evidence to suggest that
such pre-processing is even remotely standardizable (it seems to
be different for every input context).  If you can demonstrate or
get agreement on a single way to preprocess an input string, or at
least a few named processes (like single-ref and multi-ref), then
that would be useful.

]]

From this more detailed message, it appears that it is fully
appropriate for HTML5 to define an algorithm for translating input
strings containing one or more references into one or more URIs (or
IRIs, as appropriate).  In particular, Roy expects such translations
to be context-specific, charset-specific, and (importantly)
media-type-specific.  To wit: HTML5 ought to define the pre-processing
rules that are specific to the text/html media type.
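
To make Roy's "single-ref" and "multi-ref" distinction concrete, here
is a minimal sketch in Python.  It is illustrative only and is not the
removed spec text: the function names single_ref and multi_ref are
hypothetical, the choice of separator characters is an assumption, and
urllib.parse.urljoin stands in for RFC 3986 reference resolution.

```python
import re
from typing import List
from urllib.parse import urljoin

# Assumption: tab, LF, FF, CR and space are the separator characters.
HTML_SPACE = "\t\n\f\r "


def single_ref(value: str, base: str) -> str:
    """Treat the whole input string as one reference."""
    return urljoin(base, value.strip(HTML_SPACE))


def multi_ref(value: str, base: str) -> List[str]:
    """Treat the input string as a space-separated list of references."""
    tokens = re.split("[" + re.escape(HTML_SPACE) + "]+", value)
    # Leading or trailing separators produce empty tokens; drop them.
    return [urljoin(base, token) for token in tokens if token]


if __name__ == "__main__":
    base = "http://example.com/x/y"
    print(single_ref("  /a/b  ", base))
    print(multi_ref(" /p1 \n q2 ", base))
```

The point of the sketch is only that the same input string yields
different URIs depending on which named process the context selects.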

To lend even more credence to this rationale, I quote from the very
same email message [2] written by Roy that Michael(tm) Smith cited in
the description of ISSUE-56.  This quote was omitted from the
description of ISSUE-56 for reasons unknown to me:

[[

I suggest that the section be removed or replaced with the
limited and specific needs for parsing href and src attribute
values such that the attribute's value string is mapped to a
URI-reference with a defined base-URI.  HTML owns that process
of extracting a valid URI-reference from an attribute's value
string.  A simple string parsing description, with associated
context-specific error-handling, is more than sufficient to
satisfy the needs of HTML5 without appearing to override an
existing standard that has recently been agreed to by all
vendors, including the few browser vendors that care about HTML5.

]]

In effect, this change proposal urges the working group to adopt Roy's
proposal: HTML5 should define how to extract a URI-reference from
strings contained in text/html documents, complete with
context-specific error handling.
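
As a rough illustration of what such an extraction step might look
like, here is a sketch in Python.  It is not the algorithm from the
removed text.  The function name to_uri, the SAFE character set, the
choice to encode the query in the document's encoding, and the use of
None as the error signal are all assumptions made for the example.
The host component is left untouched for brevity.

```python
from typing import Optional
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Assumption: tab, LF, FF, CR and space are trimmed from both ends.
HTML_SPACE = "\t\n\f\r "
# Assumption: reserved characters and existing "%" escapes pass through.
SAFE = "!#$%&'()*+,/:;=?@[]~"


def to_uri(value: str, base: str, doc_encoding: str = "utf-8") -> Optional[str]:
    """Map an attribute's value string to an absolute URI, or None on error."""
    ref = value.strip(HTML_SPACE)
    try:
        # Resolve against the base URI, then split into components.
        parts = urlsplit(urljoin(base, ref))
    except ValueError:
        # Context-specific error handling is left to the caller.
        return None
    # Charset-specific step: path and fragment use UTF-8, while the
    # query uses the document's encoding (an assumption of this sketch).
    path = quote(parts.path, safe=SAFE, encoding="utf-8")
    query = quote(parts.query, safe=SAFE, encoding=doc_encoding, errors="replace")
    fragment = quote(parts.fragment, safe=SAFE, encoding="utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    print(to_uri(" /caf\u00e9?q=\u00e9 ", "http://example.com/", "windows-1252"))
```

Note that the input string " /café?q=é " is not itself a URI; only the
value returned by the function is.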

For those who prefer rationales expressed in terms of objections, this
change proposal makes the following objections:

1) I object to HTML5 deferring to RFC 3987 for parsing input strings
containing one or more references because RFC 3987 does not define an
algorithm for parsing input strings containing one or more references
that takes into account the context-specific, charset-specific, and
media-type-specific rules required by user agents to interoperably
parse such input strings in text/html documents.

2) I object to HTML5 being blocked in the IRIbis working group for
defining an algorithm for extracting URI-references from strings
contained in text/html documents for two reasons:
  a) Defining such an algorithm is out of scope for that working
group's charter [3] because these strings are not IRIs and therefore
are not subject to the requirements contained in RFC 3987.
  b) The IRIbis working group has made essentially no technical
progress since its inception.  To wit: the working group has published
only a -00 version of a single Internet-Draft.  In contrast to Larry's
claim in his change proposal, the mailing list is essentially dead:
    i) There have been only two messages in June.
    ii) The messages in May consisted (essentially) of a discussion of
how to render BIDI URIs on billboards.
    iii) The messages in April consisted of coordinating with this
working group.

3) I (strongly) object to HTML5 not defining how to interoperably
process a hyperlink because a hyperlink is the essential feature of a
*hypertext* markup language.

== Proposal Details ==

The proposal details herein take the form of a set of edit
instructions, specific enough that they can be applied without
ambiguity.

1) Restore the removed text regarding translating input strings
containing one or more references into one or more URIs.
2) Update the surrounding text to distinguish between these input
strings and the URIs to which they are translated.

== Impact ==

1) Positive effects: User agents will be able to implement
interoperable error handling for translating strings in HTML documents
into URIs.
2) Negative effects: Readers of the HTML5 specification will need to
learn the difference between these input strings and the URIs they
represent.

Q: What conformance classes will have to change?
A: User agents.

Q: What are the risks?
A: We might actually be able to process hyperlinks interoperably,
leading to joy and happiness.  With so much joy in the work, purveyors
of whisky might go out of business.

[1] http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html
[2] http://lists.w3.org/Archives/Public/public-html/2008Jun/0435.html
[3] http://tools.ietf.org/wg/iri/charters
Received on Wednesday, 16 June 2010 05:41:35 UTC