Re: Change proposal for ISSUE-56 from Maciej Stachowiak on 2010-07-21 (public-html@w3.org from July 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Wed, 21 Jul 2010 15:20:52 -0700
To: Adam Barth <w3c@adambarth.com>
Cc: HTML WG <public-html@w3.org>, Sam Ruby <rubys@intertwingly.net>
Message-id: <285333CC-7344-4EEB-B5BA-C2AED6E8AC83@apple.com>
Thanks for the update. I've recorded this Change Proposal on the issue status page:

http://dev.w3.org/html5/status/issue-status.html#ISSUE-056

Regards,
Maciej

On Jul 14, 2010, at 6:12 PM, Adam Barth wrote:

> Here is the updated text of my change proposal.  Hopefully the updated
> proposal is sufficiently specific about the text it proposes
> restoring.
> 
> == Summary ==
> 
> There is no need to align "URL" processing in HTML documents with the
> IRI specifications because HTML documents do not contain IRIs (or URIs
> for that matter).  We should restore the removed text that explained
> how to translate input strings contained in text/html documents into
> URIs.
> 
> == Rationale ==
> 
> ISSUE-56 was raised in error by Michael(tm) Smith based on a message
> Roy sent to the working group.  Roy said that "pretending to define a
> new URL standard as part of HTML5 is not acceptable ... HTML will
> never define the identifiers for the Web. That would be a fundamental
> violation of the Web architecture."  Based on my current understanding
> of the web architecture and of how a sequence of characters in a
> text/html document becomes a URI, he is correct.  However, that does
> not imply that we ought to remove the "URL" processing requirements
> from the HTML5 specification.
> 
> In a recent message to the IRI working group [1], Roy writes:
> 
> [[
> RFC 3986 defines how to parse URIs (for recipients) and provides many
> rules for scheme-specific specs to define how to generate URIs of a
> given scheme (for producers) within the overall constraint of matching
> the URI syntax (the formal ABNF).
> 
> [...]
> 
> Please understand that browsers almost never parse URI or IRI or
> anything in between.  Browsers have input strings that contain one or
> more references, usually in the document encoding, and so there is a
> sequence of context-specific and charset-specific and
> media-type-specific processing that occurs before you even get to the
> individual URI-reference or IRI-reference that are defined by
> 3986/3987.
> 
> Some people have proposed that most of that pre-processing be added to
> the IRIbis spec, but I have seen no evidence to suggest that such
> pre-processing is even remotely standardizable (it seems to be
> different for every input context).  If you can demonstrate or get
> agreement on a single way to preprocess an input string, or at least a
> few named processes (like single-ref and multi-ref), then that would
> be useful.
> ]]
> 
> From this more detailed message, it appears that it is fully
> appropriate for HTML5 to define an algorithm for translating input
> strings containing one or more references into one or more URIs (or
> IRIs, as appropriate).  In particular, Roy expects such translations
> to be context-specific, charset-specific, and (importantly)
> media-type-specific.  To wit: HTML5 ought to define the pre-processing
> rules that are specific to the text/html media type.
> 
> To lend even more credence to this rationale, I quote from the very
> same email message [2] written by Roy that Michael(tm) Smith cited in
> the description of ISSUE-56.  This quote was omitted from the
> description of ISSUE-56 for reasons unknown to me and to Michael(tm)
> Smith:
> 
> [[
> I suggest that the section be removed or replaced with the limited and
> specific needs for parsing href and src attribute values such that the
> attribute's value string is mapped to a URI-reference with a defined
> base-URI.  HTML owns that process of extracting a valid URI-reference
> from an attribute's value string.  A simple string parsing
> description, with associated context-specific error-handling, is more
> than sufficient to satisfy the needs of HTML5 without appearing to
> override an existing standard that has recently been agreed to by all
> vendors, including the few browser vendors that care about HTML5.
> ]]
> 
> In effect, this change proposal urges the working group to adopt Roy's
> proposal: HTML5 should define how to extract a URI-reference from
> strings contained in text/html documents, complete with
> context-specific error handling.
> 
> For those that prefer rationales expressed in terms of objections, this
> change proposal makes the following objections:
> 
> 1) I object to HTML5 deferring to RFC 3987 for parsing input strings
> containing one or more references because RFC 3987 does not define an
> algorithm for parsing input strings containing one or more references
> that takes into account the context-specific, charset-specific, and
> media-type-specific rules required by user agents to interoperably
> parse such input strings in text/html documents.
> 
> 2) I object to HTML5 being blocked in the IRIbis working group for
> defining an algorithm for extracting URI-references from strings
> contained in text/html documents for two reasons:
>  a) Defining such an algorithm is out of scope for that working
> group's charter [3] because these strings are not IRIs and therefore
> are not subject to the requirements contained in RFC 3987.
>  b) The IRIbis working group has made essentially no technical
> progress since its inception.  To wit: the working group has published
> only a -00 version of a single Internet-Draft.  In contrast to Larry's
> claim in his change proposal, the mailing list is essentially dead:
>    i) There have been only two messages in June.
>    ii) The messages in May consisted (essentially) of a discussion of
> how to render BIDI URIs on billboards.
>    iii) The messages in April consisted of coordinating with this
> working group.
> 
> 3) I (strongly) object to HTML5 not defining how to interoperably
> process a hyperlink because a hyperlink is the essential feature of a
> *hypertext* markup language.
> 
> == Proposal Details ==
> 
> The proposal details herein take the form of a set of edit
> instructions, specific enough that they can be applied without
> ambiguity:
> 
> 1) Revert http://svn.whatwg.org/webapps@3245.  (Note: the editor and
> the working group should feel free to continue to improve this text
> after adopting this change proposal.)
> 
> == Impact ==
> 
> 1) Positive effects: User agents will be able to implement
> interoperable error handling for translating strings in HTML documents
> into URIs.
> 2) Negative effects: Readers of the HTML5 specification will need to
> learn the difference between these input strings and the URIs they
> represent.
> 
> Q: What conformance classes will have to change?
> A: User agents.
> 
> Q: What are the risks?
> A: We might actually be able to process hyperlinks interoperably,
> leading to joy and happiness.  With so much joy in the world, purveyors
> of whisky might go out of business.
> 
> [1] http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html
> [2] http://lists.w3.org/Archives/Public/public-html/2008Jun/0435.html
> [3] http://tools.ietf.org/wg/iri/charters
>
Received on Wednesday, 21 July 2010 22:21:25 UTC