- From: Adam Barth <w3c@adambarth.com>
- Date: Wed, 14 Jul 2010 18:12:15 -0700
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: HTML WG <public-html@w3.org>, Sam Ruby <rubys@intertwingly.net>
Here is the updated text of my change proposal. Hopefully the updated proposal is sufficiently specific about the text it proposes restoring.

== Summary ==

There is no need to align "URL" processing in HTML documents with the IRI specifications because HTML documents do not contain IRIs (or URIs for that matter). We should restore the removed text that explained how to translate input strings contained in text/html documents into URIs.

== Rationale ==

ISSUE-56 was raised in error by Michael(tm) Smith based on a message Roy sent to the working group. Roy said that "pretending to define a new URL standard as part of HTML5 is not acceptable ... HTML will never define the identifiers for the Web. That would be a fundamental violation of the Web architecture." Based on my current understanding of the web architecture and of how a sequence of characters in a text/html document becomes a URI, he is correct. However, that does not imply that we ought to remove the "URL" processing requirements from the HTML5 specification.

In a recent message to the IRI working group [1], Roy writes:

[[
RFC 3986 defines how to parse URIs (for recipients) and provides many rules for scheme-specific specs to define how to generate URIs of a given scheme (for producers) within the overall constraint of matching the URI syntax (the formal ABNF).

[...]

Please understand that browsers almost never parse URI or IRI or anything in between. Browsers have input strings that contain one or more references, usually in the document encoding, and so there is a sequence of context-specific and charset-specific and media-type-specific processing that occurs before you even get to the individual URI-reference or IRI-reference that are defined by 3986/3987. Some people have proposed that most of that pre-processing be added to the IRIbis spec, but I have seen no evidence to suggest that such pre-processing is even remotely standardizable (it seems to be different for every input context).

If you can demonstrate or get agreement on a single way to preprocess an input string, or at least a few named processes (like single-ref and multi-ref), then that would be useful.
]]

From this more detailed message, it appears that it is fully appropriate for HTML5 to define an algorithm for translating input strings containing one or more references into one or more URIs (or IRIs, as appropriate). In particular, Roy expects such translations to be context-specific, charset-specific, and (importantly) media-type-specific. To wit: HTML5 ought to define the pre-processing rules that are specific to the text/html media type.

To lend even more credence to this rationale, I quote from the very same email message [2] written by Roy that Michael(tm) Smith cited in the description of ISSUE-56. This quote was omitted from the description of ISSUE-56 for reasons unknown to me and to Michael(tm) Smith:

[[
I suggest that the section be removed or replaced with the limited and specific needs for parsing href and src attribute values such that the attribute's value string is mapped to a URI-reference with a defined base-URI. HTML owns that process of extracting a valid URI-reference from an attribute's value string. A simple string parsing description, with associated context-specific error-handling, is more than sufficient to satisfy the needs of HTML5 without appearing to override an existing standard that has recently been agreed to by all vendors, including the few browser vendors that care about HTML5.
]]

In effect, this change proposal urges the working group to adopt Roy's proposal: HTML5 should define how to extract a URI-reference from strings contained in text/html documents, complete with context-specific error handling.

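To make "extracting a URI-reference from an attribute's value string" concrete, here is a rough sketch in Python of what such pre-processing might look like. It is only an illustration, not the text this proposal restores: the function names, the choice of which characters to leave untouched, encoding the query in the document's character encoding, and replacing unencodable characters are all assumptions made for the example.

```python
from urllib.parse import quote, urljoin

# Assumption: the whitespace stripped from both ends of an attribute value.
HTML_SPACE = " \t\n\f\r"
# Assumption: RFC 3986 reserved/unreserved punctuation and "%" pass through
# untouched; everything else (spaces, non-ASCII, ...) gets percent-encoded.
SAFE = "!#$%&'()*+,/:;=?@[]-._~"


def to_uri_reference(value: str, doc_encoding: str = "utf-8") -> str:
    """Map an attribute value (hypothetical helper) to a URI-reference string."""
    value = value.strip(HTML_SPACE)            # context-specific step
    rest, has_fragment, fragment = value.partition("#")
    before_query, has_query, query = rest.partition("?")

    out = quote(before_query, safe=SAFE, encoding="utf-8")
    if has_query:
        # Charset-specific step (assumed): the query uses the document
        # encoding; characters it cannot represent are replaced, not fatal.
        out += "?" + quote(query, safe=SAFE, encoding=doc_encoding,
                           errors="replace")
    if has_fragment:
        out += "#" + quote(fragment, safe=SAFE, encoding="utf-8")
    return out


def resolve(value: str, base_uri: str, doc_encoding: str = "utf-8") -> str:
    """Resolve the extracted URI-reference against a defined base URI."""
    return urljoin(base_uri, to_uri_reference(value, doc_encoding))


if __name__ == "__main__":
    print(to_uri_reference("  /a b?q=\u00e9#x ", "iso-8859-1"))
    print(resolve(" ../b.html\n", "http://example.com/a/c/d.html"))
```

Only the last step (resolution against the base) is what RFC 3986 covers; everything before it is the input-string handling that a media-type-specific specification would have to pin down.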
For those who prefer rationales expressed in terms of objections, this change proposal makes the following objections:

1) I object to HTML5 deferring to RFC 3987 for parsing input strings containing one or more references because RFC 3987 does not define an algorithm for parsing input strings containing one or more references that takes into account the context-specific, charset-specific, and media-type-specific rules required by user agents to interoperably parse such input strings in text/html documents.

2) I object to HTML5 being blocked in the IRIbis working group for defining an algorithm for extracting URI-references from strings contained in text/html documents for two reasons:

   a) Defining such an algorithm is out of scope for that working group's charter [3] because these strings are not IRIs and therefore are not subject to the requirements contained in RFC 3987.

   b) The IRIbis working group has made essentially no technical progress since its inception. To wit: the working group has published only a -00 version of a single Internet-Draft. In contrast to Larry's claim in his change proposal, the mailing list is essentially dead:

      i) There have been only two messages in June.
      ii) The messages in May consisted (essentially) of a discussion of how to render BIDI URIs on billboards.
      iii) The messages in April consisted of coordinating with this working group.

3) I (strongly) object to HTML5 not defining how to interoperably process a hyperlink because a hyperlink is the essential feature of a *hypertext* markup language.

== Proposal Details ==

The proposal details herein take the form of a set of edit instructions, specific enough that they can be applied without ambiguity:

1) Revert http://svn.whatwg.org/webapps@3245.

(Note: the editor and the working group should feel free to continue to improve this text after adopting this change proposal.)

== Impact ==

1) Positive effects: User agents will be able to implement interoperable error handling for translating strings in HTML documents into URIs.

2) Negative effects: Readers of the HTML5 specification will need to learn the difference between these input strings and the URIs they represent.

Q: What conformance classes will have to change?
A: User agents.

Q: What are the risks?
A: We might actually be able to process hyperlinks interoperably, leading to joy and happiness. With so much joy in the work, purveyors of whisky might go out of business.

[1] http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html
[2] http://lists.w3.org/Archives/Public/public-html/2008Jun/0435.html
[3] http://tools.ietf.org/wg/iri/charters

Received on Thursday, 15 July 2010 01:13:16 UTC