- From: Maciej Stachowiak <mjs@apple.com>
- Date: Wed, 21 Jul 2010 15:20:52 -0700
- To: Adam Barth <w3c@adambarth.com>
- Cc: HTML WG <public-html@w3.org>, Sam Ruby <rubys@intertwingly.net>
Thanks for the update. I've recorded this Change Proposal on the issue status page:

http://dev.w3.org/html5/status/issue-status.html#ISSUE-056

Regards,
Maciej

On Jul 14, 2010, at 6:12 PM, Adam Barth wrote:

> Here is the updated text of my change proposal. Hopefully the updated
> proposal is sufficiently specific about the text it proposes
> restoring.
>
> == Summary ==
>
> There is no need to align "URL" processing in HTML documents with the
> IRI specifications because HTML documents do not contain IRIs (or URIs
> for that matter). We should restore the removed text that explained
> how to translate input strings contained in text/html documents into
> URIs.
>
> == Rationale ==
>
> ISSUE-56 was raised in error by Michael(tm) Smith based on a message
> Roy sent to the working group. Roy said that "pretending to define a
> new URL standard as part of HTML5 is not acceptable ... HTML will
> never define the identifiers for the Web. That would be a fundamental
> violation of the Web architecture." Based on my current understanding
> of the web architecture and of how a sequence of characters in a
> text/html document becomes a URI, he is correct. However, that does
> not imply that we ought to remove the "URL" processing requirements
> from the HTML5 specification.
>
> In a recent message to the IRI working group [1], Roy writes:
>
> [[
> RFC 3986 defines how to parse URIs (for recipients) and provides many
> rules for scheme-specific specs to define how to generate URIs of a
> given scheme (for producers) within the overall constraint of matching
> the URI syntax (the formal ABNF).
>
> [...]
>
> Please understand that browsers almost never parse URI or IRI or
> anything in between.
> Browsers have input strings that contain one or
> more references, usually in the document encoding, and so there is a
> sequence of context-specific and charset-specific and
> media-type-specific processing that occurs before you even get to the
> individual URI-reference or IRI-reference that are defined by
> 3986/3987.
>
> Some people have proposed that most of that pre-processing be added to
> the IRIbis spec, but I have seen no evidence to suggest that such
> pre-processing is even remotely standardizable (it seems to be
> different for every input context). If you can demonstrate or get
> agreement on a single way to preprocess an input string, or at least a
> few named processes (like single-ref and multi-ref), then that would
> be useful.
> ]]
>
> From this more detailed message, it appears that it is fully
> appropriate for HTML5 to define an algorithm for translating input
> strings containing one or more references into one or more URIs (or
> IRIs, as appropriate). In particular, Roy expects such translations
> to be context-specific, charset-specific, and (importantly)
> media-type-specific. To wit: HTML5 ought to define the pre-processing
> rules that are specific to the text/html media type.
>
> To lend even more credence to this rationale, I quote from the very
> same email message [2] written by Roy that Michael(tm) Smith cited in
> the description of ISSUE-56. This quote was omitted from the
> description of ISSUE-56 for reasons unknown to me and to Michael(tm)
> Smith:
>
> [[
> I suggest that the section be removed or replaced with the limited and
> specific needs for parsing href and src attribute values such that the
> attribute's value string is mapped to a URI-reference with a defined
> base-URI. HTML owns that process of extracting a valid URI-reference
> from an attribute's value string.
> A simple string parsing
> description, with associated context-specific error-handling, is more
> than sufficient to satisfy the needs of HTML5 without appearing to
> override an existing standard that has recently been agreed to by all
> vendors, including the few browser vendors that care about HTML5.
> ]]
>
> In effect, this change proposal urges the working group to adopt Roy's
> proposal: HTML5 should define how to extract a URI-reference from
> strings contained in text/html documents, complete with
> context-specific error handling.
>
> For those that prefer rationales expressed in terms of objections, this
> change proposal makes the following objections:
>
> 1) I object to HTML5 deferring to RFC 3987 for parsing input strings
> containing one or more references because RFC 3987 does not define an
> algorithm for parsing input strings containing one or more references
> that takes into account the context-specific, charset-specific, and
> media-type-specific rules required by user agents to interoperably
> parse such input strings in text/html documents.
>
> 2) I object to HTML5 being blocked in the IRIbis working group for
> defining an algorithm for extracting URI-references from strings
> contained in text/html documents for two reasons:
>   a) Defining such an algorithm is out of scope for that working
> group's charter [3] because these strings are not IRIs and therefore
> are not subject to the requirements contained in RFC 3987.
>   b) The IRIbis working group has made essentially no technical
> progress since its inception. To wit: the working group has published
> only a -00 version of a single Internet-Draft. In contrast to Larry's
> claim in his change proposal, the mailing list is essentially dead:
>     i) There have been only two messages in June.
>     ii) The messages in May consisted (essentially) of a discussion of
> how to render BIDI URIs on billboards.
>     iii) The messages in April consisted of coordinating with this
> working group.
>
> 3) I (strongly) object to HTML5 not defining how to interoperably
> process a hyperlink because a hyperlink is the essential feature of a
> *hypertext* markup language.
>
> == Proposal Details ==
>
> The proposal details herein take the form of a set of edit
> instructions, specific enough that they can be applied without
> ambiguity:
>
> 1) Revert http://svn.whatwg.org/webapps@3245. (Note: the editor and
> the working group should feel free to continue to improve this text
> after adopting this change proposal.)
>
> == Impact ==
>
> 1) Positive effects: User agents will be able to implement
> interoperable error handling for translating strings in HTML documents
> into URIs.
> 2) Negative effects: Readers of the HTML5 specification will need to
> learn the difference between these input strings and the URIs they
> represent.
>
> Q: What conformance classes will have to change?
> A: User agents.
>
> Q: What are the risks?
> A: We might actually be able to process hyperlinks interoperably,
> leading to joy and happiness. With so much joy in the work, purveyors
> of whisky might go out of business.
>
> [1] http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html
> [2] http://lists.w3.org/Archives/Public/public-html/2008Jun/0435.html
> [3] http://tools.ietf.org/wg/iri/charters
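
For readers who want something concrete, the kind of pre-processing the proposal describes (mapping an attribute's value string to a URI-reference and resolving it against a base URI) might look roughly like the sketch below. This is not the text restored by the revert, and it is not the HTML5 algorithm. The function name attribute_value_to_uri, the choice to always percent-encode as UTF-8 (the proposal stresses that real processing is charset-specific), and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import quote, urljoin

# Characters that may appear literally in a URI-reference (RFC 3986 reserved
# set plus "%", so existing percent-escapes are left alone). Unreserved
# characters (letters, digits, "-", ".", "_", "~") are never escaped by quote().
_URI_SAFE = "%:/?#[]@!$&'()*+,;="


def attribute_value_to_uri(value: str, base: str) -> str:
    """Hypothetical helper: turn an href/src value string into an absolute URI.

    Assumption: non-ASCII characters and spaces are percent-encoded as UTF-8.
    A real user agent would pick the encoding from the document and context.
    """
    # Step 1: drop leading and trailing whitespace from the input string.
    trimmed = value.strip(" \t\n\f\r")
    # Step 2: percent-encode anything that is not legal in a URI-reference.
    reference = quote(trimmed, safe=_URI_SAFE)
    # Step 3: resolve the resulting URI-reference against the base URI.
    return urljoin(base, reference)


if __name__ == "__main__":
    print(attribute_value_to_uri("  caf\u00e9 menu.html ", "http://example.com/a/b"))
```

The point of the sketch is only that the input string ("  café menu.html ") and the URI it ends up representing are different things, which is the distinction the Impact section asks readers to learn.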
Received on Wednesday, 21 July 2010 22:21:25 UTC