- From: Larry Masinter <LMM@acm.org>
- Date: Thu, 25 Feb 2010 18:45:10 -0800
- To: "'Maciej Stachowiak'" <mjs@apple.com>
- CC: <public-html@w3.org>, "'Ted Hardie'" <ted.ietf@gmail.com>
- Message-ID: <000301cab68d$b6aa79b0$23ff6d10$@org>
With regard to ISSUE-56, ACTION-171:

Rationale:

The issue this proposal is trying to address is: "Bring URLs section/definition and IRI specification in alignment."

(1) The fundamental rationale is that URLs in HTML and similar identifiers in other Internet systems need to have the same syntax and semantics. Doing this in technical specifications brings all of the advantages already articulated for modular specifications.

(2) The IETF has approved an IRI working group whose charter specifically includes working with the W3C HTML working group, as noted in:

http://lists.w3.org/Archives/Public/public-html/2010Feb/0476.html
http://tools.ietf.org/wg/iri/charters

The charter includes:

"The IRI specification(s) must (continue to) be suitable for normative reference with Web and XML standards from W3C specifications. The group should coordinate with the W3C working groups on HTML5, XML Core, and Internationalization, as well as with IETF HTTPBIS WG to ensure acceptability."

There is evidence of interest in contributing to this work from outside the current membership of the W3C HTML working group, in the form of extensive participation and time already spent in meetings, including:

* meetings at the last W3C TPAC
* two working group development sessions at IETF meetings, with significant participation by non-HTML-WG members:
  http://www.alvestrand.no/pipermail/idna-update/2009-October/005720.html
  http://lists.w3.org/Archives/Public/public-iri/2009Nov/0040.html
  http://www.alvestrand.no/pipermail/idna-update/2009-July/004598.html
* interest in, and discussions with, members of the Unicode Consortium Technical Committee.

In addition, there is evidence that this work can succeed: the mailing list of the IRI working group (http://lists.w3.org/Archives/Public/public-iri/) is active; most of the recent contributions have come from W3C HTML Working Group members, with additional contributions from the broader community of Internet application developers.
The first face-to-face (F2F) meeting of the IRI working group at the IETF will be Friday, March 25, but, as with all IETF working groups, the primary work of the group happens on the mailing list, and there is no cost or fee for participating there.

(3) Recent public-iri discussion suggests that the current definition of URLs in the existing HTML5 specification may not match implementations in any case. The analysis of how currently deployed systems work, and how they should work in the face of changes to the internationalization of domain names, should be done in a context where the affected communities (IDN, Unicode Technical Committee, HTML WG, etc.) can come to agreement.

(4) Additional information in the HTML5 bug report http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207 indicates that the reason for rejecting this as a "bug" is that the IRI document is 'vague' and does not contain sufficient normative language to satisfy those who believe that MUST language with normative algorithms is necessary. However, these requirements should be handled as updates to the IRI specification, so that the HTML5 specification does not contain implementation advice that diverges from that used by every other application that uses URLs/IRIs.

(5) While additional adjustments may be necessary to align the boundary between the HTML5 document and the IRIBIS document, this work should proceed as bugs on the drafts, as amended by this change proposal.
===============================================================

Proposal:

The actual proposal itself was available as an attachment to:

http://lists.w3.org/Archives/Public/public-html/2009Nov/0670.html
http://lists.w3.org/Archives/Public/public-html/2009Nov/att-0670/iri-rewrite-draft.html

A minor update of that proposal (edited so that the reference points to the IETF document) is attached to this message and is also available in plain text here:

================================================================

NOTE: This is a draft of one way of rewriting section 2.5.1 of the HTML 5 editor's draft of 25 August 2009, provided as an example.

2.5.1 Terminology

Historically, the term "URI" was used for "Universal Resource Identifier" [RFC1630], with a Uniform Resource Locator (URL) being the form of URI that expresses an address which maps onto an access algorithm using network protocols. Subsequent technical specifications ([RFC 1738], [RFC 1808], [RFC 2396] and [RFC 3986]) defined a "relative URL", elaborated the distinction between Uniform Resource Names (URNs) and URLs, and led to the adoption of "URI" as "Uniform Resource Identifier"; [RFC 3987] introduced the notion of an "Internationalized Resource Identifier" (IRI) as a syntactic form which allows (unencoded) non-ASCII Unicode characters.

[HTML 4.01] (from which this specification evolved) used "URI" as specified by [RFC 2396], but contained recommended processing rules for HTML agents (in [HTML 4.01] appendix B.2) for handling invalid values containing non-ASCII characters, roughly corresponding to the guidance in [RFC 3987]. Popular informal usage continues to use "URL" to refer to any of these variations, although, for the most part, the term "URL" alone indicates an "absolute" form including a scheme (see below).
Definition: In this document, the term "URL" is used for any string used to identify a resource, including relative forms. The distinction between the various forms is made in context, with qualifiers, or by processing rules, according to whether the URL corresponds to a URI or a "relative reference" (as specified in [RFC 3986]), to the "internationalized" forms of those, IRI and relative IRI reference (as specified in [draft-ietf-iri-3987bis]), or to a string which, after being preprocessed by the rules defined in Section 7.2 of [draft-ietf-iri-3987bis], results in one of those forms.

Definition: A valid URL is a string that matches the "iri-reference" production in [draft-ietf-iri-3987bis].

Definition: A valid absolute URL is a string that matches the "IRI" production in [draft-ietf-iri-3987bis].

Definition: An absolute URL is a string which results in a valid absolute URL (defined above) after being processed by the rules of "Web Address Processing" in section 7.2 of [draft-ietf-iri-3987bis]. Note that this essentially means any string which, after preprocessing, starts with a substring matching the "scheme" production of [draft-ietf-iri-3987bis], followed by a colon.

Definition: A relative URL is a URL that is not an absolute URL; similarly, a valid relative URL is a valid URL that is not an absolute URL.

Definition: To parse a URL into its component parts means first to preprocess the string according to section 7.2 of [draft-ietf-iri-3987bis], "Web Address Processing", and then to parse the result of preprocessing (as per section 3.2 of [draft-ietf-iri-3987bis]) against the "iri-reference" production (if parsing a URL) or the "IRI" production (if parsing an absolute URL). Note that the preprocessing steps generally result in a valid URL or a valid relative URL.
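As a rough illustration only (not part of the proposal), the absolute-URL test and the component split above might be sketched in Python as follows. The sketch assumes the string has already been preprocessed per section 7.2 of [draft-ietf-iri-3987bis], which it does not implement. It tests for an absolute URL with the RFC 3986 "scheme" production (which the IRI grammar reuses) and splits components with the regular expression from RFC 3986 Appendix B. The function names is_absolute_url and parse_url are hypothetical.

```python
import re

# RFC 3986 "scheme" production: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# followed here by the colon that ends the scheme.
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

# Regular expression from RFC 3986 Appendix B; it splits any string
# (including one with non-ASCII characters) into its main parts.
COMPONENTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def is_absolute_url(preprocessed: str) -> bool:
    """True if the already-preprocessed string starts with scheme ':'."""
    return SCHEME_PREFIX.match(preprocessed) is not None


def parse_url(preprocessed: str) -> dict:
    """Split an already-preprocessed URL into component parts.

    Absent components are None, mirroring "if any" in the definitions.
    This splits but does not validate the string.
    """
    m = COMPONENTS.match(preprocessed)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

For example, parse_url("../a/b") yields only a path. Because the Appendix B expression splits without validating, a string such as "1x:y" gets a scheme part from parse_url while is_absolute_url still rejects it.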
Matching the BNF components yields the following parts:

* <scheme>: the substring that matched "scheme", if any
* <host>: the substring that matched "ireg-name", if any
* <port>: the substring that matched "port", if any
* <hostport>: if there is a scheme component and a port component, and the port given by the port component is different from the default port defined by the scheme component (if the default port for the scheme is known), then <hostport> is the substring that starts with the substring matched by the host production and ends with the substring matched by the port production, including the colon between the two. Otherwise, it is the same as the host component.
* <path>: the substring that matched "ipath", if any
* <query>: the substring that matched "iquery", if any
* <fragment>: the substring that matched "ifragment", if any
* <host-specific>: the substring that follows the substring matched by the "iauthority" production, or the whole string (that is, the input to the matching algorithm, which is the result of preprocessing by section 7.2) if the "iauthority" production wasn't matched.

Definition: The phrase "resolve ... relative to ..." (as in "resolve a URL relative to another URL") describes the process of combining two strings, an original URL and a base URL (usually an absolute URL), to obtain parsed components; these parsed components may then be recombined to construct a new URL. This is accomplished by parsing the original and base URLs (preprocessing by section 7.2 of [draft-ietf-iri-3987bis] first, then matching against the productions of section 3.2 of [draft-ietf-iri-3987bis]) and then combining the original and base components following the algorithms in section 5.2 of [RFC 3986], but applied to the Unicode characters which constitute the original and base URLs.

Definition: The document base URL of a Document object is the absolute URL obtained as follows:

1. Let fallback base url be the document's address (an absolute URL).
2. If fallback base url is the string about:blank and the Document's browsing context has a creator browsing context, then let fallback base url be the document base URL of the creator Document instead.
3. If there is no base element that is both a child of the head element and has an href attribute, then the document base URL is fallback base url.
4. Otherwise, the document base URL is the result of resolving the href attribute of the first such element relative to fallback base url (note that the base href attribute isn't affected by xml:base attributes).
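The <hostport> rule in the component list above is easy to misread, so here is a minimal Python sketch of one reading of it, for illustration only. Assumptions not in the text: the DEFAULT_PORTS table is hypothetical; when a scheme's default port is unknown, any explicit port is treated as "different" and kept; ports are compared as strings; and the authority is split with a simplified rule (userinfo dropped, bracketed IP literals kept whole). The input is assumed to be already preprocessed.

```python
import re

# Regular expression from RFC 3986 Appendix B; works on Unicode strings.
COMPONENTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical table for illustration only; the text says just
# "if the default port for the scheme is known".
DEFAULT_PORTS = {"http": "80", "https": "443", "ftp": "21"}


def split_authority(authority):
    """Return (host, port) from an authority, dropping any userinfo.

    port is None when the authority has no ':' port suffix.
    """
    host_and_port = authority.rpartition("@")[2]
    if host_and_port.startswith("["):  # bracketed IP literal, e.g. [::1]:8080
        end = host_and_port.find("]")
        if end == -1:
            return host_and_port, None
        rest = host_and_port[end + 1:]
        return host_and_port[: end + 1], (rest[1:] if rest.startswith(":") else None)
    host, sep, port = host_and_port.rpartition(":")
    return (host, port) if sep else (host_and_port, None)


def hostport(url):
    """Compute <hostport>; None when the URL has no authority."""
    m = COMPONENTS.match(url)
    scheme, authority = m.group(2), m.group(4)
    if authority is None:
        return None
    host, port = split_authority(authority)
    default = DEFAULT_PORTS.get(scheme.lower()) if scheme else None
    # Ports are compared as strings; an unknown default counts as "different".
    if scheme and port and port != default:
        return f"{host}:{port}"
    return host
```

Under these assumptions, hostport("http://example.com:80/") gives just the host, while an explicit non-default port such as 8080 is kept together with the host and the colon between them.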
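To make "resolve ... relative to ..." and the four document base URL steps concrete, the following Python sketch applies the algorithms of RFC 3986 sections 5.2.2 to 5.2.4, and the recomposition of section 5.3, directly to Unicode strings. It is only a sketch under stated assumptions: inputs are taken to be already preprocessed per section 7.2 of [draft-ietf-iri-3987bis] (not implemented here). In document_base_url, the base_hrefs list and the creator_base_url parameter are hypothetical stand-ins for the DOM lookups the steps describe.

```python
import re

# Regular expression from RFC 3986 Appendix B; works on Unicode strings.
COMPONENTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(url):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = COMPONENTS.match(url)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4; output is kept as a list of segments."""
    out, inp = [], path
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./") or inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()  # remove the last output segment (with its "/", if any)
        elif inp in (".", ".."):
            inp = ""
        else:
            end = inp.find("/", 1)  # first segment, including any leading "/"
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(ref, base):
    """Resolve ref against an absolute base (RFC 3986 section 5.2.2)."""
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    if r_scheme is not None:
        scheme, auth, path, query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    else:
        scheme = b_scheme
        if r_auth is not None:
            auth, path, query = r_auth, remove_dot_segments(r_path), r_query
        elif r_path == "":
            auth, path = b_auth, b_path
            query = r_query if r_query is not None else b_query
        else:
            auth, query = b_auth, r_query
            if r_path.startswith("/"):
                path = remove_dot_segments(r_path)
            else:
                path = remove_dot_segments(merge(b_auth, b_path, r_path))
    # Recompose the components (RFC 3986 section 5.3).
    result = f"{scheme}:" if scheme is not None else ""
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if r_frag is not None:
        result += "#" + r_frag
    return result


def document_base_url(address, base_hrefs, creator_base_url=None):
    """Steps 1-4 of the document base URL definition.

    base_hrefs: href values, in tree order, of <base> elements that are
    children of <head> and have an href attribute (assumed input shape).
    creator_base_url: document base URL of the creator Document, or None
    when there is no creator browsing context (assumed input shape).
    """
    fallback = address                                    # step 1
    if fallback == "about:blank" and creator_base_url is not None:
        fallback = creator_base_url                       # step 2
    if not base_hrefs:
        return fallback                                   # step 3
    return resolve(base_hrefs[0], fallback)               # step 4
```

As a check against one of the examples in RFC 3986 section 5.4.1, resolve("../g", "http://a/b/c/d;p?q") returns "http://a/b/g". Non-ASCII characters pass through unchanged because no percent-encoding step is applied.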
Attachments
- text/html attachment: iri-rewrite-draft.html
Received on Friday, 26 February 2010 02:46:05 UTC