Re: HTML CHANGE PROPOSAL; change definition of URL to normative reference to IRIBIS

From: Maciej Stachowiak <mjs@apple.com>
Date: Mon, 01 Mar 2010 13:13:26 -0800
Cc: public-html@w3.org, 'Ted Hardie' <ted.ietf@gmail.com>
Message-id: <3F9716BA-BCE7-454D-A742-53C03BB75486@apple.com>
To: Larry Masinter <LMM@acm.org>

Thanks for the update. Recorded at
<http://dev.w3.org/html5/status/issue-status.html#ISSUE-056>.

  - Maciej


On Feb 25, 2010, at 6:45 PM, Larry Masinter wrote:

> With regard to ISSUE-56, ACTION-171:
>
> Rationale:
>
> The Issue this proposal is trying to address is:
> "Bring URLs section/definition and IRI specification in alignment."
>
> (1) The fundamental rationale is that URLs in HTML and similar
> identifiers in other Internet systems need to have the same syntax
> and semantics.
> The advantages of doing this in technical specifications include all
> of those articulated for modular specifications.
>
> (2) The IETF has approved an IRI working group whose charter
> specifically includes working with the W3C HTML working group:
> as noted in:
> http://lists.w3.org/Archives/Public/public-html/2010Feb/0476.html
> and
> http://tools.ietf.org/wg/iri/charters which includes:
>
> " The IRI specification(s) must (continue to) be suitable
>  for normative reference with Web and XML standards from W3C
>  specifications. The group should coordinate with the W3C working
>  groups on HTML5, XML Core, and Internationalization, as well
>  as with IETF HTTPBIS WG to ensure acceptability.  "
>
> Evidence that there is interest outside of the W3C HTML
> working group current members to contribute to this work
> has been the extensive participation and time spent already
> in meetings, including:
>
>  * meetings at the last W3C TPAC
>  * Two working group development sessions at IETF meetings
>    with significant participation by non-HTML-WG members
>
>    http://www.alvestrand.no/pipermail/idna-update/2009-October/005720.html
>    http://lists.w3.org/Archives/Public/public-iri/2009Nov/0040.html
>    http://www.alvestrand.no/pipermail/idna-update/2009-July/004598.html
>  * Interest in, and discussions with, members of the Unicode
>    Consortium Technical Committee.
>
> In addition, there is evidence that this work can succeed:
> the discussion in the mailing list for the IRI working group
> http://lists.w3.org/Archives/Public/public-iri/ is active;
> most of the recent active contributions have been by
> W3C HTML Working Group members, with additional contributions
> from the broader community of Internet application
> development.
>
>
> The first F2F meeting of the IRI working group in IETF
> will be Friday, March 25, but of course, as with all IETF
> working groups, the primary work of the group is on the
> mailing list, and there is no cost or fee for participation
> there.
>
> (3) Recent public-iri discussion seems to raise the issue that the
> current definition of URLs in the existing HTML5 specification
> may not match implementations in any case. The analysis of
> how currently deployed systems work, and how they should work
> in the face of changes to the Internationalization of Domain
> Names, should be done in a context where the affected communities
> (IDN, Unicode Technical Committee, HTML WG, etc.) can come
> to agreement.
>
> (4) Additional information in the HTML5 bug report
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207
> indicates that the reason for rejecting this as a "bug"
> is that the IRI document is 'vague' and does not contain
> sufficient normative language to satisfy some who believe
> that MUST language with normative algorithms is necessary.
> However, these requirements should be handled as updates
> to the IRI specification, so that the HTML5 specification
> does not contain implementation advice that diverges from that
> used by every other application that uses URLs/IRIs.
>
> (5) While there may be additional adjustments necessary
> to align the boundary between what the HTML5 document
> and the IRIBIS document each specify, this work should
> proceed as bugs on the drafts, as amended by this change
> proposal.
>
> ===============================================================
> Proposal:
>
> The actual proposal itself was available as an attachment to
> http://lists.w3.org/Archives/Public/public-html/2009Nov/0670.html
> http://lists.w3.org/Archives/Public/public-html/2009Nov/att-0670/iri-rewrite-draft.html
>
> A minor update of that proposal (edited to update the reference
> to point to the IETF document) is attached to this message
> and also made available in plain text here:
>
>
> ================================================================
>
>
> NOTE: This is a draft of one way of rewriting section 2.5.1 of the
> HTML 5 editor's draft of 25 August 2009, provided as an example.
>
>
> 2.5.1 Terminology
>
> Historically the term "URI" was used for "Universal Resource
> Identifier" [RFC 1630], with a Uniform Resource Locator (URL) being the
> form of URI which expresses an address which maps onto an access
> algorithm using network protocols. Further technical specifications
> [RFC 1738], [RFC 1808], [RFC 2396] and [RFC 3986] subsequently
> defined a "relative URL", elaborated the distinction between Uniform
> Resource Names (URN) and URLs, and led to the adoption of "URI" as
> Uniform Resource Identifier. [RFC 3987] then introduced the notion of
> an "Internationalized Resource Identifier" (IRI) as a syntactic form
> which allows (unencoded) non-ASCII Unicode characters.
> [HTML 4.01] (from which this specification was evolved) used "URI" as
> specified by [RFC 2396], but contained recommended processing rules
> for HTML agents (in [HTML 4.01] appendix B.2) for handling invalid
> values containing non-ASCII characters, roughly corresponding to the
> guidance in [RFC 3987].
>
> Popular informal usage continues to use "URL" to refer to any of these
> variations, although, for the most part, the term "URL" alone
> indicates an "absolute" form including a scheme (see below).
>
> Definition: In this document, the term "URL" is used for any string
> used to identify a resource, including relative forms; the
> distinction between the various forms is made in context, with
> qualifiers, or by processing rules, as to whether the URL corresponds
> to a URI or a "relative reference" (as specified in [RFC 3986]), to the
> "internationalized" forms of those, IRI and relative IRI reference (as
> specified in [draft-ietf-iri-3987bis]), or to a string which (after
> being preprocessed by the rules defined in Section 7.2 of
> [draft-ietf-iri-3987bis]) results in one of those forms.
>
> Definition: a valid URL is a string that matches the production of
> "iri-reference" in [draft-ietf-iri-3987bis].
>
> Definition: a valid absolute URL is a string that matches the
> production of "IRI" in [draft-ietf-iri-3987bis].
>
> Definition: an absolute URL is a string which results in a valid
> absolute URL (defined above) after being processed by the rules of
> "Web Address Processing" in section 7.2 of [draft-ietf-iri-3987bis].
> Note that this basically means any string which, after preprocessing,
> starts with an initial string matching the "scheme" production of
> [draft-ietf-iri-3987bis], followed by a colon.
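
The note above (a preprocessed string that starts with a scheme followed by a colon) can be sketched as a quick check. This is only an illustration: `preprocess` is a hypothetical stand-in that strips surrounding whitespace and does not implement section 7.2, and the check tests only the scheme prefix, not the full "IRI" production. The scheme pattern follows the "scheme" production of RFC 3986.

```python
import re

# "scheme" production from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# followed here by the colon that terminates it.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def preprocess(s: str) -> str:
    # Assumption: placeholder for "Web Address Processing" (section 7.2).
    # It only strips surrounding whitespace; the real rules do more.
    return s.strip()


def looks_absolute(s: str) -> bool:
    """True if the preprocessed string starts with '<scheme>:'."""
    return SCHEME_PREFIX.match(preprocess(s)) is not None
```

For example, `looks_absolute("  http://example.com/ ")` is true, while `looks_absolute("/a/b:c")` is false because the colon does not follow a scheme at the start of the string.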
>
> Definition: A relative URL is a URL that is not an absolute URL;
> similarly, a valid relative URL is a valid URL that is not an absolute
> URL.
>
> Definition: To parse a URL into its component parts means to first
> preprocess the string according to section 7.2 of
> [draft-ietf-iri-3987bis] "Web Address Processing", and then to parse
> the results of preprocessing (as per section 3.2 of
> [draft-ietf-iri-3987bis]) against the "iri-reference" production (if
> parsing a URL) or the "IRI" production (if parsing an absolute URL).
> Note that the preprocessing steps generally result in a valid URL or a
> valid relative URL. Matching BNF components results in the following
> parts:
>
>    * <scheme>: substring that matches "scheme", if any
>    * <host>: substring that matches "ireg-name", if any
>    * <port>: substring that matches "port", if any
>    * <hostport>: if there is a scheme component and a port component
> and the port given by the port component is different from the default
> port defined for the scheme given by the scheme component (if the
> default port for the scheme is known), then <hostport> is the substring
> that starts with the substring matched by the host production and ends
> with the substring matched by the port production, and includes the
> colon in between the two. Otherwise, it is the same as the host
> component.
>    * <path>: substring that matches "ipath", if any
>    * <query>: substring that matches "iquery", if any
>    * <fragment>: substring that matches "ifragment", if any
>    * <host-specific>: the substring that follows the substring
> matched by the "iauthority" production, or the whole string (that is,
> the input to the matching algorithm which is the result of
> preprocessing by section 7.2) if the "iauthority" production wasn't
> matched.
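
A minimal sketch of the component list above, under stated assumptions: it uses the generic splitting regex from RFC 3986 Appendix B instead of matching the ABNF productions, it replaces section 7.2 preprocessing with a plain whitespace strip, and `DEFAULT_PORTS`, `Parts` and `parse_url` are hypothetical names introduced only for illustration. When the default port for a scheme is unknown, the sketch keeps the port in `hostport`, which is one possible reading of the <hostport> rule.

```python
import re
from typing import NamedTuple, Optional

# Generic component splitter from RFC 3986, Appendix B. It locates the
# components but does not validate them against the ABNF.
SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical table of known default ports (illustration only).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


class Parts(NamedTuple):
    scheme: Optional[str]
    host: Optional[str]
    port: Optional[str]
    hostport: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]
    host_specific: str


def parse_url(raw: str) -> Parts:
    s = raw.strip()  # assumption: stand-in for section 7.2 preprocessing
    m = SPLIT.match(s)  # every group is optional, so this always matches
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)

    host = port = hostport = None
    host_specific = s  # whole string when no authority was matched
    if authority is not None:
        host_and_port = authority.rpartition("@")[2]  # drop any userinfo
        hp = re.fullmatch(r"(.*?)(?::([0-9]*))?", host_and_port, re.DOTALL)
        host, port = hp.group(1), hp.group(2) or None
        host_specific = s[m.end(4):]  # everything after the authority
        hostport = host
        # Keep the port only when it differs from the scheme's default.
        if scheme and port and int(port) != DEFAULT_PORTS.get(scheme.lower()):
            hostport = host + ":" + port
    return Parts(scheme, host, port, hostport, path, query, fragment,
                 host_specific)
```

With this sketch, `parse_url("http://example.com:8080/a/b?x=1#frag")` gives a `hostport` of `example.com:8080` and a `host_specific` of `/a/b?x=1#frag`, while `parse_url("http://example.com:80/")` gives a `hostport` of `example.com`.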
>
> Definition: The phrasing "resolve ... relative to ..." (as in
> "resolve a URL relative to another URL") is used to describe the
> process of combining two strings, an original URL and a base URL
> (usually an absolute URL), to obtain parsed components; these parsed
> components may then be recombined to construct a new URL. This is
> accomplished by parsing the original and base URLs (preprocessing by
> section 7.2 of [draft-ietf-iri-3987bis] first, then matching against
> the productions of section 3.2 of [draft-ietf-iri-3987bis]) and then
> combining the original and base components following the algorithms in
> section 5.2 of [RFC 3986], applied to the Unicode characters which
> constitute the original and base.
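
The resolution step can be sketched by applying the RFC 3986 section 5.2 algorithms (strict transform, merge, remove_dot_segments, recomposition) directly to Python Unicode strings. This is an illustration under assumptions: section 7.2 preprocessing is omitted, so both inputs are taken to be already preprocessed; components are split with the RFC 3986 Appendix B regex instead of the ABNF productions; and `resolve` and the helper names are hypothetical.

```python
import re

_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def _split(s):
    m = _SPLIT.match(s)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def _remove_dot_segments(path):
    # RFC 3986, section 5.2.4.
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to out.
            i = path.find("/", 1)
            i = len(path) if i == -1 else i
            out.append(path[:i])
            path = path[i:]
    return "".join(out)


def _merge(base_authority, base_path, ref_path):
    # RFC 3986, section 5.2.3.
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(original, base):
    """Resolve `original` relative to `base` (RFC 3986 5.2.2, strict)."""
    r_scheme, r_auth, r_path, r_query, r_frag = _split(original)
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    if r_scheme is not None:
        scheme, auth, query = r_scheme, r_auth, r_query
        path = _remove_dot_segments(r_path)
    else:
        scheme = b_scheme
        if r_auth is not None:
            auth, query = r_auth, r_query
            path = _remove_dot_segments(r_path)
        else:
            auth = b_auth
            if r_path == "":
                path = b_path
                query = r_query if r_query is not None else b_query
            else:
                merged = r_path if r_path.startswith("/") else _merge(b_auth, b_path, r_path)
                path, query = _remove_dot_segments(merged), r_query
    # Recomposition, RFC 3986 section 5.3.
    out = "" if scheme is None else scheme + ":"
    out += "" if auth is None else "//" + auth
    out += path
    out += "" if query is None else "?" + query
    return out + ("" if r_frag is None else "#" + r_frag)
```

Because the algorithm only looks at the delimiters, non-ASCII characters pass through unchanged: `resolve("../é?ü", "http://例え.jp/a/b")` produces `http://例え.jp/é?ü`.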
>
> Definition: the document base URL of a Document object is the absolute
> URL defined by:
>
>   1. Let fallback base url be the document's address (an absolute
> URL).
>   2. If fallback base url is the string about:blank and the
> Document's browsing context has a creator browsing context, then let
> fallback base url be the document base URL of the creator Document
> instead.
>   3. If there is no base element that is both a child of the head
> element and has a href attribute, then the document base URL is
> fallback base url.
>   4. Otherwise, the document base URL is the result of resolving
> the href attribute of the first such element relative to fallback base
> url (note that the base href attribute isn't affected by xml:base
> attributes).
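
The four steps above can be sketched as follows. The `Document` class here is a hypothetical model, not a DOM API: `address` is the document's address, `creator` is the creator Document (if the browsing context has one), and `head_base_hrefs` lists, in document order, the href values of base elements that are children of head. Python's `urllib.parse.urljoin` is used as a stand-in for the resolution step, which is an assumption; it is not the algorithm of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class Document:
    # Hypothetical model of the inputs the four steps need.
    address: str
    creator: Optional["Document"] = None
    head_base_hrefs: List[str] = field(default_factory=list)


def document_base_url(doc: Document) -> str:
    # Step 1: start from the document's address.
    fallback = doc.address
    # Step 2: about:blank inherits the creator document's base URL.
    if fallback == "about:blank" and doc.creator is not None:
        fallback = document_base_url(doc.creator)
    # Step 3: no qualifying base element, so the fallback is the answer.
    if not doc.head_base_hrefs:
        return fallback
    # Step 4: resolve the first base href relative to the fallback
    # (urljoin is a stand-in for the resolution step).
    return urljoin(fallback, doc.head_base_hrefs[0])
```

For instance, a document at `http://example.com/dir/page.html` whose head contains a base element with href `/assets/` has the document base URL `http://example.com/assets/`, and an `about:blank` document created by it inherits the same value.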
>
>
>
>
>
> <iri-rewrite-draft.html>
Received on Monday, 1 March 2010 21:14:00 UTC
