NOTE: This is a draft of one way of rewriting section 2.5.1 of the HTML 5 editor's draft of 25 August 2009, provided as an example.
Historically, the term “URI” stood for “Universal Resource Identifier” [RFC 1630], with a Uniform Resource Locator (URL) being the form of URI that expresses an address which maps onto an access algorithm using network protocols. Subsequent technical specifications ([RFC 1738], [RFC 1808], [RFC 2396] and [RFC 3986]) defined a “relative URL”, elaborated the distinction between Uniform Resource Names (URNs) and URLs, and led to the adoption of “URI” as “Uniform Resource Identifier”. [RFC 3987] then introduced the notion of an “Internationalized Resource Identifier” (IRI), a syntactic form that allows (unencoded) non-ASCII Unicode characters. [HTML 4.01] (from which this specification evolved) used “URI” as specified by [RFC 2396], but contained recommended processing rules for HTML agents (in [HTML 4.01] appendix B.2) for handling invalid values containing non-ASCII characters, roughly corresponding to the guidance in [RFC 3987].
In popular informal usage, “URL” continues to refer to any of these variations, although for the most part the unqualified term “URL” indicates an “absolute” form that includes a scheme (see below).
Definition: In this document, the term “URL” is used for any string used to identify a resource, including relative forms. The distinction between the various forms is made in context, with qualifiers, or by processing rules: a URL may correspond to a URI or a “relative reference” (as specified in [RFC 3986]), to the “internationalized” forms of those, an IRI or a relative IRI reference (as specified in [RFC 3987] or its update [draft-duerst-iri-bis]), or to a string which, after being preprocessed by the rules defined in section 7.2 of [draft-duerst-iri-bis], results in one of those forms.
Definition: a valid URL is a string that matches the production of “iri-reference” in [draft-duerst-iri-bis].
Definition: a valid absolute URL is a string that matches the production of “IRI” in [draft-duerst-iri-bis].
Definition: an absolute URL is a string which results in a valid absolute URL (defined above) after being processed by the rules of “Web Address Processing” in section 7.2 of [draft-duerst-iri-bis]. In practice, this means any string which, after preprocessing, starts with an initial string matching the “scheme” production of [draft-duerst-iri-bis], followed by a colon.
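As an informal illustration of the note above, the following Python sketch tests whether a string begins with a scheme followed by a colon. It is not normative. The name looks_absolute is hypothetical, the scheme syntax is taken from section 3.1 of [RFC 3986] and is assumed here to match the “scheme” production of [draft-duerst-iri-bis], and a plain whitespace strip stands in for the full preprocessing rules of section 7.2, which this sketch does not implement.

```python
import re

# Scheme syntax as given in RFC 3986 section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ASSUMPTION: the draft's "scheme" production is the same.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def looks_absolute(url: str) -> bool:
    """Return True if the string starts with a scheme followed by a colon.

    ASSUMPTION: stripping leading and trailing whitespace is only a
    stand-in for the preprocessing rules of section 7.2; the real rules
    are not reproduced here.
    """
    candidate = url.strip()
    return SCHEME_PREFIX.match(candidate) is not None


if __name__ == "__main__":
    for sample in ["http://example.org/", "mailto:user@example.org", "../a/b", "/a:b"]:
        print(repr(sample), looks_absolute(sample))
```

Note that this check only covers the leading scheme; it does not establish that the remainder of the string matches the “IRI” production.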
Definition: A relative URL is a URL that is not an absolute URL; similarly, a valid relative URL is a valid URL that is not an absolute URL.
Definition: To parse a URL into its component parts means to first preprocess the string according to “Web Address Processing” in section 7.2 of [draft-duerst-iri-bis], and then to parse the result of preprocessing (as per section 3.2 of [draft-duerst-iri-bis]) against the “iri-reference” production (if parsing a URL) or the “IRI” production (if parsing an absolute URL). Note that the preprocessing steps generally result in a valid URL or a valid relative URL. Matching BNF components results in the following parts:
Definition: The term resolve (in the context of resolving a URL relative to another URL) describes the process of combining two strings, an original URL and a base URL (usually an absolute URL), to obtain parsed components; these parsed components may then be recombined to construct a new URL. This is accomplished by parsing the original and base URLs (first preprocessing by section 7.2 of [draft-duerst-iri-bis], then matching against the productions of section 3.2 of [draft-duerst-iri-bis]) and then combining the original and base components following the algorithms in section 5.2 of [RFC 3986], applied to the Unicode characters which constitute the original and the base.
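As an informal illustration of resolving, the following Python sketch combines an original URL with a base URL and then splits the result into components. It is not normative. It relies on urljoin and urlsplit from Python's standard urllib.parse module, whose resolution behaviour is modelled on section 5.2 of [RFC 3986] and which operate on Unicode strings without percent-encoding them. The name resolve is hypothetical, the base URL is the one used in the examples of section 5.4 of [RFC 3986], and a plain whitespace strip stands in for the preprocessing of section 7.2 of [draft-duerst-iri-bis], which this sketch does not implement.

```python
from urllib.parse import urljoin, urlsplit


def resolve(original: str, base: str) -> str:
    """Resolve `original` against `base` and return the recombined URL.

    ASSUMPTION: stripping whitespace is only a stand-in for the
    preprocessing rules of section 7.2; the real rules are not
    reproduced here.
    """
    return urljoin(base.strip(), original.strip())


# Base URL taken from the examples in RFC 3986 section 5.4.
BASE = "http://a/b/c/d;p?q"

if __name__ == "__main__":
    for original in ["g", "../g", "?y", "#s", "https://example.com/x"]:
        resolved = resolve(original, BASE)
        parts = urlsplit(resolved)
        print(original, "->", resolved, "| path:", parts.path)
```

An original URL that is already absolute, such as the last sample, is returned unchanged, since its own scheme takes precedence over the base.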
Definition: the document base URL of a Document object is the absolute URL defined by: