obsoleting 3986 -- what would it look like? from Larry Masinter on 2012-11-02 (uri@w3.org from November 2012)

From: Larry Masinter <masinter@adobe.com>
Date: Fri, 2 Nov 2012 00:24:31 -0700
To: "uri@w3.org" <uri@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D1E36DA4492@nambxv01a.corp.adobe.com>
Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combines it with 3987 (IRI), reverts to the "URL" name, and gives updated parsing advice.

Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.

* How much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
* How much of the historical rationale for distinguishing between URIs and IRIs to retain. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
  My temptation at this point is to leave out most of the explanatory material and just add appendixes for URI, IRI and LEIRI that explain them as prior syntactic restrictions still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?
* Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
* I'm having trouble resisting the temptation to drive a stake through httpRange-14 by removing any basis for using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
* I'll accept sincere offers of co-authorship as long as you're willing to accept the requirement that, to obsolete 3986, we need to address current use cases that make reference to 3986, 3987, etc.


<abstract>
  <t>Uniform Resource Locators (URL) are compact strings which form a
  namespace used as identifiers.  The URL namespace is federated:
  there are URL schemes, each with its own semantics and syntactic
  restrictions, and a registry of scheme names.  A relative URL is an
  abbreviated form which can be combined with a base URL to form a new
  URL (relative resolution).  Previously, the terms "Uniform Resource
  Identifier" (URI), "Internationalized Resource Identifier" (IRI) and
  "Legacy Extended IRI" (LEIRI) were used to designate syntactic
  restrictions of the URL space.
  </t>
  <t>This specification brings together these definitions into a single
  specification and updates them to match current widespread usage,
  most notably within the World Wide Web global information and
  application system.
  </t>
  <t>This document is part of a set of documents intended to
  replace RFCs 2141, 3986, 3987 and 4345.</t>
</abstract>




<section title="Introduction">

<t>
  The concept of a "Uniform Resource Locator" was introduced
  by the World Wide Web global information initiative, whose
  use of the concept dates from 1990, and was described in 
  "Universal Resource Identifiers in WWW" <xref target="RFC1630"/>.
</t>

<t>
  Uniform Resource Locators (URL) are compact strings which form a
  namespace used as identifiers.  The URL namespace is federated:
  there are URL schemes, each with its own semantics and syntactic
  restrictions, and a registry of scheme names.  A relative URL is an
  abbreviated form which can be combined with a base URL to form a new
  URL (relative resolution).  Previously, the terms "Uniform Resource
  Identifier" (URI), "Internationalized Resource Identifier" (IRI) and
  "Legacy Extended IRI" (LEIRI) were used to designate syntactic
  restrictions of the URL space.
  </t>
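<t>
  As a minimal sketch of relative resolution, the following Python code
  uses the standard library function urllib.parse.urljoin, which
  (since Python 3.5) follows the resolution semantics of <xref
  target="RFC3986"/>; it is assumed here, not established, that this
  specification resolves these inputs the same way.  The base URL and
  the references are taken from the examples in Section 5.4 of <xref
  target="RFC3986"/>, and the helper name "resolve" is hypothetical.
</t>
<figure><artwork><![CDATA[
```python
from urllib.parse import urljoin

# Base URL taken from the resolution examples in RFC 3986, Section 5.4.
BASE = "http://a/b/c/d;p?q"


def resolve(reference: str, base: str = BASE) -> str:
    """Hypothetical helper: combine a relative reference with a base URL."""
    return urljoin(base, reference)


if __name__ == "__main__":
    for ref in ["g", "../g", "?y", "#s", "//g"]:
        print(ref, "->", resolve(ref))
```
]]></artwork></figure>
<t>
  For example, resolving "../g" against the base above yields
  "http://a/b/g", and "//g" yields "http://g" (the base scheme is
  kept, but the authority is replaced).
</t>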
<t>
  This specification brings together these definitions into a single
  specification and updates them to match current widespread usage,
  most notably within the World Wide Web global information and
  application system.</t>
<t>
  This specification and its companions, "Comparison of URLs" <xref
  target="url-comparison"/>, "Guidelines for Bidirectional URLs" <xref
  target="url-bidi-guidelines"/>, and "Registration of URL schemes" <xref
  target="url-registration"/>, obsolete <xref target="RFC3986"/>, <xref
  target="RFC3987"/>, and <xref target="RFC4345"/>.
</t>

<section title="Uniform, Resource, Locate">
    
  <t>The original design of URLs, in their various forms, was intended
   to accomplish several goals.</t>
  <t><list style="hanging">

    <t hangText="Uniform Meaning">
      The intention is that the same URL means (identifies, names,
      locates) the same thing independent of context.</t>

   <t hangText="Resources unlimited">
     The notion of a resource was not limited in scope, with the idea
     that URLs could be used to locate, identify or name not only
     network-accessible services, resources and documents, but also
     people, artifacts, and abstractions.</t>
   
   <t hangText="Locate, Identify, Name">
     An identifier embodies the information required to distinguish
     what is being identified from all other things within its scope
     of identification.  A locator embodies the information required
     to find and access the thing being located. A name is a component
     of an identifier assigned and resolved by some authority or
     agent. This specification reverts to the most commonly used
     "Locator" designation. </t>
     <t>The roles of URLs as locators, identifiers, and names have often
     been in conflict with the design goal of "Uniform Meaning". Some
     systems use URLs (and, in particular, HTTP URLs) as identifiers
     for abstractions; this specification does not directly support
     that usage.</t>
     <t hangText="Internationalized">
     URLs were originally defined to consist only of characters from
     a limited repertoire: the upper and lower case letters A-Z, the
     digits 0-9, and a limited set of punctuation characters, with the
     provision that other data (and the encoding of other characters)
     could be included via an escape sequence (percent-encoding).
     Later specifications of Internationalized Resource Identifiers
     <xref target="RFC3987"/> extended this to allow characters from
     the much larger Unicode repertoire.
     </t>
     <t>This specification defines the parsing and processing of
     arbitrary strings of Unicode characters as input; the previous
     syntactic restrictions (URI, IRI) still required by older systems
     are specified in appendices.</t>
   </list>
  </t>
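  <t>As a minimal sketch of how non-ASCII characters can be carried in
  the older, restricted syntax, the following Python code percent-encodes
  the UTF-8 bytes of every character outside an assumed "safe" set (the
  URI reserved characters plus "%", so that existing escapes are kept).
  This follows the general approach of the IRI-to-URI mapping in <xref
  target="RFC3987"/>, but it is not that mapping's exact algorithm: the
  helper name "to_ascii_url" and the example URL are hypothetical, and
  internationalized host names are not handled.</t>
  <figure><artwork><![CDATA[
```python
from urllib.parse import quote, unquote

# Assumption: leave URI reserved characters and "%" untouched so that
# delimiters and existing %-escapes survive (a stray "%" is also left
# as-is). quote() already treats ASCII letters, digits and "-._~" as safe.
SAFE = ":/?#[]@!$&'()*+,;=%"


def to_ascii_url(url: str) -> str:
    """Hypothetical helper: percent-encode UTF-8 bytes of chars outside SAFE."""
    return quote(url, safe=SAFE, encoding="utf-8")


if __name__ == "__main__":
    encoded = to_ascii_url("http://example.com/résumé?q=ü")
    print(encoded)           # http://example.com/r%C3%A9sum%C3%A9?q=%C3%BC
    print(unquote(encoded))  # http://example.com/résumé?q=ü
```
]]></artwork></figure>
  <t>Each non-ASCII character becomes one escape per UTF-8 byte, so
  "é" (two bytes in UTF-8) becomes "%C3%A9".</t>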
Received on Friday, 2 November 2012 07:25:07 UTC