Re: obsoleting 3986 -- what would it look like? from David Sheets on 2012-11-04 (uri@w3.org from November 2012)

From: David Sheets <kosmo.zb@gmail.com>
Date: Sun, 4 Nov 2012 15:29:22 -0800
To: Larry Masinter <masinter@adobe.com>
Cc: "uri@w3.org" <uri@w3.org>
Message-ID: <CAAWM5TxJH8dFxY4R_k-fUeyOc=ZFjd3sHowJ+RyTX7qE_AFwSA@mail.gmail.com>
Hi Larry,

On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter <masinter@adobe.com> wrote:
> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combined it with 3987 (IRI), reverts to the "URL" name, and gave updated parsing advice.
>
> Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.
>
> *  how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also a fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
> * how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
>   My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?
> * Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
> * I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
> * I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.

I am very interested in the aggregation of URI/URN/URL/IRI grammars
and formalization of codepoint translation tables. Does IETF have an
XML vocabulary for expressing ABNF (RFC 5234?) grammars? I am
presently developing machinery for grammar analysis that will be used
to generate reference parsers, serializers, and test suites directly
from the specification(s).
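
As an illustration of the idea only (a minimal sketch in Python, not the machinery described above), a grammar rule can be held as plain data and a recognizer derived from it mechanically. The tuple-based node format and the names `to_regex` and `recognizer` are hypothetical; the rule itself is `scheme` from RFC 3986, section 3.1.

```python
import re

# Hypothetical mini-AST for ABNF: each node is a (kind, argument) tuple.
ALPHA = ("range", "A-Za-z")
DIGIT = ("range", "0-9")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
SCHEME = ("seq", [
    ALPHA,
    ("star", ("alt", [ALPHA, DIGIT, ("lit", "+"), ("lit", "-"), ("lit", ".")])),
])

def to_regex(node):
    """Translate a grammar node into an equivalent regular expression."""
    kind, arg = node
    if kind == "range":
        return "[" + arg + "]"
    if kind == "lit":
        return re.escape(arg)
    if kind == "seq":
        return "".join(to_regex(n) for n in arg)
    if kind == "alt":
        return "(?:" + "|".join(to_regex(n) for n in arg) + ")"
    if kind == "star":
        return "(?:" + to_regex(arg) + ")*"
    raise ValueError("unknown node kind: " + kind)

def recognizer(node):
    """Build a predicate that accepts exactly the strings the rule matches."""
    pattern = re.compile(to_regex(node))
    return lambda s: pattern.fullmatch(s) is not None

is_scheme = recognizer(SCHEME)
```

Here `is_scheme("svn+ssh")` accepts and `is_scheme("1http")` rejects. A regular expression is enough only because this rule is not recursive; a full ABNF grammar would need a real parser generator.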

Is there a central repository of RFC XML (RFC 2629) documents? Are you
drafting the neo-URI RFC in a revision control system somewhere?

> <abstract>
>   <t>Uniform Resource Locators (URL) are compact strings which form a
>   namespace used as identifiers.  The URL namespace is federated:
>   there are URL schemes, each with its own semantics and syntactic
>   restrictions, and a registry of scheme names.  A relative URL is an
>   abbreviated form which can be combined with a base URL to form a new
>   URL (relative resolution).  Previously, the terms "Uniform Resource
>   Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>   were used to designate syntactic restrictions of the URL space.
>   </t>
>   <t>This specification brings together these definitions into a single
>   specification and updates them to match current widespread usage,
>   most notably within the World Wide Web global information and
>   application system.
>   </t>
>   <t>This document is part of a set of documents intended to
>   replace RFCs 2141, 3986, 3987 and 4345</t>
> </abstract>

RFC 2141 is about well-known email endpoints for domains. How is this
related to the structure of identifiers?

RFC 4345 is about RC4 modes for SSH? How is this related? Or which
other RFC was meant?

>
> <section title="Introduction">
>
> <t>
>   The concept of a "Uniform Resource Locator" was introduced
>   by the World Wide Web global information initiative, whose
>   use of the concept dates from 1990, and was described in
>   "Universal Resource Identifiers in WWW" <xref target="RFC1630"/>
> </t>
>
> <t>
>   Uniform Resource Locators (URL) are compact strings which form a
>   namespace used as identifiers.  The URL namespace is federated:
>   there are URL schemes, each with its own semantics and syntactic
>   restrictions, and a registry of scheme names.  A relative URL is an
>   abbreviated form which can be combined with a base URL to form a new
>   URL (relative resolution).  Previously, the terms "Uniform Resource
>   Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>   were used to designate syntactic restrictions of the URL space.
>   </t>
> <t>
>   This specification brings together these definitions into a single
>   specification and updates them to match current widespread usage,
>   most notably within the World Wide Web global information and
>   application system.</t>
> <t>
>   This specification and its companions "Comparison of URLs" <xref
>   target="url-comparison"/> "Guidelines for Bidirectional URLs" <xref
>   target="url-bidi-guidelines"/>, "Registration of URL schemes" <xref
>   target="url-registration"/> obsolete <xref target="RFC3986"/>, <xref
>   target="RFC3987"/>, <xref target="RFC4345"/>.
> </t>
>
> <section title="Uniform, Resource, Locate">
>
>   <t>The original design of URLs, in its various forms, was intended
>    to accomplish several goals. </t>
>   <t><list style="hanging">
>
>     <t hangText="Uniform Meaning">
>       The intention is that the same URL means (identifies, names,
>       locates) the same thing independent of context.</t>
>
>    <t hangText="Resources unlimited">
>      The notion of a resource was not limited in scope, with the idea
>      that URLs could be used to locate, identify or name not only
>      network accessible services, resources and documents, but also
>      people, artifacts, abstractions.</t>
>
>    <t hangText="Locate, Identify, Name">
>      An identifier embodies the information required to distinguish
>      what is being identified from all other things within its scope
>      of identification.  A locator embodies the information required
>      to find and access the thing being located. A name is a component
>      of an identifier assigned and resolved by some authority or
>      agent. This specification reverts to the most commonly used
>      "Locator" designation. </t>
>      <t>The roles of URLs as locators, identifiers, and names have often
>      been in conflict with the design goal of "Uniform Meaning". Some
>      systems may use URLs (and, in particular, HTTP URLs) as identifiers
>      for abstractions; this usage is not directly supported by this
>      specification.</t>
>      <t hangText="Internationalized">
>
>      <t>URLs were originally defined to only consist of characters
>      from a limited repertoire of characters, selected from the upper
>      and lower case letters A-Z plus a limited set of punctuation
>      characters, with the provision that other data (and the coding
>      for other characters) could be included via an escape sequence.
>      This use was extended in later specifications of
>      Internationalized Resource Identifiers <xref target="RFC3987"/>
>      to include characters from a much larger repertoire.
>      </t>
>      <t>This specification specifies parsing and
>      processing of arbitrary strings of
>      Unicode characters as input, with previous syntactic
>      restrictions still required by older systems (URI, IRI)
>      specified in appendices.</t>
>    </list>
>   </t>

Great! These strings are so critically important for the future health
of the internet; I would love to see their structures completely and
unambiguously defined.
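
For a concrete sense of two operations the quoted draft mentions, relative resolution and escape sequences, here is a small sketch using Python's standard `urllib.parse`. It reflects that library's RFC 3986 behavior, not the proposed draft; the base URL is the one used in the worked examples of RFC 3986, section 5.4, and the helper names `resolve` and `escape` are illustrative only.

```python
from urllib.parse import quote, unquote, urljoin

# Base URL taken from the worked examples in RFC 3986, section 5.4.
BASE = "http://a/b/c/d;p?q"

def resolve(reference, base=BASE):
    """Combine a relative reference with a base URL (relative resolution)."""
    return urljoin(base, reference)

def escape(text):
    """Percent-encode characters outside the ASCII repertoire as UTF-8 octets."""
    return quote(text, safe="/")

print(resolve("g"))          # http://a/b/c/g
print(resolve("../g"))       # http://a/b/g
print(escape("/caf\u00e9"))  # /caf%C3%A9
```

`unquote` reverses the escaping, so `unquote(escape(s))` returns the original string for the inputs shown.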

I'll send more information about my ABNF work when I have it (or
you're welcome to snoop; it's open source). Let me know if there is
anything else I can do to help.

Best regards,

David Sheets
Received on Monday, 5 November 2012 01:17:33 UTC