Re: obsoleting 3986 -- what would it look like? from Martin J. Dürst on 2012-11-05 (uri@w3.org from November 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 05 Nov 2012 19:48:32 +0900
To: Larry Masinter <masinter@adobe.com>
CC: "uri@w3.org" <uri@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <50979980.6070503@it.aoyama.ac.jp>
Hello Larry,

[cross-posting to public-iri@w3.org]

On 2012/11/02 16:24, Larry Masinter wrote:
> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combined it with 3987 (IRI), reverted to the "URL" name, and gave updated parsing advice.
>
> Doing so is pretty ambitious, of course,

Yes indeed. I'm wondering why you think that this will be successful if 
a less ambitious project (updating the IRI spec and the URI/IRI scheme 
registration spec) is having problems getting enough attention.


> and likely to lead to all sorts of controversies, but I thought I'd give it a try.
>
> *  how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also a fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could  identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
> * how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
>    My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?

This is all well and good, but one advantage of URIs (and IRIs) is that 
they don't allow characters that are used as delimiters. If we move to 
URLs as defined by "what browsers grok in HTML", then in each protocol 
that uses fixed delimiters, we have to say "URLs, but not containing xyz 
delimiters".
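
To make the delimiter point concrete, here is a minimal Python sketch. It assumes the set of printable ASCII characters that RFC 3986 leaves out of its grammar (space, angle brackets, double quote, braces, and a few others); the names EXCLUDED, find_delimiters and escape_delimiters are made up for this illustration and do not come from any spec.

```python
from urllib.parse import quote

# Printable ASCII characters that are neither reserved, unreserved, nor "%"
# in RFC 3986, so a protocol can safely use them to delimit a URI.
EXCLUDED = set(' <>"{}|\\^`')

def find_delimiters(url):
    """Return the excluded characters found in `url`, in order of appearance."""
    return [ch for ch in url if ch in EXCLUDED]

def escape_delimiters(url):
    """Percent-encode only the excluded characters, leaving the rest as-is."""
    return "".join(quote(ch, safe="") if ch in EXCLUDED else ch for ch in url)
```

A string such as "http://example.org/a b<c>" is accepted by many browsers, but a protocol that wraps URIs in angle brackets or separates them with spaces would have to run something like escape_delimiters over it first, or forbid it outright.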


Also, the differences between a valid IRI reference and a valid HTML URL 
are very small, and can probably be removed altogether. Here is what 
the URL spec said before it was moved from the W3C 
(http://dvcs.w3.org/hg/url/) to the WHATWG (http://url.spec.whatwg.org/):

 >>>>>>>>
A URL is a valid URL if at least one of the following conditions holds:

* The URL is a valid URI reference. [RFC3986]
* The URL is a valid IRI reference and it has no query component. [RFC3987]
* The URL is a valid IRI reference and its query component contains no 
unescaped non-ASCII characters. [RFC3987]
* The URL is a valid IRI reference and the character encoding of the 
URL's Document is UTF-8 or a UTF-16 encoding.
 >>>>>>>>

The conditions in the second, third, and fourth bullet are all related 
to the encoding of the query part. If we can get a handle on these in 
the IRI spec, then it may be possible to just collapse them. Then the 
first bullet can be removed by the fact that every URI reference is an 
IRI reference. (It still makes sense to call out the fact that every URI 
reference is a valid URL, but it isn't necessary spec-wise.)
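
As a rough sketch of how the last three bullets collapse into one check on the query part, consider the Python below. It assumes the input is already known to be a valid IRI reference (that check is not implemented here), it reads "unescaped non-ASCII" as "any raw non-ASCII character", and it splits the reference with the regular expression from RFC 3986, Appendix B. The function names are hypothetical.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# Group 7 is the query (None when the reference has no "?" before any "#").
_SPLIT = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def query_of(ref):
    """Return the query component of `ref`, or None if there is none."""
    return _SPLIT.match(ref).group(7)

def passes_query_conditions(iri_ref, document_encoding):
    """Bullets 2-4 of the quoted definition, for a valid IRI reference."""
    query = query_of(iri_ref)
    if query is None:
        return True                      # bullet 2: no query component
    if query.isascii():
        return True                      # bullet 3: query is pure ASCII
    enc = document_encoding.lower()
    return enc == "utf-8" or enc.startswith("utf-16")   # bullet 4
```

Under these assumptions, "http://example.org/?q=é" passes in a UTF-8 document but fails in an ISO-8859-1 one, while the same IRI with the query written as "q=%C3%A9" passes in both.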


> * Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.

URNs are just a single URI/IRI/URL scheme, so they shouldn't need any 
special treatment, but it is occasionally helpful to call out schemes as 
examples.


> * I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.

That's indeed tempting. But there's the problem that some software uses 
it that way. (Because I'm not on the TAG, httpRange-14 is less of a 
problem for me than it is for you.)


> * I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.

I'm willing to help, in particular with stuff relating to IRIs and 
internationalization in general. But first we need a wide consensus that 
this is the right way to go.


Regards,   Martin.


> <abstract>
>    <t>Uniform Resource Locators (URL) are compact strings which form a
>    namespace used as identifiers.  The URL namespace is federated:
>    there are URL schemes, each with its own semantics and syntactic
>    restrictions, and a registry of scheme names.  A relative URL is an
>    abbreviated form which can be combined with a base URL to form a new
>    URL (relative resolution).  Previously, the terms "Uniform Resource
>    Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>    were used to designate syntactic restrictions of the URL space.
>    </t>
>    <t>This specification brings together these definitions into a single
>    specification and updates them to match current widespread usage,
>    most notably within the World Wide Web global information and
>    application system.
>    </t>
>    <t>This document is part of a set of documents intended to
>    replace RFCs 2141, 3986, 3987 and 4345</t>
> </abstract>
>
>
>
>
> <section title="Introduction">
>
> <t>
>    The concept of a "Uniform Resource Locator" was introduced
>    by the World Wide Web global information initiative, whose
>    use of the concept dates from 1990, and was described in
>    "Universal Resource Identifiers in WWW"<xref target="RFC1630"/>.
> </t>
>
> <t>
>    Uniform Resource Locators (URL) are compact strings which form a
>    namespace used as identifiers.  The URL namespace is federated:
>    there are URL schemes, each with its own semantics and syntactic
>    restrictions, and a registry of scheme names.  A relative URL is an
>    abbreviated form which can be combined with a base URL to form a new
>    URL (relative resolution).  Previously, the terms "Uniform Resource
>    Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>    were used to designate syntactic restrictions of the URL space.
>    </t>
> <t>
>    This specification brings together these definitions into a single
>    specification and updates them to match current widespread usage,
>    most notably within the World Wide Web global information and
>    application system.</t>
> <t>
>    This specification and its companions "Comparison of URLs"<xref
>    target="url-comparison"/>  "Guidelines for Bidirectional URLs"<xref
>    target="url-bidi-guidelines"/>, "Registration of URL schemes"<xref
>    target="url-registration"/>  obsolete<xref target="RFC3986"/>,<xref
>    target="RFC3987"/>,<xref target="RFC4345"/>.
> </t>
>
> <section title="Uniform, Resource, Locate">
>
>    <t>The original design of URLs, in its various forms, was intended
>     to accomplish several goals.</t>
>    <t><list style="hanging">
>
>      <t hangText="Uniform Meaning">
>        The intention is that the same URL means (identifies, names,
>        locates) the same thing independent of context.</t>
>
>     <t hangText="Resources unlimited">
>       The notion of a resource was not limited in scope, with the idea
>       that URLs could be used to locate, identify or name not only
>       network accessible services, resources and documents, but also
>       people, artifacts, abstractions.</t>
>
>     <t hangText="Locate, Identify, Name">
>       An identifier embodies the information required to distinguish
>       what is being identified from all other things within its scope
>       of identification.  A locator embodies the information required
>       to find and access the thing being located. A name is a component
>       of an identifier assigned and resolved by some authority or
>       agent. This specification reverts to the most commonly used
>       "Locator" designation.</t>
>       <t>The roles of URLs as locators, identifiers, and names have often
>       been in conflict with the design goal of "Uniform Meaning". Some
>       systems may use URLs (and, in particular, HTTP URLs) as identifiers
>       for abstractions; this usage is not supported by this specification
>       directly.</t>
>       <t hangText="Internationalized">
>
>       <t>URLs were originally defined to only consist of characters
>       from a limited repertoire of characters, selected from the upper
>       and lower case letters A-Z plus a limited set of punctuation
>       characters, with the provision that other data (and the coding
>       for other characters) could be included via an escape sequence.
>       This use was extended in later specifications of
>       Internationalized Resource Identifiers<xref target="RFC3987"/>
>       to include characters from a much larger repertoire.
>       </t>
>       <t>This specification specifies parsing and
>       processing of arbitrary strings of
>       Unicode characters as input, with previous syntactic
>       restrictions still required by older systems (URI, IRI)
>       specified in appendices.</t>
>     </list>
>    </t>
>
>
Received on Monday, 5 November 2012 10:49:08 UTC