Re: Hypertext::non-Hypertext not URL::URN

Roy T. Fielding (fielding@kiwi.ics.uci.edu)
Fri, 02 Jan 1998 22:11:32 -0800


To: Larry Masinter <masinter@parc.xerox.com>
cc: michaelm@rwhois.net, harald.t.alvestrand@uninett.no, moore@cs.utk.edu,
In-reply-to: Your message of "Fri, 02 Jan 1998 09:53:03 PST."
             <34AD297F.15D39C63@parc.xerox.com> 
Message-ID:  <9801022218.aa25343@paris.ics.uci.edu>

>I think you've made an important point that I don't want to
>get lost. The syntax forms that are controversial
>(fragment identifiers, relative forms, query syntax)
>are part of the application of HYPERTEXT.
>
>In fact, whether or not you want those forms seems to depend
>entirely on whether or not you think you're doing hypertext.

Nope.  Relative forms are a means of namespace abbreviation.
Query syntax is simply a convenient mechanism for parameterized
access to a resource, which exists only for the sake of common
client implementations.  Fragment identifiers are a mechanism for
identifying a subset of the result of a retrieval.  Whether or not
all of these are only part of the application of hypertext depends
on your definition of hypertext.
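To make the fragment point concrete, here is a minimal Python sketch. It assumes a client that splits a reference at its first "#" before retrieval. The names `split_fragment` and `dereference`, and the `REPRESENTATION` dictionary that stands in for a real retrieval result, are hypothetical.

```python
from typing import Optional, Tuple


def split_fragment(reference: str) -> Tuple[str, Optional[str]]:
    """Split a URI reference at its first "#".

    The part before "#" is what a client would retrieve. The part after it
    (if present) is a fragment identifier that the client keeps for itself.
    """
    target, sep, fragment = reference.partition("#")
    return target, (fragment if sep else None)


# Hypothetical result of a retrieval: a document with named sections.
REPRESENTATION = {"intro": "Introduction text", "sec2": "Section 2 text"}


def dereference(reference: str):
    """Simulate retrieving the target, then apply the fragment locally."""
    target, fragment = split_fragment(reference)
    # A real client would send only `target` in its request. This sketch
    # simply assumes the retrieved result is REPRESENTATION.
    result = REPRESENTATION
    # The fragment identifies a subset of the retrieved result.
    return result if fragment is None else result.get(fragment)
```

For example, `dereference("http://example.org/doc.html#sec2")` returns `"Section 2 text"`, and only `http://example.org/doc.html` would appear in the request.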

Hypertext can be usefully defined either in terms of the UI or
the architecture.  The UI definition of hypertext from Ted Nelson
(via Jeff Conklin's Survey), "a combination of natural language text
with the computer's capacity for interactive branching, or dynamic
display ... of a nonlinear text", is obviously insufficient to cover
all of the applications which use relative URI today.
Keep in mind that my protocol library was originally written to support
maintenance applications, not dynamic display.

The architectural definition of hypertext is simply that information
can be organized by relationships between information, and further
that resources can be organized by the relationships between
representations of those resources.  This is what I think of as an
Engelbart/Berners-Lee definition, though I'm not sure it was ever
written as such by them.  This definition does cover all of the
controversial syntax forms, but then it also covers all possible uses
of URNs as well, including semantically rich name comparison.

In either case, making a distinction between URI use in hypertext and
URI use outside of hypertext is pointless.  The URI syntax includes a
variety of forms that *allow* the use of relative identifiers, *allow*
the distinction of query parts, and *allow* the presence of fragment
identifiers.  They exist NOT because they are useful for all URI, but
because they ARE useful for some URI.  The syntax is thus defined to
*reserve* those forms in such a way that they *can* be used when someone
wants to use them, and in a way that is *independent* of the scheme
definition.  Moreover, their presence has no adverse impact on uses
of URI that exclude those forms.

That is why I made an explicit distinction between URI-reference
and the other BNF terms in the specification.  HTTP, HTML, and XML
(and many other protocols) need a Draft Standard for a URI-reference.
That is what the URI syntax is all about.  It is not, and never has been,
the intersection of the requirements for URL and URN.  It cannot be,
since the actual requirements for individual URL schemes do not have
much in the way of an intersection.  Protocol fields that do not wish
to allow the relative form and/or fragment will use the <absoluteURI>
BNF term instead.
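Here is a rough sketch of the difference between a protocol field defined as <URI-reference> and one restricted to <absoluteURI>. It performs only coarse structural checks: a leading scheme, and the absence or count of "#". Character-level validation against the full BNF is deliberately omitted, and the function names are hypothetical.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." ), followed by ":"
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def accepts_absolute_uri(value: str) -> bool:
    """Coarse check for a field restricted to <absoluteURI>:
    a scheme is required and no fragment identifier is allowed."""
    return _SCHEME_RE.match(value) is not None and "#" not in value


def accepts_uri_reference(value: str) -> bool:
    """Coarse check for a field that allows <URI-reference>:
    absolute or relative, with at most one "#" introducing a fragment."""
    return value.count("#") <= 1
```

With these checks, `"../a?b"` and `"doc.html#sec2"` pass as references but fail as absolute URIs. `"urn:isbn:0451450523"` passes both, so a field that excludes the relative and fragment forms loses nothing by using the narrower term.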

>The distinction between having them and not seems to have
>little to do with whether or not the identifiers are "location
>independent". If you want a resource locator but you're
>not doing hypertext (e.g., the resources that you're locating
>are printers for IPP or servers for service location or whatever)
>then the relative, query, and fragment forms are not applicable.

Fragment identifiers wouldn't be useful, but relative and query forms are
useful in any context where many related printers are being identified,
or servers for service location or whatever.  Namespace abbreviation 
is a universal principle.
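Here is a small sketch of namespace abbreviation for the printer case. It uses hypothetical `ipp:` URIs under a shared directory-style base and an invented `duplex` query parameter. The `abbreviate` function is illustrative only, and it assumes the URIs are already normalized (no "." or ".." segments).

```python
def abbreviate(uri: str, base_dir: str) -> str:
    """Abbreviate `uri` relative to `base_dir` when it lies under it.

    Assumes `base_dir` ends in "/" and has no query or fragment, and that
    `uri` is already normalized. Anything else is returned unchanged,
    which is always a valid (if longer) reference.
    """
    if not base_dir.endswith("/") or not uri.startswith(base_dir):
        return uri
    rest = uri[len(base_dir):]
    first_segment = rest.split("/", 1)[0].split("?", 1)[0].split("#", 1)[0]
    # A leading "/" would be read as an absolute path, and a ":" in the
    # first segment would be read as a scheme, so anchor it with "./".
    if rest.startswith("/") or ":" in first_segment:
        rest = "./" + rest
    return rest


# Hypothetical printer URIs under one base; "duplex" is an invented parameter.
BASE = "ipp://printers.example.org/floor3/"
PRINTERS = [
    "ipp://printers.example.org/floor3/lp1",
    "ipp://printers.example.org/floor3/lp2?duplex=true",
    "ipp://printers.example.org/floor4/lp1",
]
SHORT = [abbreviate(p, BASE) for p in PRINTERS]
```

`SHORT` holds `"lp1"`, `"lp2?duplex=true"`, and the unchanged floor-4 URI. The relative and query forms carry over to printers unchanged; nothing in the sketch depends on hypertext.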

>If you ARE doing hypertext, then those forms are useful,
>even if you believe the identifiers are permanent, location
>independent, and have all of the attributes that are intended
>for URNs and not for URLs.
>
>"Uniform Resource Identifiers" define a space of fully qualified,
>non annotated names, while "hypertext references" imbue some
>semantics to the internal syntax of URIs (namely, give significance
>to "/" and "?" within Uniform Resource Identifiers), add a new
>syntactic element ("#" fragment identifiers), and add a new protocol
>element (relative identifiers).

Sorry, that has no basis in reality.  "Uniform Resource Identifiers"
have a Uniform syntax in order to be used and processed as URI references
by portions of overall system implementations that DO NOT KNOW the
scheme-specific semantics.  This allows a separation of concerns between
those elements of the system that collect references (e.g., HTTP field
value parsers, HTML/XML element attribute parsers, etc.) and those
elements of the system that perform semantic operations on those
identifiers.
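As an illustration of the collecting side of that separation, here is a Python sketch built on the standard `html.parser` module. It gathers reference values from `href` and `src` attributes and never looks at the scheme. The attribute list is an assumption and deliberately incomplete, and the class name `ReferenceCollector` is hypothetical.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collection layer only: gathers URI references from attribute values
    without parsing them or checking their schemes."""

    # Assumed, deliberately incomplete set of reference-valued attributes.
    REFERENCE_ATTRS = {"href", "src"}

    def __init__(self) -> None:
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.REFERENCE_ATTRS and value is not None:
                # Stored verbatim; interpretation is left to later layers.
                self.references.append(value)


collector = ReferenceCollector()
collector.feed('<a href="urn:isbn:0451450523">book</a>'
               '<img src="../img/logo.gif">'
               '<a href="#top">top</a>')
collector.close()
```

After `close()`, `collector.references` is `["urn:isbn:0451450523", "../img/logo.gif", "#top"]`. The same code collects a name, a relative reference and a bare fragment, and leaves any semantic operations to whatever layer consumes the list.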

In order to process an entire set of references in common, the rules
that guide that process must be common to the entire set.  Applications
that use a URI reference do not check to see whether it is a name
or a location before they apply the relative resolution process.
Aside from Navigator (which is artificially restricted to a small set
of URL schemes by poor design), existing WWW applications don't even
check the scheme name until an actual retrieval request is made.
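Here is a sketch of a resolver that applies one process to every reference, without checking the scheme or asking whether the reference is a name or a location. It follows the relative-resolution algorithm as later written up in RFC 3986, Section 5.2 (the strict variant), rather than the draft under discussion. It also uses the generic parsing expression from that RFC's Appendix B. Treat it as illustrative, not as a reference implementation.

```python
import re
from typing import Optional, Tuple

# Generic URI-reference pattern from RFC 3986, Appendix B. It splits any
# reference into components without consulting the scheme.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(ref: str) -> Tuple[Optional[str], Optional[str], str, Optional[str], Optional[str]]:
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4: interpret "." and ".." path segments."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with its leading "/" if any, to the output.
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def _merge(base_authority: Optional[str], base_path: str, ref_path: str) -> str:
    """RFC 3986, Section 5.2.3: append a relative path to the base directory."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    """Resolve `ref` against the absolute URI `base`, identically for every scheme."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    scheme, auth, path, query, fragment = split(ref)
    if scheme is None:
        if auth is None:
            if path == "":
                path = b_path
                if query is None:
                    query = b_query
            elif path.startswith("/"):
                path = remove_dot_segments(path)
            else:
                path = remove_dot_segments(_merge(b_auth, b_path, path))
            auth = b_auth
        else:
            path = remove_dot_segments(path)
        scheme = b_scheme
    else:
        path = remove_dot_segments(path)
    # RFC 3986, Section 5.3: recompose the target from its components.
    result = (scheme + ":") if scheme is not None else ""
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

For example, resolving `"lp2?duplex=true"` against a hypothetical `ipp://printers.example.org/floor3/lp1` gives `ipp://printers.example.org/floor3/lp2?duplex=true` without any scheme-specific code. By contrast, CPython's `urllib.parse.urljoin` returns the reference unchanged when the base scheme is not listed in `urllib.parse.uses_relative`. That is much like the scheme-restricted behavior described above.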

Michael, while I appreciate your desire to have a general definition
of URI that represents only the philosophical principle of identifying
a resource, the fact of life is that we don't need one.  Such a paper
would be useful as a research survey, but not as a Draft Standard
definition of specific protocol elements in current practice.  The latter
is what I am doing, and what <draft-fielding-uri-syntax-01> is intended
to represent, and why we are discussing this in the IETF and not at
a research conference.

While we sit here debating what is or is not relevant to a URN, several
dozen technical specifications in preparation by the IETF or the W3C are
being held back because we don't want them to be specified in terms of
the older RFCs (1630, 1738, 1808) which are known to be wrong.  If there
is nothing in the current draft that prevents the URN WG from defining
the URN as they please, then there is no valid objection to the draft
regarding what the URN does or does not allow.

The only alternative is to make "Locator" synonymous with "a URI that
might be used to locate a resource for the purpose of access" and then
call everything a URL, including all URNs when they are used for that
purpose.  But we have ALREADY discussed and discarded that option because
nobody here (including me) wants to refer to URNs as a subset of URLs.

....Roy