Re: New Text for RFC 2396 intro, reframing what URIs are for from Roy T. Fielding on 2003-04-23 (uri@w3.org from April 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Wed, 23 Apr 2003 15:56:54 -0700
To: Sandro Hawke <sandro@w3.org>
Cc: uri@w3.org
Message-Id: <E0DCDB80-75DE-11D7-8BE6-000393753936@apache.org>
> 1.  Being a URI
>
>     A URI is a string which conforms to the URI syntax given in this
>     document.  This restricted syntax serves several purposes:
>
>               a.  It excludes certain characters.  This allows
> 	      systems to use those characters to delimit URIs.
>
> 	      b.  It defines each URI as beginning with a "scheme"
> 	      name followed by a colon.  This allows independent
> 	      development and deployment of systems which offer
> 	      URIs additional semantics and functionality.
>
> 	      c.  It defines a few characters (including "/") to have
> 	      special "hierarchical" meaning, to allow for "relative"
> 	      URIs.
>
> 	      d.  It defines a character-escape mechanism (using "%")
> 	      to allow special characters (like "/") to be used as
> 	      normal characters, without their special URI meaning.

The above would be a decent summary of the syntax, but not a replacement
for the definitions.

> 	      e.  It keeps them syntactically distinct from some other
> 	      short, formally-specified strings, so they can sometimes
> 	      be intermixed or used to flag an extension to a protocol
> 	      or when "webifying" systems.

Umm, no it doesn't.  Other syntax (like XML attributes) keeps them
distinct.

> 2.  The Identification Function (RIF)
>
>    There is a single relation, called the "RFC 2396bis
>    identification function" ("RIF"), which maps from each URI to
>    exactly one thing at any point in time.  Some shared knowledge of
>    this relation is essentially to communcation using URIs, but
>    complete shared knowledge is rarely possible.  The central efforts
>    and standards related to URIs concern techniques for sharing
>    knowledge of this relation sufficient for particular applications.

That simply isn't true.  There is no single identification function,
and saying it maps to one thing would hopelessly confuse folks.
Furthermore, the central efforts of URI standardization is to make
interoperable use of the mapping, not to share knowledge about
the relation itself.

>               a.  We call the objects in the range (set of possible
> 	      output values) of RIF "resources".  This term is not
> 	      intended to exclude anything and the range of RIF is in
> 	      no way restricted.  Every person, place, event, physical
> 	      object, imaginary character, ... anything and everything
> 	      is in the range of RIF and is technically a resource --
> 	      but calling something a "resource" suggests that it is
> 	      likely to be identified by a URI in practice.  For
> 	      example, the integer zero is technically a resource
> 	      (since everything is a resource), but calling it a
> 	      resource would be misleading outside of a context where
> 	      URIs were actually being used to denote integers.
>
> 	      [[ Thus RDF bNodes and literals can be said to identify
>               resources, even if there is no URI in use, because
>               assigning a URI would be reasonable and may happen
>               automatically in some software. ]]

That doesn't follow at all.  The integer zero is a resource.  Why are
you talking about this here when part (b) is about the same issue?

If we define "resource" in terms of whether or not it has a URI, then
the definition is circular and does exclude those things that have
not yet been assigned a URI.  But we already know people who aren't
willing to live with that definition, since they don't want resources
popping in and out of existence based on the assignment of identifiers,
and furthermore they want to talk about resources that do not have a 
URI,
much as we talk about information that is not yet tied into the Web.
Likewise, they want a sensible way to talk about several URI that
identify a single resource, which can get confusing if the presence of
one URI makes it a resource.

> 	      b.  Resources can be further divided into "bound
> 	      resources" and "unbound resources".  Bound resources are
> 	      in the codomain of RIF; they are in fact identifiable
> 	      through RIF from some URI.  Unbound resources are not in
> 	      the codomain of RIF and cannot be identified through the
> 	      RIF mapping from some URI.  Not all resources can be
> 	      bound because there are more resources than URIs.  Since
> 	      it may not be possible to know whether a given resource
> 	      is bound, the boundness distinction should be used with
> 	      care, or used with respect to a particular URI scheme as
> 	      in, "Since my new book does not yet have an ISBN, it is
> 	      an unbound resource with respect to the isbn: scheme."
>
> 	      [[ That's trying to address Mike Mealling's requirement.
> 	      http://lists.w3.org/Archives/Public/uri/2003Apr/0055 ]]

There aren't more resources than URIs. The number of things is
non-numerable, whereas the number of resources and number of URIs
are both numerable.  This goes back to what "resource" means in
English: something that has never been identified cannot be a
resource (it can be a source), and the set of things already
identified is numerable.

Michael's argument is to intentionally define Resource as circular
and simply exclude those things that have not been assigned a URI
(those resources have been identified, but not by a URI).  However,
separating the issue into bound an unbound resources is useful,
since it allows us to describe one form of broken reference that
is otherwise impossible to describe: a URI that is reassigned to a
different resource in spite of the documented "good practice".

> 	      c.  Elements of RIF SHOULD NOT change over time, since
> 	      such changes will render shared knowledge false until
> 	      corrected.  If changes do occur, it is sometimes said
> 	      "the resource has moved", and appropriate notifications
> 	      and forwarding SHOULD be made.  The term "moved"
> 	      suggests that a URI is a location for a resource, and
> 	      this is a common metaphor, but it is only a metaphor.
> 	      Information changing over time can be handled without
> 	      changing RIF through various techniques such as having
> 	      the resource itself be a function mapping the current
> 	      time to a resulting value.

If a redirection is provided, then the resource hasn't changed (only
the value of the mapping function).  The REST definition is better here,
though it needs to be qualified in the text (it only applies to the
use of URIs within information systems).

> 3.  URI-Scheme Languages
> 	
>    In addition to serving as an argument for the RIF function and
>    thereby identifying a resource, each URI MAY contain encoded
>    (serialized) information.  The syntax and semantics of the encoding
>    language are determined by the normative specificiation registed
>    with IANA for the scheme name and MUST be subordinate to the syntax
>    and semantics given in this document.
>
>               a.  Scheme languages SHOULD be declarative in nature,
> 	      with the URI text conveying knowledge either directly or
> 	      indirectly about the identified resource.  An example of
> 	      a direct assertion is the "data" scheme, where the URI
> 	      text fully describes the identified resource.  An
> 	      example of an indirect assertion is the "http" scheme,
> 	      where the URI text conveys the network address of a
> 	      server which can communicate on behalf of or about the
> 	      resource.

I can use that in the description of schemes, without reference to RIF.

> That's it.  IMHO it nicely refactors some tricky issues, but I surely
> can't claim to understand them all.  Probably the biggest thing is
> pulling RIF out of the intrinsic nature of URIs and being explicit
> about it.   Also I think it's important to be clear that using a URI
> as a an argument to RIF is almost totally different from decoding it
> according to some scheme-specific language.

Section 1.2 already does that.

....Roy
Received on Wednesday, 23 April 2003 18:54:47 UTC