Re: New Text for RFC 2396 intro, reframing what URIs are for from Sandro Hawke on 2003-04-24 (uri@w3.org from April 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 24 Apr 2003 02:52:50 -0400
To: "Roy T. Fielding" <fielding@apache.org>
cc: uri@w3.org
Message-Id: <200304240652.h3O6qoIN003089@roke.hawke.org>
> > 1.  Being a URI
> >
> >     A URI is a string which conforms to the URI syntax given in this
> >     document.  This restricted syntax serves several purposes:
> >
> >               a.  It excludes certain characters.  This allows
> > 	      systems to use those characters to delimit URIs.
> >
> > 	      b.  It defines each URI as beginning with a "scheme"
> > 	      name followed by a colon.  This allows independent
> > 	      development and deployment of systems which offer
> > 	      URIs additional semantics and functionality.
> >
> > 	      c.  It defines a few characters (including "/") to have
> > 	      special "hierarchical" meaning, to allow for "relative"
> > 	      URIs.
> >
> > 	      d.  It defines a character-escape mechanism (using "%")
> > 	      to allow special characters (like "/") to be used as
> > 	      normal characters, without their special URI meaning.
> 
> The above would be a decent summary of the syntax, but not a replacement
> for the definitions.

Indeed, this is just meant as a bit of introduction, giving some
motivation for URIs to help people understand what's going on.
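
To make points (a) through (d) concrete, here is a minimal sketch using Python's standard library. The example.org URIs and the angle-bracket delimiting convention are assumptions chosen for illustration; nothing here comes from the draft text itself.

```python
import re
from urllib.parse import quote, unquote, urljoin, urlsplit

# (a) "<", ">" and whitespace are excluded from URIs, so they can delimit them.
text = "see <http://example.org/a> and <mailto:x@example.org>"
found = re.findall(r"<([^<>\s]+)>", text)
print(found)

# (b) Every absolute URI starts with a scheme name followed by a colon.
parts = urlsplit("http://example.org/a/b/c?x=1")
print(parts.scheme)

# (c) "/" is hierarchical, so a relative reference resolves against a base.
resolved = urljoin("http://example.org/a/b/c", "../d")
print(resolved)

# (d) "%" escapes let a special character such as "/" appear as plain data.
escaped = quote("a/b", safe="")
print(escaped, unquote(escaped))
```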

> > 	      e.  It keeps them syntactically distinct from some other
> > 	      short, formally-specified strings, so they can sometimes
> > 	      be intermixed or used to flag an extension to a protocol
> > 	      or when "webifying" systems.
> 
> Umm, no it doesn't.  Other syntax (like XML attributes) keeps them
> distinct.

Hrm.  What I'm trying to say here is that the restricted syntax of URIs
sometimes lets you define a field to have type "URI-or-number" or
"URI-or-word" or "URI-or-keywords".  This is one way to view browsers
having a switch on the "address" field like

   if ":" in address:        # loosely, if it matches absolute-URI-Ref
      openURL(address)
   elif " " in address:      # loosely, if it contains delimiters
      keywordSearch(address)
   else:
      guessAtPossibleURL(address)

It's not an important part of URI design, but I think it is still a
real benefit.

> > 2.  The Identification Function (RIF)
> >
> >    There is a single relation, called the "RFC 2396bis
> >    identification function" ("RIF"), which maps from each URI to
> >    exactly one thing at any point in time.  Some shared knowledge of
> >    this relation is essential to communication using URIs, but
> >    complete shared knowledge is rarely possible.  The central efforts
> >    and standards related to URIs concern techniques for sharing
> >    knowledge of this relation sufficient for particular applications.
> 
> That simply isn't true.  There is no single identification function,
> and saying it maps to one thing would hopelessly confuse folks.
> Furthermore, the central effort of URI standardization is to make
> interoperable use of the mapping, not to share knowledge about
> the relation itself.

Are you saying instead that every agent has its own mapping?  I think
there's a parallel to my description of one partially-known mapping
where you say everyone has their own mapping and they try to figure
out what each other's mappings are.  But I think imagining one ideal,
not-fully-knowable mapping works better.  Perhaps I'm not explaining
it well.   I imagine that mapping as a little like the mapping from
people to the time of their birth: it's surely not practical to
collect in its entirety, but parts of it are useful and can be easily
communicated. 

How/where does saying it maps to one thing (for each URI) confuse
folks?  That's another way of saying URIs are defined to be
unambiguous, right?  (Only it uses high-school-level mathematical
jargon.)

Is it too controversial that each URI maps to a single thing (at any
point in time at least)?  I'm not saying everyone can determine the
mapping; just that there is one true, ideal value.  No URI is without
an identified resource; no URI identifies more than one resource
(although it may of course identify a resource which is a collection
of several resources). This is the view of URIs Pat wrote (under
guidance from the WG of course) into the RDF Semantics (as I
understand it, at least).  It is at odds with the folks who want to
treat URIs like natural language words with all their levels of
ambiguity, but I thought we were generally agreed such ambiguity could
and should be avoided.

> >               a.  We call the objects in the range (set of possible
> > 	      output values) of RIF "resources".  This term is not
> > 	      intended to exclude anything and the range of RIF is in
> > 	      no way restricted.  Every person, place, event, physical
> > 	      object, imaginary character, ... anything and everything
> > 	      is in the range of RIF and is technically a resource --
> > 	      but calling something a "resource" suggests that it is
> > 	      likely to be identified by a URI in practice.  For
> > 	      example, the integer zero is technically a resource
> > 	      (since everything is a resource), but calling it a
> > 	      resource would be misleading outside of a context where
> > 	      URIs were actually being used to denote integers.
> >
> > 	      [[ Thus RDF bNodes and literals can be said to identify
> >               resources, even if there is no URI in use, because
> >               assigning a URI would be reasonable and may happen
> >               automatically in some software. ]]
> 
> That doesn't follow at all.  The integer zero is a resource.  Why are
> you talking about this here when part (b) is about the same issue?

I'm trying to capture a subtlety of practical usage here, saying that
the word "resource" has both denotation and connotation.  Literally,
saying "foo is a resource" means nothing, but it *suggests* that you
expect foo may well have a URI.  Maybe this is a silly point, but I
hear people say things like "that chair is a resource" (I obviously
hang out with a bad sort of people) and they mean "that chair should
probably have a URI, or maybe it already does."  

This is a much fuzzier notion than boundness.  If they said "that
chair is a bound resource" they're asserting it definitely has a URI.

The term "resource" can also be used as a property name (instead of a
class name), as in "What is http://w3.org's resource?"   I'm not sure
how to capture that.

> If we define "resource" in terms of whether or not it has a URI, then
> the definition is circular and does exclude those things that have
> not yet been assigned a URI.  But we already know people who aren't
> willing to live with that definition, since they don't want resources
> popping in and out of existence based on the assignment of identifiers,
> and furthermore they want to talk about resources that do not have a URI,
> much as we talk about information that is not yet tied into the Web.
> Likewise, they want a sensible way to talk about several URIs that
> identify a single resource, which can get confusing if the presence of
> one URI makes it a resource.
> 
> > 	      b.  Resources can be further divided into "bound
> > 	      resources" and "unbound resources".  Bound resources are
> > 	      in the codomain of RIF; they are in fact identifiable
> > 	      through RIF from some URI.  Unbound resources are not in
> > 	      the codomain of RIF and cannot be identified through the
> > 	      RIF mapping from some URI.  Not all resources can be
> > 	      bound because there are more resources than URIs.  Since
> > 	      it may not be possible to know whether a given resource
> > 	      is bound, the boundness distinction should be used with
> > 	      care, or used with respect to a particular URI scheme as
> > 	      in, "Since my new book does not yet have an ISBN, it is
> > 	      an unbound resource with respect to the isbn: scheme."
> >
> > 	      [[ That's trying to address Mike Mealling's requirement.
> > 	      http://lists.w3.org/Archives/Public/uri/2003Apr/0055 ]]
> 
> There aren't more resources than URIs. The number of things is
> non-numerable, whereas the number of resources and number of URIs
> are both numerable.

I think this comes down to the different senses of "resource".  I
agree URIs and bound resources are countably infinite (numerable),
while things and resources in the strict (denotational) sense are
uncountably infinite (non-numerable).  Where resources in the loose
connotational sense fall depends on what is reasonable to identify
with URIs.  Might someone want to identify, say, the irrational
numbers with URIs?

This is too close to counting angels; the text "Not all resources can
be bound because there are more resources than URIs." should just be
removed as confusing more than it clarifies.

> English: something that has never been identified cannot be a
> resource (it can be a source), and the set of things already
> identified is numerable.

I must have missed where the term "source" got introduced...

> Michael's argument is to intentionally define Resource as circular
> and simply exclude those things that have not been assigned a URI
> (those resources have been identified, but not by a URI).  However,
> separating the issue into bound and unbound resources is useful,
> since it allows us to describe one form of broken reference that
> is otherwise impossible to describe: a URI that is reassigned to a
> different resource in spite of the documented "good practice".
> 
> > 	      c.  Elements of RIF SHOULD NOT change over time, since
> > 	      such changes will render shared knowledge false until
> > 	      corrected.  If changes do occur, it is sometimes said
> > 	      "the resource has moved", and appropriate notifications
> > 	      and forwarding SHOULD be made.  The term "moved"
> > 	      suggests that a URI is a location for a resource, and
> > 	      this is a common metaphor, but it is only a metaphor.
> > 	      Information changing over time can be handled without
> > 	      changing RIF through various techniques such as having
> > 	      the resource itself be a function mapping the current
> > 	      time to a resulting value.
> 
> If a redirection is provided, then the resource hasn't changed (only
> the value of the mapping function).  The REST definition is better here,
> though it needs to be qualified in the text (it only applies to the
> use of URIs within information systems).

Agreed, I think.  "The resource has moved" is misleading not just for
the location metaphor, but also because it suggests something about
the resource itself has changed, while it's really just the mapping
which has changed.  When I type "mv foo bar" I might think I'm moving
a file, but I'm really just changing some directory entries.

Still, I think an HTTP redirect is more like a symbolic link than a
hard link.  It doesn't transparently give you the same thing unless
you choose to operate at a higher layer of abstraction.
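
The "mv foo bar" point and the hard-link versus symbolic-link comparison can be checked directly. This is a rough sketch that assumes a POSIX filesystem; the file names are the ones from the example above plus two made-up link names.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    foo = os.path.join(d, "foo")
    bar = os.path.join(d, "bar")
    with open(foo, "w") as f:
        f.write("contents")

    inode_before = os.stat(foo).st_ino
    os.rename(foo, bar)                  # the "mv foo bar" step
    # Only the directory entry changed; the file keeps its inode.
    same_file = os.stat(bar).st_ino == inode_before

    hard = os.path.join(d, "hard")       # hypothetical link names
    soft = os.path.join(d, "soft")
    os.link(bar, hard)                   # hard link: a second name for the file
    os.symlink(bar, soft)                # symlink: a separate object naming "bar"

    hard_is_same = os.stat(hard).st_ino == inode_before
    soft_is_separate = os.lstat(soft).st_ino != inode_before
    # Following the symlink (one layer up) reaches the original file.
    soft_follows = os.stat(soft).st_ino == inode_before

print(same_file, hard_is_same, soft_is_separate, soft_follows)
```

The symbolic link only yields the same file when the caller chooses to follow it, which is the sense in which a redirect resembles it.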

Specifically, I think 301 and 307 redirects are messages from the
server letting the client know that a point in RIF has changed
(permanently or temporarily).  These, along with 300 and 302
redirects, suggest to me (now that I think about it some more) that
RIF is actually a partial function on URIs with no mapping defined for
the URIs which generate those redirects.
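
A small sketch of that partial-function reading, under loud assumptions: the two tables, the example.org URIs, and the resource labels are all invented for illustration, and no real HTTP traffic is involved.

```python
from typing import Optional

# Hypothetical slice of RIF: URI -> identified resource (labels are made up).
RIF = {
    "http://example.org/new-home": "resource-A",
    "http://example.org/other": "resource-B",
}

# URIs whose server answers with a redirect; RIF has no entry for these.
REDIRECTS = {
    "http://example.org/old-home": (301, "http://example.org/new-home"),
}

def rif(uri: str) -> Optional[str]:
    """Return the identified resource, or None where RIF is undefined."""
    return RIF.get(uri)

def follow(uri: str, limit: int = 5) -> Optional[str]:
    """Operate one layer up: chase redirects, then apply RIF."""
    for _ in range(limit):
        if uri not in REDIRECTS:
            return rif(uri)
        _status, uri = REDIRECTS[uri]
    return None  # gave up: too many redirects

print(rif("http://example.org/old-home"))
print(follow("http://example.org/old-home"))
```

Here rif() is undefined on the redirecting URI, while follow() models the client that decides to work at the higher layer.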
> 
> > 3.  URI-Scheme Languages
> > 	
> >    In addition to serving as an argument for the RIF function and
> >    thereby identifying a resource, each URI MAY contain encoded
> >    (serialized) information.  The syntax and semantics of the encoding
> >    language are determined by the normative specification registered
> >    with IANA for the scheme name and MUST be subordinate to the syntax
> >    and semantics given in this document.
> >
> >               a.  Scheme languages SHOULD be declarative in nature,
> > 	      with the URI text conveying knowledge either directly or
> > 	      indirectly about the identified resource.  An example of
> > 	      a direct assertion is the "data" scheme, where the URI
> > 	      text fully describes the identified resource.  An
> > 	      example of an indirect assertion is the "http" scheme,
> > 	      where the URI text conveys the network address of a
> > 	      server which can communicate on behalf of or about the
> > 	      resource.
> 
> I can use that in the description of schemes, without reference to RIF.
> 
> > That's it.  IMHO it nicely refactors some tricky issues, but I surely
> > can't claim to understand them all.  Probably the biggest thing is
> > pulling RIF out of the intrinsic nature of URIs and being explicit
> > about it.   Also I think it's important to be clear that using a URI
> > as an argument to RIF is almost totally different from decoding it
> > according to some scheme-specific language.
> 
> Section 1.2 already does that.

Hrm.  There's text in 1.2 I'm unhappy with:

'The term "Uniform Resource Name" (URN) refers to the subset of URIs
that are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable.'

This is one of those sentences that makes me think we need to be
explicit about RIF.  After all, a URI *itself* cannot be particularly
unique or persistent, since it's just a string.  It's the mapping that
may be persistent, and/or the server's network address.

Following the Contemporary View [RFC 3305], I guess a URN is simply a
URI which is not a URL, so persistence no longer belongs in the
picture anyway.  Possible replacement text:

   The term "Uniform Resource Name" (URN) refers to the subset of URIs
   which are not URLs.  They may be useful when no stable primary
   access mechanisms are available, or in the presence of an effective
   URN resolution system.

but that conflicts a bit with the beginning of the paragraph saying
something can be both a URL and a URN.  Any idea what that means?

Meanwhile, I'm not sure anything in 1.2 really makes my point about
separation, but I'll have to see if I can come up with more specific
wording on that.
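
As a rough illustration of the separation (not proposed spec text), the same URI string can be used in two unrelated ways. The lookup table and its label are hypothetical; the decoding relies on Python's standard-library support for the "data" scheme.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

uri = "data:text/plain;charset=utf-8,Hello%2C%20world"

# Use 1: the URI as an opaque argument to an identification mapping.
# Hypothetical table; nothing inside the string is inspected.
identified = {uri: "a short greeting"}
print(identified[uri])

# Use 2: decoding the URI according to its scheme-specific language.
# For "data", the URI text itself fully describes the resource.
with urlopen(uri) as resp:
    decoded = resp.read().decode("utf-8")
print(decoded)

# For "http", decoding yields only the address of a server that can
# communicate on behalf of or about the resource.
authority = urlsplit("http://example.org/report").netloc
print(authority)
```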

    -- sandro
Received on Thursday, 24 April 2003 02:52:57 UTC