Ambiguity as a solution rather than a problem (WAS: The range of the HTTP dereference function) from Miles Sabin on 2002-03-31 (www-tag@w3.org from March 2002)

From: Miles Sabin <miles@mistral.co.uk>
Date: Sun, 31 Mar 2002 21:25:48 +0100
To: <www-tag@w3.org>
Message-ID: <000b01c1d8f2$3eb3f1b0$a3eab8c3@milessabin.com>

Gavin Thomas Nicol wrote,
> I think the URI->resource->representation model needs to be 
> clarified outside the scope of HTTP before much else is done.
>
> I'd be interested in hearing what you think is needed.

[ If you're impatient you might want to skip to the end where I
  sketch out a couple of applications ... hopefully they'll motivate
  the sequel ]

I'm proposing a fairly radical simplification and generalization of
the model (tho' I think that in fact it meshes quite conservatively 
with real-world practice).

The resource/representation distinction is motivated by the desire
to keep the resource identified by a URI fixed despite changes in the 
underlying representation (e.g. changes over time or variation due to 
content negotiation). This model has three important characteristics.

1. It's unambiguous.

   It claims that a URI identifies exactly one resource, or nothing at
   all.

2. It's one-way.

   It focusses solely on preserving resources across changes in
   representation, not the reverse. This misses out on the fact that
   when we update "Today's news", it's also the case that what was the
   representation of today's news now becomes the representation of
   "Yesterday's news".

   In general the slippage of resource/representation mappings can go 
   in both directions: resources can map to new representations, and 
   representations can map to new resources.

3. It's one-level.

   It accommodates a single mapping between a resource as a "concept",
   and a collection of entities as sequences of retrieved bits. This
   misses out on the fact that there are many levels of description
   that we could apply to the result of a retrieval operation on a
   URI. It could be described as,

     * A particular sequence of bits.
     * A snapshot of a document.
     * A document which changes over time.
     * Today's news from http://news.bbc.co.uk/.
     * A particular retriever's favourite resource.

   In each of these cases there is a constituent/constituee
   relationship,

     * The document snapshot is constituted from a particular sequence
       of bits.
     * The time-variant document is constituted from the various
       document snapshots.
     * Today's news is constituted from various documents over time,
       each of which is likely to vary over a 24 hour interval.
     * The retriever's favourite resource is constituted from Today's
       news (until her preferences change) rather than any particular
       document.

   A one-level concept/entity model can capture at most one of these
   relationships.
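
To make the levels in point 3 concrete, here is a minimal sketch of
the chain of constitution as a data structure. The Resource class and
its field names are purely illustrative assumptions, not part of any
existing model or library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Anything a retrieval result might be described as (hypothetical type)."""
    label: str
    # Lower-level resources this one is constituted from, if any.
    constituted_from: List["Resource"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of constitution levels beneath this resource."""
        if not self.constituted_from:
            return 0
        return 1 + max(r.depth() for r in self.constituted_from)


bits = Resource("a particular sequence of bits")
snapshot = Resource("a snapshot of a document", [bits])
document = Resource("a document which changes over time", [snapshot])
news = Resource("today's news", [document])
favourite = Resource("a retriever's favourite resource", [news])

print(favourite.depth())  # 4
```

Four constitution relationships sit beneath the top-level description,
whereas a one-level model has room for only one of them.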

My proposal is as follows,

* Drop the resource/representation distinction.

  The one-way focus of the current model suggests that there's an
  important asymmetry between resources and representations. This
  simply isn't so: representations can persist across changes in the
  resource they're the representation of in exactly the same way
  that a resource can persist across changes in its representation.

  The one-level focus of the current model suggests that there's a
  specific place at which resource/representation slippage can occur,
  between changing bits and a single fixed "concept". This isn't so
  either: changes in relationships between resources and the things
  which realize them can occur at just about any level of description
  imaginable.

  So I propose doing away with representations in any non-relative
  sense. Everything's a resource, it's just that some of those
  resources also function as _relative_ representations of other
  resources.

* Drop the assumption that URIs are unambiguous.

  If you're with me so far, then it's clear that we have to do this.
  In the "Today's news" example we have five distinct resources, but
  only one URI. As such, that URI can't be treated as referring
  unambiguously to any _one_ of them.

  Whilst this might sound problematic, there is almost always some
  additional context which resolves the ambiguity in practice. In
  the case of time-varying resources, it's the time of retrieval
  (the WebDAV versioning extensions provide a more explicit 
  mechanism). In the case of format-variant resources we have content
  negotiation. And where a resource is not retrievable it will never 
  be eligible as a candidate when its URI is dereferenced.
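
As a rough illustration of how context can take up the slack, the
sketch below lets one URI have several candidate referents and uses
retrievability plus a requested media type (standing in for content
negotiation) to pick one. The Candidate type, the dereference function
and the example.org URI are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One of the resources a single URI may refer to (hypothetical type)."""
    label: str
    retrievable: bool
    media_type: Optional[str] = None  # None for things that have no format


def dereference(candidates: List[Candidate], accept: str) -> Optional[Candidate]:
    """Pick the candidate a retrieval would yield, or None if there is none.

    Non-retrievable candidates are never eligible; the accept value stands
    in for content negotiation among the rest.
    """
    for c in candidates:
        if c.retrievable and c.media_type == accept:
            return c
    return None


URI = "http://example.org/thing"  # hypothetical URI

referents = {
    URI: [
        Candidate("the abstract thing itself", retrievable=False),
        Candidate("HTML page about it", True, "text/html"),
        Candidate("plain-text page about it", True, "text/plain"),
    ]
}

chosen = dereference(referents[URI], accept="text/html")
print(chosen.label)  # HTML page about it
```

The URI stays ambiguous in the table of referents; it is only the act
of retrieval, with its context, that settles on one candidate.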

Here are three examples of how this new model helps with long-standing 
problems,

* Non-retrievable resources.

  By Mark Baker's stipulation, http://www.markbaker.ca/ refers to him.
  However, if you attempt to dereference this URI via an HTTP GET you
  won't get Mark, you'll get a text/html document containing
  information about him, his family and his work. The two are not the 
  same in any sense, they're merely related.

  As such the URI is ambiguous between Mark and the document. In
  practice this is completely unproblematic. An HTTP GET automatically
  resolves any ambiguity in favour of the document; and attempts to
  buy beers for http://www.markbaker.ca/ will only be made when
  it's clear from context that we're dealing with Mark.

* Namespace URIs.

  Following on from the previous example, allowing URIs to be 
  ambiguous gives us the freedom to keep everybody happy. We can say 
  that a namespace URI refers to an abstract namespace and functions 
  as its name; and we can _also_ allow it to refer to one or more 
  associated documents by way of a location.

  As before, context resolves the ambiguity. Abstract namespaces are 
  not retrievable, hence an HTTP GET will automatically resolve the
  ambiguity in favour of an associated document, if any. If we want
  to choose between RDDL and something else, we have content
  negotiation to take up any remaining slack. OTOH, the only thing
  which plays a part in namespace processing is the abstract 
  namespace, so in namespace processing the ambiguity is automatically
  resolved in favour of the namespace.

* URLs vs. URNs.

  Note that in the preceding example we were able to accommodate a
  distinction between a name (of an abstract namespace) and a location
  (the retrieval target for an associated document). Note also that 
  that distinction cut across any notional URL/URN distinction. But 
  for all that the distinction was still there.

  Allowing URIs to be ambiguous and using additional contextual
  information to take up the slack allows all sides to declare victory
  in this case too. Those who insist (rightly IMO) that bandying
  "name" and "rigid designator" around doesn't do any magic are
  right: URIs which function as names _are_ potentially just as 
  ambiguous as URIs which don't. And those who insist (again, rightly 
  IMO) that there's an important distinction between a name and a 
  description are right too: they just have to make sure that their 
  _uses_ of names are sufficiently contextualized to eliminate any 
  practical ambiguity.
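
The "name" use in the namespace example can be seen in any
namespace-aware parser. Here is a small sketch, assuming Python's
standard xml.etree.ElementTree module and a made-up namespace URI;
the parser treats the URI purely as a string to compare and never
attempts to retrieve it.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"  # hypothetical namespace URI

doc = ET.fromstring(f'<a:item xmlns:a="{NS}">hello</a:item>')

# Namespace processing: the URI functions only as a name. It is folded
# into the element's qualified tag and no network access takes place.
print(doc.tag)  # {http://example.org/ns}item

# The "location" use would be a separate operation in a separate
# context, for instance an HTTP GET on the same string, which is
# deliberately not performed here.
```

The same string serves both uses; which one is in play is fixed by
the operation being carried out rather than by the URI itself.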

Comments most welcome.

Cheers,


Miles
Received on Sunday, 31 March 2002 15:25:49 UTC