Re: [ISSUE 57] Representation-source: a possible new approach to the HTTP Redirection Issue from Roy T. Fielding on 2008-02-25 (www-tag@w3.org from February 2008)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Mon, 25 Feb 2008 11:49:03 -0800
To: Danny Ayers <danny.ayers@gmail.com>
Cc: noah_mendelsohn@us.ibm.com, "W3C TAG" <www-tag@w3.org>
Message-Id: <E6416F61-E40C-4DE6-8B7A-D8A94EE8537B@gbiv.com>

On Feb 25, 2008, at 4:52 AM, Danny Ayers wrote:
> Roy asks whether "A key requirement of the Semantic Web is that URIs
> be used to identify resources unambiguously". Well, yes, I'd suggest
> it is, in exactly the same way the Web is dependent on an essentially
> unambiguous naming scheme - the resource identified and the URI are
> intimately bound, thanks to their somewhat circular, fixpoint kind of
> definition.

Right, the Web is bound by a consistency (or lack thereof) in
representations/results for any given URI.  But that consistency
only exists when observed over time.

> On the other hand the relationship between a resource and
> the thing it stands for does have ambiguity - the publisher may be
> clear, but the consumer of such information is limited to making their
> best interpretation of whatever (ultimately human-readable)
> definitions the publisher has provided.

And that's a problem because ...?

Let's try an example.

   1) a resource owner might have a good idea of what resource they
      intend to identify when they mint a URI, or they might just be
      uploading a hundred photos named IMG_98nnn.jpg.  The Web doesn't
      depend on cool URIs -- they are just a nice thing to have. The
      resource owner just thinks of it as "My vacation in Nevada".

   2) an author, having discovered a useful trove of photos, adds links
      to their personal favorites along with metadata to describe the
      resource that they are linking to.  The Web doesn't depend on the
      author's link semantics matching the owner's resource semantics,
      even though it would be nice if they matched.

   3) a thousand other authors do the same, using their own notions
      of the semantics that are important to them.  One person notices
      that the scenic backdrop of our owner, snacking on sandwiches by
      the side of a road in Nevada, contains what looks like an alien
      spacecraft sticking out the side of an exposed bluff.  Naturally,
      they slashdot the photo as evidence that UFOs exist, and it is
      linked to by another fifty thousand UFO enthusiasts as
      "proof that aliens exist among us (just ignore the guy with
      the sandwich)".

   4) Google's spider wanders by, notices all these links to these
      photos in this collection, and then builds an index based on
      the links and text surrounding the references by others to
      particular photos, with extra weight given to photos that are
      described in the same way by multiple references.

   5) The owner receives much fan mail and questions about this
      otherwise boring picture and (having read all the webarch
      documents) decides to maintain that URI as a permanent home
      for "I was abducted by a UFO in Nevada".
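A toy sketch of step 4 may make the point concrete. It makes no claim about how Google actually ranks anything. It indexes each URI by the text that *referrers* attach to it, giving more weight to descriptions that several referrers share. The URIs and descriptions are hypothetical. Notice that the owner's intent appears nowhere in the input, yet each URI still ends up with a "meaning".

```python
from collections import Counter, defaultdict

# Hypothetical link data: (target URI, text an author used when linking to it).
links = [
    ("http://example.org/vacation/photo1.jpg", "my vacation in Nevada"),
    ("http://example.org/vacation/photo2.jpg", "proof that aliens exist"),
    ("http://example.org/vacation/photo2.jpg", "Proof that aliens exist"),
    ("http://example.org/vacation/photo2.jpg", "guy eating a sandwich"),
]

def build_index(links):
    """Map each target URI to its referrer descriptions, most frequent first."""
    counts = defaultdict(Counter)
    for uri, text in links:
        # Light normalization so trivially different wordings are counted together.
        counts[uri][text.strip().lower()] += 1
    # A description repeated by several referrers ranks higher for that URI.
    return {uri: c.most_common() for uri, c in counts.items()}

index = build_index(links)
print(index["http://example.org/vacation/photo2.jpg"][0])  # ('proof that aliens exist', 2)
```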

Here is the problem.  People mint URIs for various reasons and rarely
decide what they mean until long after.  People use URIs, and through
their use assign meaning that may have little or nothing to do with
the owner's original meaning.  This mix of meanings and intentions is
always ambiguous, even when the owner does take the time to carefully
describe what they intend by the semantics of a name.

Note that this entire example uses "Information Resources".
As I said throughout the earlier discussions, there is no relevant
distinction from the web's point of view between "information resources"
and "non-information resources."  Those categories exist purely for the
sake of argument, based on the theory that it is somehow more important
to perceive the ambiguity between those sets than it is to perceive the
ambiguity *within* those sets.  In fact, it is an entirely pointless
exercise in maneuvering closed-world assumptions, instead of facing up
to the real requirement: the Web is not a closed world.

The question isn't "can we remove ambiguity?" It should be "can we
understand a relationship given that ambiguity is almost a certainty?"
Because that's what life on the Web is all about -- communicating
in spite of decentralized authority.  There is nothing that we can do
to the Web to make it less ambiguous without undoing the very design
that made it successful in the first place -- a loose, decentralized,
counter-authoritarian interconnectedness.

> But I think Roy does highlight the most important part of the issue
> when he says:
> [[
> On the Web, millions of people mint URIs, and millions more use them
> in references. Millions of human beings, conversing over time, with an
> occasional URI thrown in to refer to a subject under discussion.
> ]]
>
> Ok, the Semantic Web is an extension of the existing
> (document-oriented) Web. Flipping that over, I think it's reasonable
> to consider the existing Web as a projection or view of (some subset
> of) the Semantic Web.
>
>
> From this perspective, regular HTML links can be seen as expressions
> of (s, p, o) statements, where the predicate isn't explicitly typed.
> The relation can be typed, using the rel/rev attributes in concert
> with an HTML Meta Data profile - GRDDL is the nearest we have to a
> formalism for this. But it's common practice to use a kind of
> human-friendly implicit typing, for example using <a
> href="http://www.ics.uci.edu/~fielding/">Roy Fielding</a> to refer to
> a person.

Note, however, that HTML anchors do not (by default) express an "is_a"
type of relationship from the content to the identified resource.
They are usually "more_about" relationships.

> But I'm suggesting the Semantic Web *does* need to distinguish between
> Roy the person and Roy's homepage. A reasonable RDF expression of the
> link above might be something like:
>
> <> dc:related <http://www.ics.uci.edu/~fielding/> .
> [ foaf:name "Roy Fielding";
>   foaf:homepage <http://www.ics.uci.edu/~fielding/> ] .

I think we are jumping back off the rails at this point. There is no
doubt that the Semantic Web needs to make logical assertions.  It does
so by defining things like foaf:name and foaf:homepage in unambiguous
ways, not by restricting the identifier range and certainly not by
making assumptions about HTTP status codes.  The above was true
between 1993 and 2000.  Today, my home page is <http://roy.gbiv.com/>.
The Semantic Web should be capable of understanding that, even when
it is temporarily untrue, because time is essential to understanding
the Web.
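One way to let assertions carry time, sketched under stated assumptions: each foaf:homepage statement gets the span of years over which it is known to hold. The two URIs and the 1993 to 2000 span come from the message. The 2008 start for the second statement reflects only the date of this message ("Today"), not the real start date, which is not given. The subject label `_:roy` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TimedAssertion:
    subject: str
    predicate: str
    obj: str
    start: int          # first year the statement is known to hold
    end: Optional[int]  # last year it is known to hold; None = no known end

ROY = "_:roy"  # hypothetical blank-node label for the person
facts = [
    TimedAssertion(ROY, "foaf:homepage", "http://www.ics.uci.edu/~fielding/", 1993, 2000),
    TimedAssertion(ROY, "foaf:homepage", "http://roy.gbiv.com/", 2008, None),
]

def holds_at(facts, subject, predicate, year):
    """Return the objects asserted for (subject, predicate) in the given year."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.start <= year
        and (f.end is None or year <= f.end)
    ]

print(holds_at(facts, ROY, "foaf:homepage", 1997))  # ['http://www.ics.uci.edu/~fielding/']
print(holds_at(facts, ROY, "foaf:homepage", 2008))  # ['http://roy.gbiv.com/']
```

A query for a year that no statement covers (say 2004) returns an empty list. That is an honest "unknown", not a contradiction.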

> But does the (document) Web need to distinguish between Roy the person
> and Roy's homepage? Evidently not, given the utility of simple linkage
> like that above.

That's not what the document Web is doing with an anchor.  The only time
that we can make a valid assumption about the relationship expressed by
an HTML anchor is when the rel="" attribute is used correctly.
Likewise, there is nothing (aside from syntax issues) that prevents
less ambiguous relationships from being expressed within the same
representation, within the protocol stream, or within other
representations on the Web (like RDF).
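A minimal sketch of what a consumer can safely extract from anchors, using Python's standard `html.parser`. A link with a rel attribute yields one explicit relation per link type. A link without rel yields only an untyped reference (predicate `None`), which is the "more_about" case, not an "is_a". The page URI and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class LinkRelations(HTMLParser):
    """Collect (document, relation, target) triples from <a> elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rel = attrs.get("rel")
        if rel:
            # rel may list several space-separated link types; each is explicit.
            for link_type in rel.lower().split():
                self.triples.append((self.base_uri, link_type, href))
        else:
            # No rel: we only know the document refers to the target.
            # Do not assume the link text *is* the target.
            self.triples.append((self.base_uri, None, href))

parser = LinkRelations("http://example.org/page.html")
parser.feed('<a href="http://www.ics.uci.edu/~fielding/">Roy Fielding</a> '
            '<a rel="next" href="http://example.org/page2.html">next page</a>')
parser.close()
print(parser.triples)
```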

> The only way I can see to square this circle is to differentiate
> between two kinds of interpretation. For example:
>
> $ wget http://www.w3.org/People/Berners-Lee/card.n3#i
> ...
> HTTP request sent, awaiting response... 200 OK
>
> Web interpretation: this is somehow related to Tim
> Semantic Web interpretation: we got a 200, so this is about Tim - what
> does the RDF here say?
>
> If we'd got a 303, sure, we could follow the httpRange-14 resolution's
> interpretation. But I don't think we can realistically assume
> 200=Information Resource. I don't think this problem can be completely
> resolved with a technical trick at the HTTP layer.
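One detail of the wget example above deserves a note. HTTP clients do not send the fragment, so the request goes to card.n3, and the 200 OK is a response about card.n3, not about the "#i" identifier. Python's `urllib.parse.urldefrag` shows the split:

```python
from urllib.parse import urldefrag

uri = "http://www.w3.org/People/Berners-Lee/card.n3#i"
# The part before '#' is what actually gets requested; the fragment stays client-side.
request_uri, fragment = urldefrag(uri)
print(request_uri)  # http://www.w3.org/People/Berners-Lee/card.n3
print(fragment)     # i
```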

Nobody *needs* to assume 200=Information Resource.  That is another
completely artificial case for the sake of useless argumentation.
The 303 solution exists for people who do not want to imply that
their named resource can be represented.  That's all it means for
a GET to return 303 (ever since 1994, when the original meaning of
"redirect with new method" was deprecated due to lack of implementation
and security issues and replaced with "redirect to see other resource").
Knowing the nature of the resource is irrelevant. The use case for the
303 recommendation was to avoid contradiction, not to avoid ambiguity.
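A minimal sketch of the distinction, assuming the RFC 2616 meanings of 200 and 303 for GET. The function reports what a response lets a client conclude. It deliberately never classifies the resource as an "information resource" or anything else. The function name and example Location are hypothetical.

```python
from http import HTTPStatus
from typing import Optional

def what_a_get_response_says(status: int, location: Optional[str] = None) -> str:
    """Summarize what a GET response permits a client to conclude (nothing about 'nature')."""
    if status == HTTPStatus.OK:
        # 200: the body is offered as a representation of the requested resource.
        return "the response body is a representation of the requested resource"
    if status == HTTPStatus.SEE_OTHER:
        # 303: the server points elsewhere without implying the resource can be represented.
        return f"no representation implied; see other resource at {location}"
    return f"status {status}: this sketch draws no conclusion"

print(what_a_get_response_says(303, "http://example.org/about-the-thing"))
```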

....Roy
Received on Monday, 25 February 2008 19:49:19 UTC