Re: TAG HTTP range - Issue 14 from Roy T. Fielding on 2003-07-04 (www-archive@w3.org from July 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Fri, 4 Jul 2003 20:49:07 +0200
To: Tim Berners-Lee <timbl@w3.org>
Cc: Public W3C <www-archive@w3.org>
Message-Id: <3103ACFC-AE50-11D7-B57A-000393753936@apache.org>

Hi Tim,

I am still in Switzerland (through July 17), but I'll try to keep up.
BTW, BBC World has been showing bits of an interview with you on TV,
with you sitting on a weird sculpture.

> So, I'd like to work with you on this issue.  I have two routes.
>
> Route 1.
>
> I think we had got to the point that, while you
> maintained that HTTP URIs can identify anything, you did admit that 
> the web in reality depends on the expectation that if I have seen one 
> bit of information with a given URI, I expect to be able to quote the 
> URI and for someone else to later get back for it essentially the same 
> information, for whatever local definition of "essentially".  In other 
> words, if I refer you to a picture of a car and you get a
> parts list, that is stretching it and probably hinders communication,
> whereas if I see a PNG and you get a JPEG we probably never even know.

I don't think we ever disagreed on that -- it is even in the stuff I
wrote four years ago.

> So one route is to point out that because of that expectation, what is 
> really invariant about the representations retrieved for the same URI 
> is not that they are about the same thing, but that they convey the 
> same information.  It might be a living page like "the front page of 
> the LATimes" or it might be "what my car looks like".

Yes, the resource is the sameness over time, regardless of the format.

> So in the architecture we actually lose information unless we capture 
> that.

I thought we did capture that.

>  So, please, can we refer to the things identified by HTTP URIs as
> say "information resources" (I won't quibble over terms).   This will 
> enormously help clear up the indirection bugs we get on many of the 
> lists.

I don't think so.  You are jumping from "resources are consistent"
to "http means document", which is not a logical conclusion.  Let's say
that I have an http URI for which no information is forthcoming in
response to a GET -- never is and never will be as far as the consumer
can see.  Is that an identifier of an information resource?  I don't know.
In fact, nobody except the naming authority knows.  Maybe it is a future
resource that just doesn't have a representation yet.  Maybe it is a sink.
We really can't know for sure until someone else tells us what we can do
with that URI.  What we do know is that they can never be inconsistent,
since inconsistency would change the meaning and thus the resource.
In other words, there is no case in which a URI that actually
identifies a car will ever return a parts list, so we don't have
to worry about it.

Another way of thinking about this is to go from the other direction.
Let's say that we create a new identifier called "urn:bug:foo" and
declare that it corresponds to a particular species of bug -- not
information about the bug, but the concept of that species in much
the same way that people use "human" to describe us.  It follows
therefore that we should not be confusing that identifier with
information about the bug species.  But that simply will not last
for long -- it is a URI and thus transferable via information protocols
like HTTP, and eventually someone will come along and deploy a urn
proxy that takes as input a "urn:bug:foo" and responds with some
sort of information.  Does the presence of that information invalidate
the "realness" of the original concept?  No.  No more so than the
presence of an HTTP server invalidates the "realness" of the resources
named by that server's authority.
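A minimal sketch of the "urn:bug:foo" argument in Python, offered only as
an illustration.  The DECLARED table and the UrnProxy class are hypothetical
names, not part of any real URN resolver; the point is just that what the
naming authority declared is held separately from whatever a proxy serves.

```python
from typing import Dict, Optional

# What the naming authority declared the identifier to mean (hypothetical).
DECLARED: Dict[str, str] = {
    "urn:bug:foo": "the concept of a particular species of bug",
}


class UrnProxy:
    """Hypothetical proxy that may return information for a URN."""

    def __init__(self) -> None:
        self._info: Dict[str, str] = {}

    def publish(self, urn: str, info: str) -> None:
        # Someone deploys information that is retrievable via the URN.
        self._info[urn] = info

    def get(self, urn: str) -> Optional[str]:
        # None means no information is forthcoming for this URN.
        return self._info.get(urn)


def identifies(urn: str) -> str:
    # The identified thing depends only on the authority's declaration,
    # never on whether a proxy happens to answer.
    return DECLARED[urn]


proxy = UrnProxy()
before = identifies("urn:bug:foo")
proxy.publish("urn:bug:foo", "A field guide entry describing the species")
after = identifies("urn:bug:foo")
```

In this sketch, publishing information through the proxy changes what
get() returns but leaves identifies() untouched.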

What can be said about an "http" URI is that it identifies a resource
by reference to an HTTP interface address.  Whether or not the resource
has some nature beyond the HTTP interface is not stated, nor does it
need to be constrained.  We know that the HTTP interface is constrained
to an information exchange, but that says nothing about the nature of
the resource.  The architecture is complete and well-founded if all
we say is that "http" identifiers identify resources; there is no need
(and no benefit gained) by further restricting what can be identified
by an http URI.

In any case, saying that "http" identifies an information resource
would not eliminate the indirection issue.  A document can talk about
some other document just as easily as a car.  We eliminate the indirection
case by declaring that assertions that target a URI are assertions on
the resource identified by that URI: state that is only reflected by the
content of all its representations over all time.  The only way to make
assertions about the information content returned by an action is to
add qualifiers for method and time, since the architecture requires
that those be orthogonal to the identifier.
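
One way to picture the distinction, as a rough Python sketch rather than
anything defined by HTTP or RDF: the Assertion class, its field names, and
the example.org URI are all assumptions made for illustration.  An assertion
that carries only a URI is read as being about the resource; one qualified
by method and time is read as being about the content returned by that
particular action.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """A statement whose subject is named by a URI (hypothetical model)."""

    uri: str
    claim: str
    method: Optional[str] = None    # e.g. "GET"; None means unqualified
    at: Optional[datetime] = None   # time of the action; None means unqualified

    def subject(self) -> str:
        # Only an assertion qualified by both method and time is treated
        # as being about the content returned by one particular action.
        if self.method is not None and self.at is not None:
            return "representation"
        return "resource"


# Unqualified: an assertion on the resource itself.
about_car = Assertion("http://example.org/car", "is red")

# Qualified by method and time: an assertion on what one GET returned.
about_bytes = Assertion(
    "http://example.org/car",
    "is a PNG image",
    method="GET",
    at=datetime(2003, 7, 4, 18, 49, tzinfo=timezone.utc),
)
```

Both assertions share the same identifier; the qualifiers are kept in
separate fields, so they stay orthogonal to the URI.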

> Route 2.
>
> This is to say, can we please have this distinction for the semantic 
> web? In other words, before, it wasn't necessary to formally 
> distinguish between whether the Consortium or The Consortium's Home
> Page was identified, as people constantly resolve such things in human 
> communication, and they were only used in human communication.
>
> Now we need to build a global KR system. You say you are not into
> RDF, and that's OK, but you are quite smart enough to understand the 
> issues without using it or being committed to the language.     The 
> goal is a language which talks about arbitrary objects using 
> dereferenceable global identifiers.

I know.  We keep going over the same issues.  I am not the person you
should be arguing with here -- this discussion was completely thrashed
out on the URI mailing list and it was Pat Hayes who clearly demonstrated
that this simply is not needed by RDF and is not true in any case.
People use URIs based on incomplete knowledge about the nature of
what is being referenced, thereby giving the URIs semantics through
use that may be absent from the minds of the naming authorities.
If we have dereference-able global identifiers then we have an
opportunity for secondary semantics to creep into the system.

For example, I provide a picture of Laguna Beach, but other people
link to it as a picture of the ocean. Nobody else is aware of the
distinction until I put up a picture of Laguna Beach that doesn't
happen to include a coastal scene.  Who introduced the error, and
who gets to decide which semantic is more significant?  What if I
decide, after receiving several million complaints from misled
users of that resource, that it really should be "a picture of
Laguna Beach that always shows the ocean"?  Have I changed the
resource or simply "fixed" an anomalous representation?

> Because we want to leverage the web, we obviously want to leverage 
> URIs - make those identifiers URIs in some ways.  There are two ways 
> of going about this.  One is to use the flexibility point in the 
> design of the fragid syntax.    This says that on the web, you can 
> make a new language about whatever  you like, and the fragid syntax 
> connects (in a way you define) with the syntax of the new language, 
> and the things identified mean whatever you like.  Example: SVG 
> defines graphics things, and the frag id syntax can define a 2d window 
> on the 2d space.  I actually think this is a really important 
> flexibility point in the design, as I would like all kinds of new 
> languages to introduce all kinds of new abstract concepts in the 
> future.

My opinion is that "#" is considerably less flexible than "http",
or "urn" for that matter.  It is a dead end for indirection, and
that is generally a bad thing for evolvability of the system.

> The only alternative is that we abandon the direct use of URIs for 
> arbitrary things, and follow the, alas, common view among users that a
> fragment identifier identifies part of a document.

I think you are painting yourself into a corner.  The initial claim that
the presence of an "http" identifier implies that an inconsistency will
develop between the resource and information about the resource is false.
The resource is that which is consistent, which may actually be several
different aspects of sameness that are encountered when interacting
with that resource.  Additional metadata is needed to tell us what
aspects the authority considers essential to the resource, which can
be accomplished via RDF regardless of the URI scheme.  RDF does not
need the fragment distinction.

> So, Roy.  Could we see our way to resolving this between us so we can 
> then advance it in the TAG and clear the way to getting a coordinated
> architecture with the RDF core group?

I don't understand, Tim.  I have already resolved all of the issues
requested by the RDF core group.  In fact, they requested quite the
opposite to what you are saying is required for RDF, and vehemently
objected to the artificial semantics that Sandro wanted to add.
That is why I get so frustrated by this argument: if you can't agree
within RDF that this is necessary, then I don't see why the Web
has to be constrained in a way that is contrary to the services
model wherein robots and sinks and gateways have an equal place.

....Roy
Received on Friday, 4 July 2003 14:49:02 UTC