Re: [Fwd: RE: "information resource"] from Roy T. Fielding on 2004-10-18 (www-tag@w3.org from October 2004)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Sun, 17 Oct 2004 19:03:01 -0700
To: Sandro Hawke <sandro@w3.org>
Cc: W3C TAG <www-tag@w3.org>
Message-Id: <D7114846-20A9-11D9-8A83-000393753936@gbiv.com>

On Oct 17, 2004, at 11:55 AM, Sandro Hawke wrote:
>> No, the problem is that they are exactly the same issues and folks
>> just assume they are different because they don't understand the
>> actual issues faced by current Web implementations of resources and
>> how those issues impact what Web clients can assume about resources.
>> Make Web statements about things on the Web and you have the same
>> problems (and the same solutions) as those things off the Web.
>
> It seems to me the problem is this: with RDF it becomes useful to
> assign URIs to things like dogs and movies (even ones which are not
> available for download).  When people do that, we easily get
> unintended URI collisions,

Again, you just completely missed my point.  When people make statements
about resources on the Web, whether or not they are information resources,
they have a hard time distinguishing among representations, short-term
characteristics of a resource, and actual resource characteristics.
People constantly say things about Web resources that are only true in
a very limited sense, such as when it is viewed with MSIE.  That is life.

> as discussed in the current draft:
>
>       Suppose, for example, that one organization makes use of a URI
>       to refer to the movie "The Sting", and another organization uses
>       the same URI to refer to a discussion forum about "The Sting."
>       This collision creates confusion about what the URI identifies,
>       undermining the value of the URI. If one wanted to talk about
>       the creation date of the resource identified by the URI, for
>       instance, it would not be clear whether this meant "when the
>       movie was created" or "when the discussion forum about the movie
>       was created."
>           - http://www.w3.org/TR/2004/WD-webarch-20040816/#URI-collision

Right, just as the same unclear statements can be made about an
information resource.  A person could be talking about the creation
date of the website (when it was first made available), the creation
date of the content (when it was first authored), the creation date
of this format of the content (when it was first placed in HTML
form), or the creation date of the representation received by the
user making the statement.  That is because there is a lot more to
the use of URIs than merely what they identify.

The *only* solution to this problem is to clarify the assertion
being made.  Classifying resources as one type or another solves
nothing because resources of the same type are the most likely to
be subject to ambiguously targeted assertions.
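
As a rough illustration of that point, here is a small Python sketch.
Everything in it is an assumption made for the example: the URI, the
dates, and the predicate names (siteFirstAvailable and so on) are
hypothetical and come from no real vocabulary.  It only shows that the
ambiguity sits in the vague predicate, and that it goes away when the
predicate says which assertion is meant, with no resource typing at all.

```python
# Illustrative only: the URI, predicate names, and dates are all made up.
URI = "http://example.org/sting"

# One vague predicate: four different facts collapse into "created".
vague = [
    (URI, "created", "2001-05-01"),  # website first made available
    (URI, "created", "1998-02-10"),  # content first authored
    (URI, "created", "2000-11-30"),  # content first placed in HTML form
    (URI, "created", "2004-10-17"),  # representation received by the user
]

# Same facts, same URI, but each predicate states which assertion is meant.
precise = [
    (URI, "siteFirstAvailable", "2001-05-01"),
    (URI, "contentFirstAuthored", "1998-02-10"),
    (URI, "htmlFormFirstPublished", "2000-11-30"),
    (URI, "representationReceived", "2004-10-17"),
]


def values(triples, subject, predicate):
    """Return the set of objects asserted for (subject, predicate)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# The vague predicate yields several candidate answers for one question.
print(len(values(vague, URI, "created")))

# Each precise predicate yields exactly one answer.
print(values(precise, URI, "contentFirstAuthored"))
```

Note that nothing above records what kind of resource the URI names.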

> Some people seem to find the notion of "information resources" helps
> them avoid this kind of modeling error, or detect when other people do
> it.  In fact, an OWL reasoner will often be able to report an error
> when someone accidentally uses the URI of a movie when they meant the
> URI of a discussion forum -- as long as there is a sufficiently
> detailed ontology involved.  In "the running_time of X is ...", if the
> declared domain of running_time is movies, and someone uses the
> discussion forum URI instead, software can detect that.

OWL can do that regardless of the name given to the category of
the resource, particularly since OWL has no way of determining what
resources are essentially information and what are not.  It is the
definition of "running_time" that makes that possible, not the type of X.
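
A minimal sketch of that kind of check, in Python rather than OWL, and
under loud assumptions: the class names, the example.org URIs, and the
tables DOMAIN, DISJOINT and TYPES are all hypothetical, and this is only
a toy approximation of what a reasoner does with a declared domain and a
disjointness axiom.  The point it illustrates is that the conflict is
found from the definition of running_time; no "information resource"
class appears anywhere.

```python
# Hypothetical ontology fragments; names and URIs are made up for the sketch.
DOMAIN = {"running_time": "Movie"}                  # predicate -> declared domain
DISJOINT = {frozenset({"Movie", "DiscussionForum"})}  # classes that cannot overlap
TYPES = {
    "http://example.org/sting": {"Movie"},
    "http://example.org/sting-forum": {"DiscussionForum"},
}


def check(subject, predicate):
    """Return (implied, declared) class pairs that conflict for this assertion.

    Using a predicate implies its subject belongs to the predicate's domain.
    A conflict exists when that implied class is disjoint from a class the
    subject is already known to have.
    """
    implied = DOMAIN.get(predicate)
    if implied is None:
        return []  # nothing is defined about this predicate, nothing to test
    known = TYPES.get(subject, set())
    return [
        (implied, declared)
        for declared in sorted(known)
        if frozenset({implied, declared}) in DISJOINT
    ]


# Using the forum URI with running_time is reported as a conflict.
print(check("http://example.org/sting-forum", "running_time"))

# Using the movie URI raises no conflict.
print(check("http://example.org/sting", "running_time"))
```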

> A fairly simple ontology which might help a lot is to divide the world
> into things which are and are not information resources.   The World
> Wide Web Consortium is not an information resource, but what
> http://www.w3.org/ identifies is, ....  so it becomes practical to
> detect and report the error of someone using that URI to (directly)
> identify that organization.    Some of us think that an HTTP "200 OK"
> response on a GET or HEAD for some URI means it identifies an
> information resource, so this process becomes even easier:  if someone
> writes that they work for http://www.w3.org/, the type-error can be
> detected by only knowing about "works for" -- nothing needs to have been
> said about "http://www.w3.org/".

Why?  Why is such a type error even relevant?  Why do we need to discover
it differently than all of the other mistargeted assertions?  The problem
I have with "information resources" isn't that they aren't a relevant type
of resource, but rather that they are being used as a scapegoat for not
solving the real problems of the semantic web.  Trying to introduce false
constraints into the existing Web architecture just to satisfy a small
subset of a special case of ambiguously targeted assertions is wrong.

Dublin Core knew about this problem at the very first workshop.
Rather than trying to classify what a resource is, DCMI simply
refers to the definition in RFC 2396 and says that DC statements
are for information resources [1].  In other words, the terms are
specifically targeted, and thus if someone says "dc:title" we
know that it is a title of an information resource because that
is the definition of that term in DCMI.

[1]   http://www.niso.org/standards/resources/Z39-85.pdf

DCMI did that not because they needed to differentiate between documents
and dogs, but rather because they needed to limit the scope of discussion
to a reasonably understood set of resources.  Web Architecture doesn't
care.  The semantic web isn't going to care either -- the assertions being
made are either going to be well-defined (and thus testable) or poorly
defined and unlikely to be useful.  In either case, it isn't the types
of resources that matter: what matters is how well the predicates are
defined to distinguish what is being said about the object URI.

....Roy
Received on Monday, 18 October 2004 02:03:36 UTC