RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Dan Connolly on 2006-05-03 (public-swbp-wg@w3.org from May 2006)

From: Dan Connolly <connolly@w3.org>
Date: Wed, 03 May 2006 09:45:00 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: public-swbp-wg@w3.org
Message-Id: <1146667500.27608.1366.camel@dirk.w3.org>
On Fri, 2006-04-28 at 23:44 -0400, Booth, David (HP Software - Boston)
wrote:
> > From:  Dan Connolly
> > . . .
> > Pat Hayes wrote:
> > > My current
> > > understanding is that an information resource is some thing 
> > > that can 
> > > be transmitted over a network by a transfer protocol. On this 
> > > understanding, one could argue that a word was an information 
> > > resource.
> > 
> > On Thu, 20 Apr 2006 17:40:20 -0400 Booth, David wrote: 
> > > It sounds like you are mainly disagreeing with the TAG's guidance.
> > 
> > For what it's worth, I think Pat's position is consistent 
> > with the TAG's position (i.e. the W3C's position, since 
> > webarch is now a W3C Recommendation).
> 
> I'm surprised and baffled, since I thought Pat argued that it is okay
> for a URI to be used both as a name for a person and a name for a
> document that describes that person.  But I guess you're referring to
> this one point about a word being an information resource.

Yes, just the one point.

> > . . . The definition of "Information Resource" that W3C 
> > endorses[10] is:
> > . . .
> > http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
> >
> > I don't think that means that words are not information resources.
> 
> I think it may depend on what you mean by "words".  

I don't think so. I don't think there's any (reasonable) meaning
of "words" where the TAG has decided that w:InformationResource
has no intersection with it.

> If http://example.org/doc.html identifies a single resource, and the
> associated document is updated to correct typos, then clearly
> http://example.org/doc.html is identifying more than just the words that
> are *currently* served from that URI: it is identifying a document
> *abstraction*, rather than a particular document instance or a
> particular set of words.  I don't see how "all of [the] essential
> characteristics"[10] of that document *abstraction* can be "conveyed in
> a message"[10].

No? It seems to me that we do that pretty routinely.

In any case, I don't see the relevance of that example to the
question of whether w:InformationResource intersects wordnet:Synset.

A more relevant example is something like

http://sigma.ontologyportal.org:4010/sigma/WordNet.jsp?word=frog&POS=1

If Adam Pease says that URI refers to the word "frog", I don't
see how that conflicts with anything the TAG has written. Adam
may correct typos in his representation(s) of the word frog.

(This is not to say that I think it would be wise; as I wrote
in my paper, "I suggest adopting w:InformationResource rdfs:subClassOf
frbr:Work as a practical constraint." and I don't think
wordnet synsets are frbr:Works. But maybe it's coherent to say
that some are. Hmm. anyway... I'm not giving advice here; just
trying to clarify the position of the TAG.)



> Similarly, if http://weather.example.com/oaxaca identifies a single
> resource that is "a periodically updated report on the weather in
> Oaxaca"[10], then I don't see how "all of [the] essential
> characteristics"[10] of that periodically updated report can be
> "conveyed in a message"[10].

Again, it seems to me that we do this routinely. Maybe it takes
more than one message and webarch is a bit sloppy here. In any case...


> Because "information resources" can return different "representations"
> at different times (even if some happen to return the same
> representation every time), it seems to me that "information resources"
> are by their very nature abstract.  

Please be careful with your quantifiers. Your argument seems to go
from:
   There are some information resources that have more than one
   representation and hence are abstract
to
   All information resources have more than one representation.

On the contrary, I think the IETF has made it pretty
clear that http://www.ietf.org/rfc/rfc822.txt has just
one representation. And they haven't done anything to
make the resource itself distinguishable from its
representation, so if they said the 2 are identical,
that would be coherent.

Likewise, W3C has bound the URI
  http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd
to a particular sequence of bytes/characters.


> Clearly the notion of an "information resource" is modeled after the
> real life notion of the contents of a (logical) disk region, on a Web
> server, that is associated with a URI "racine".  (The "racine" is all of
> the URI except the fragment identifier.[11])  The server is configured
> to return those contents, whatever they are, when the URI racine is
> dereferenced.  And those contents may change over time!  Thus, the URI
> racine is not identifying any *particular* contents, it is identifying
> the logical *location* where those contents are stored, and the server
> provides whatever contents happen to be stored there at the moment they
> are requested.  

Yes, but W3C and the IETF promise that some parts of our disks
won't change.
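
For readers unfamiliar with the term, the "racine" defined in the quoted
paragraph above (the URI minus its fragment identifier) can be computed
mechanically. What follows is only a minimal sketch in Python using the
standard library's urllib.parse.urldefrag; the function name racine and
the example URI are illustrative assumptions, not part of the cited
definition [11].

```python
from urllib.parse import urldefrag


def racine(uri: str) -> str:
    """Return the URI with any fragment identifier removed.

    Hypothetical helper: the name follows the term used in the thread,
    not any published API.
    """
    # urldefrag splits a URI into (url, fragment); keep only the url part.
    return urldefrag(uri).url


# The example URI is made up for illustration.
print(racine("http://example.org/doc.html#section2"))
```

A URI that has no fragment identifier is returned unchanged, so the
function can be applied to any http URI before dereferencing it.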

> In fact, it is not even possible on the Web to create a URI that is
> permanently bound to a single document instance that can never change:

I gave 2 counter-examples above.

> it is *always* possible to change the server configuration or domain IP
> mapping to cause a different document instance to be served.

That would be a bug, in the 2 cases above.

>   In other
> words, an http URI on the real Web identifies a logical *location* whose
> content *always* has the potential of changing.

I don't agree.

>   Similarly (I argue), an
> "information resource" is *necessarily* abstract.  Thus, if something is
> not abstract, then it cannot be an "information resource".

I don't find this argument convincing.

> So returning to your comment about whether a word could be an
> "information resource", it depends on what you mean by "word".  If an
> alternate spelling of "color" is "colour", then we are referring to an
> abstract notion of a word, whose spelling may vary.  However, if you are
> referring to a particular sequence of characters that can be transmitted
> over the network, that is a *concrete* notion of "word", and thus cannot
> be an "information resource".
> 
> > 
> > I tried to cover this in a recent submission to IRW2006...
> > 
> > [[
> > Note that the TAG has not taken a position on whether
> >  w:InformationResource intersects with rdf:Property. ]]
> >  -- "An analysis of httpRange-14" section  
> > http://www.w3.org/2006/04/irw65/urisym#hr14
> 
> Great paper!
> 
> [8] TAG httpRange-14 decision:
> http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
> 
> [9] Tim Bray's proposed definition of "information resource":
> http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377.html
> 
> [10] WebArch definition of "information resource":
> http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
> 
> [11] Definition of "racine":
> http://www.w3.org/2000/10/swap/log#racine
> 
> David Booth
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Wednesday, 3 May 2006 14:45:13 UTC