RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues]

From: Dan Connolly <connolly@w3.org>
Date: Thu, 04 May 2006 08:05:53 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: public-swbp-wg@w3.org
Message-Id: <1146747954.29293.16.camel@dirk.w3.org>

On Thu, 2006-05-04 at 01:04 -0400, Booth, David (HP Software - Boston)
wrote:
> Hi Dan,
> 
> > From: Dan Connolly [mailto:connolly@w3.org] 
> > > From: David Booth
> > . . .
> > > Similarly, if http://weather.example.com/oaxaca identifies a single 
> > > resource that is "a periodically updated report on the weather in 
> > > Oaxaca"[10], then I don't see how "all of [the] essential 
> > > characteristics"[10] of that periodically updated report can be 
> > > "conveyed in a message"[10].
> > 
> > Again, it seems to me that we do this routinely. Maybe it 
> > takes more than one message and webarch is a bit sloppy here. 
> 
> But that is the crucial difference!  Sure, a *single* weather report can
> be conveyed in a message.   But http://weather.example.com/oaxaca is not
> merely identifying a *single* weather report issued at 2005-03-12
> 23:11:36.236 UTC or any other particular time.  It identifies a
> *function* from time to weather reports.  I don't know any way to
> transmit "all of [the] essential characteristics"[10] of that particular
> function in a message or even a finite set of messages.
> 
> > 
> > > Because "information resources" can return different 
> > > "representations" 
> > > at different times (even if some happen to return the same 
> > > representation every time), it seems to me that "information 
> > > resources" are by their very nature abstract.
> > 
> > Please be careful with your quantifiers. Your argument seems to go
> > from:
> >    There are some information resources that have more than one
> >    representation and hence are abstract to
> >    All information resources have more than one representation.
> 
> Almost.  My argument goes from "Some information resources have more
> than one representation and hence are abstract" to "All information
> resources are abstract".   Here is the justification.  (For clarity,
> I'll avoid the term "abstract" below, and instead speak of "functions
> from time to data", since that is more precise.)
> 
> 1. Given: A URI identifies a *single* resource.
> 
> 2. Any "information resource" that is intended to be time varying (such
> as the "current weather report in Oaxaca") is obviously a function from
> time to data, as illustrated above.  Thus, we know that some
> "information resources" are functions from time to data.

Actually, in the general case, they may be functions of more than
just time: preferred media type, language, authentication
credentials, even user agent, in some cases.
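
To make that concrete, here is a minimal sketch in Python of a resource
modeled as a function of several request parameters rather than of time
alone. Everything in it is an assumption for illustration: the Request
type, the oaxaca_report function, and the temperature readings are
hypothetical and are not taken from webarch or from any real server.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Request(NamedTuple):
    """Hypothetical request context: the inputs a server may vary on."""
    when: datetime
    media_type: str = "text/html"
    language: str = "en"


def oaxaca_report(req: Request) -> str:
    """Toy resource: maps a whole request context to one representation."""
    # Illustrative data only: one morning reading, one later reading.
    temp_c = 18 if req.when.hour < 12 else 27
    if req.language == "es":
        text = f"Oaxaca: {temp_c} grados"
    else:
        text = f"Oaxaca: {temp_c} degrees"
    # The same state can be rendered in more than one media type.
    if req.media_type == "text/html":
        return f"<p>{text}</p>"
    return text


morning = datetime(2005, 3, 12, 9, 0, tzinfo=timezone.utc)
evening = datetime(2005, 3, 12, 23, 11, tzinfo=timezone.utc)

html_en = oaxaca_report(Request(morning))
plain_es = oaxaca_report(Request(evening, "text/plain", "es"))
```

Time is only one of three arguments here; credentials or user agent
could be added to Request in the same way.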

> 3. For other "information resources" that are plain Web pages, if those
> Web pages ever change, then those "information resources" must also be
> functions from time to data.

Well, they must have functions from time to data related to them.
I don't see how you conclude that they are necessarily identical
to those functions.

> 4. The HTTP protocol and the URI resolution mechanism are such that the
> content associated with a URI *always* has the *potential* of changing.
> Thus, the content associated with a URI is *inherently* changeable over
> time, even if by policy some Web pages are intended to remain constant.

I don't agree.

If the IETF says http://www.ietf.org/rfc/rfc822.txt identifies
a piece of text, and not a function from time to data, that's
not just a statement of policy; we have delegated to them the
right to say what the resource _is_. And I don't think they're
contradicting any established norms when they say that it
identifies one piece of text.

> 5. I haven't a clue what utility there would be in calling something an
> "information resource" if that thing is never ever intended to return
> some data in a 2xx response to an HTTP GET.
> 
> Therefore, by Occam's Razor I conclude:
> 
> 	All "information resources" are functions from time to data.

Occam's Razor isn't a valid logical inference. It's sometimes appealing,
but never compelling. In this case, I don't find it even appealing.

The more relevant principle is that of minimal constraint. If a
resource owner says their resource is a piece of data, then we
should not constrain them otherwise unless we have really compelling
reasons to do so.


> instead of:
> 
> 	Some information resources are functions from time to data,
> 	while others might merely be constant data.
> 
> > . . . I don't think there's any (reasonable) 
> > meaning of "words" where the TAG has decided that 
> > w:InformationResource has no intersection with it.
> 
> If "frog" is a word (i.e., those four letters in sequence), and you
> accept my conclusion above

I don't.

>  (that w:InformationResources are functions
> from time to data), then "frog" cannot be a w:InformationResource
> because it obviously is not a function from time to data.  It is merely
> data.
> 
> > 
> > On the contrary, I think the IETF has made it pretty
> > clear that http://www.ietf.org/rfc/rfc822.txt has just
> > one representation. And they haven't done anything to
> > make the resource itself distinguishable from its 
> > representation, so if they said the 2 are identical, that 
> > would be coherent.
> > 
> > Likewise, W3C has bound the URI
> >   http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd
> > to a particular sequence of bytes/characters.
> > 
> > 
> > > Clearly the notion of an "information resource" is modeled 
> > > after the 
> > > real life notion of the contents of a (logical) disk 
> > > region, on a Web 
> > > server, that is associated with a URI "racine".  (The 
> > > "racine" is all 
> > > of the URI except the fragment identifier.[11])  The server is 
> > > configured to return those contents, whatever they are, 
> > > when the URI 
> > > racine is dereferenced.  And those contents may change over time!  
> > > Thus, the URI racine is not identifying any *particular* 
> > > contents, it 
> > > is identifying the logical *location* where those contents 
> > > are stored, 
> > > and the server provides whatever contents happen to be 
> > > stored there at 
> > > the moment they are requested.
> > 
> > Yes, but W3C and the IETF promise that some parts of our 
> > disks won't change.
> > 
> > > In fact, it is not even possible on the Web to create a URI that is 
> > > permanently bound to a single document instance that can 
> > > never change:
> > 
> > I gave 2 counter-examples above.
> 
> No, you gave examples of URIs that are bound to content that, by today's
> policy, is not *intended* to change.  The fact is, the content *can* be
> changed, even intentionally, by the owners.

I gave 2 examples where the statement that the resource *is* a piece
of data does not logically contradict any established norms.

Whether the resource _is_ a piece of data or is a time-varying
abstraction is not something we can observe from HTTP interactions
with the resource itself. But in both cases, the resource owner
has published information that strongly suggests that the
resource _is_ a piece of data. Now they haven't published those
actual logical assertions, but we can suppose that they did
and explore the consequences. And I don't find any contradictions
when I do that exploration.
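
A rough sketch of the observability point, in Python. The two models
below are hypothetical, and RFC_TEXT is a placeholder string, not the
real bytes of the RFC. Any finite set of probes sees the same
representations from both, so the probes alone cannot tell us which
model describes what the resource _is_.

```python
from datetime import datetime, timezone

# Placeholder for the document's content (not the actual RFC bytes).
RFC_TEXT = "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES"


def as_data(_when: datetime) -> str:
    """Model A: the resource is the text itself; time plays no role."""
    return RFC_TEXT


def as_function_of_time(when: datetime) -> str:
    """Model B: a time-varying resource that has not varied at any probed time."""
    return RFC_TEXT if when.year < 9999 else "something else"


# A finite set of GET-like probes at different times.
probes = [datetime(y, 5, 4, tzinfo=timezone.utc) for y in (2006, 2016, 2026)]
observations_a = [as_data(t) for t in probes]
observations_b = [as_function_of_time(t) for t in probes]

# True: the observations agree, so they do not settle which model holds.
indistinguishable = observations_a == observations_b
```

Choosing between the two models therefore rests on what the resource
owner says, not on what the probes return.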
 
> > > it is *always* possible to change the server configuration 
> > > or domain 
> > > IP mapping to cause a different document instance to be served.
> > 
> > That would be a bug, in the 2 cases above.
> 
> What I meant was, if the domain owners' policies change, then the
> documents may be changed *intentionally*.  That's a feature, not a bug.
> 
> > 
> > >   In other
> > > words, an http URI on the real Web identifies a logical *location* 
> > > whose content *always* has the potential of changing.
> > 
> > I don't agree.
> 
> I don't understand how this statement could be subject to dispute.  Can
> you explain?

I explained by example above.

> David Booth
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Thursday, 4 May 2006 13:07:38 GMT
