RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues]

Hi Dan,

> From: Dan Connolly [mailto:connolly@w3.org] 
> On Thu, 2006-05-04 at 01:04 -0400, Booth, David (HP Software - Boston)
> > > From: Dan Connolly [mailto:connolly@w3.org]
> > > > From: David Booth
> > > . . .
> > > > Because "information resources" can return different
> > > > "representations" at different times (even if some happen to
> > > > return the same representation every time), it seems to me that
> > > > "information resources" are by their very nature abstract.
> > > 
> > > Please be careful with your quantifiers. Your argument seems to go
> > > from:
> > >    There are some information resources that have more than one
> > >    representation and hence are abstract
> > > to
> > >    All information resources have more than one representation.
> > 
> > Almost.  My argument goes from "Some information resources have more
> > than one representation and hence are abstract" to "All information
> > resources are abstract".  Here is the justification.  (For clarity,
> > I'll avoid the term "abstract" below, and instead speak of
> > "functions from time to data", since that is more precise.)
> > 
> > 1. Given: A URI identifies a *single* resource.
> > 
> > 2. Any "information resource" that is intended to be time-varying
> > (such as the "current weather report in Oaxaca") is obviously a
> > function from time to data, as illustrated above.  Thus, we know that
> > some "information resources" are functions from time to data.
> 
> Actually, in the general case, they may be functions of more
> than just time: preferred media type, language,
> authentication credentials, even user agent, in some cases.

Yes, those are different inputs from the client.  I omitted that detail
because it is not relevant to this discussion.  The time-varying nature
of the "current weather report in Oaxaca" is independent of client
input.
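To make the distinction concrete, here is a minimal sketch, assuming a hypothetical function `weather_report` whose extra parameters stand in for the client inputs Dan lists (media type, language).  Holding those client inputs fixed still leaves a function from time alone to data, and that function varies with time on its own:

```python
from datetime import datetime, timezone
from functools import partial


def weather_report(t: datetime, media_type: str = "text/plain", lang: str = "en") -> bytes:
    """Hypothetical resource: the data a 2xx GET would return at instant t.

    media_type and lang stand in for client inputs (Accept, Accept-Language);
    the report text changes with t regardless of them.
    """
    body = f"[{lang}] Oaxaca weather, observed {t:%Y-%m-%d %H:%M} UTC"
    if media_type == "text/html":
        body = f"<p>{body}</p>"
    return body.encode("utf-8")


# Fixing the client inputs leaves a function from time alone to data.
plain_english = partial(weather_report, media_type="text/plain", lang="en")

morning = datetime(2006, 5, 4, 9, 0, tzinfo=timezone.utc)
evening = datetime(2006, 5, 4, 18, 0, tzinfo=timezone.utc)

print(plain_english(morning) != plain_english(evening))  # True
```

The time dependence survives no matter which client inputs are chosen, which is why I set them aside.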

> 
> > 3. For other "information resources" that are plain Web pages, if
> > those Web pages ever change, then those "information resources" must
> > also be functions from time to data.
> 
> Well, they must have functions from time to data related to 
> them. I don't see how you conclude that they are necessarily 
> identical to those functions.

Are you suggesting that http://example.org/doc.html might identify one
thing, d, which is not a function from time to data, but d has a
function, fd, from time to data, associated with it, and fd determines
what representation should be returned at what time?  Unless fd were
also used for some other purpose, I don't see the utility in
distinguishing d from fd.  It seems to complicate the model.  What value
does it add?
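Here is a minimal sketch of the two models, assuming hypothetical names `Document` and `serve` (only `d` and `fd` come from the paragraph above).  If everything an HTTP GET can reveal is produced by fd, then no sequence of requests distinguishes d from fd:

```python
from datetime import datetime, timezone
from typing import Callable

TimeToData = Callable[[datetime], bytes]


class Document:
    """Hypothetical 'd': not itself a function of time, but it carries
    one ('fd') that decides what is served at each instant."""

    def __init__(self, fd: TimeToData) -> None:
        self.fd = fd


def serve(resource, t: datetime) -> bytes:
    # A 2xx GET response can expose only the data for instant t.
    if isinstance(resource, Document):
        return resource.fd(t)
    return resource(t)


def fd(t: datetime) -> bytes:
    # Hypothetical content history: the owner revised the page in 2007.
    return b"version 2" if t.year >= 2007 else b"version 1"


d = Document(fd)
instants = [datetime(y, 1, 1, tzinfo=timezone.utc) for y in (2005, 2006, 2007)]

print(all(serve(d, t) == serve(fd, t) for t in instants))  # True
```

The extra `Document` layer changes nothing observable; it only matters if fd is also used for some other purpose.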

> 
> > 4. The HTTP protocol and the URI resolution mechanism are such that 
> > the content associated with a URI *always* has the *potential* of 
> > changing. Thus, the content associated with a URI is *inherently* 
> > changeable over time, even if by policy some Web pages are 
> > intended to remain constant.
> 
> I don't agree.

Wow, I am really puzzled.  I don't understand how paragraph 4 above
could be disputable.  If I register a domain, then for any URI under
that domain, I can change the content that is served from that URI at
any time.  Right?  I don't understand what is disputable about that.

> 
> If the IETF says http://www.ietf.org/rfc/rfc822.txt 
> identifies a piece of text, and not a function from time to 
> data, that's not just a statement of policy; we have 
> delegated to them the right to say what the resource _is_. 

Well, not quite.  When IETF registered ietf.org, what we *really*
delegated to them was the right to serve content from URIs under that
domain.  You are proposing that we *also* interpret this delegation as
giving them the right to authoritatively declare what "resource" is
associated with each URI under their domain.  

That's fine too (and I support that proposal), but the httpRange-14
decision says if a URI dereference yields a 2xx status, then the URI's
resource *should* be an "information resource".  So I think that gives
the TAG a responsibility to be clear about what it means for something
to be an "information resource", which is what I am trying to figure
out.  

> And I don't think they're contradicting any established norms 
> when they say that it identifies one piece of text.

That depends on the definition of "information resource".  

> 
> > 5. I haven't a clue what utility there would be in calling something
> > an "information resource" if that thing is never ever intended to
> > return some data in a 2xx response to an HTTP GET.
> > 
> > Therefore, by Occam's Razor I conclude:
> > 
> > 	All "information resources" are functions from time to data.
> 
> Occam's Razor isn't a valid logical inference. 

I'm not making a logical inference.  I'm proposing a *definition*.
That's exactly what Occam's Razor is for: When two explanations both
satisfy the observed phenomena, prefer the simpler one.  I'm proposing a
simpler one.

> It's sometimes 
> appealing, but never compelling. In this case, I don't find 
> it even appealing.  
> 
> The more relevant principle is that of minimal constraint. If 
> a resource owner says their resource is a piece of data, then 
> we should not constrain them otherwise unless we have really 
> compelling reasons to do so.

That's fine, I certainly agree with that principle also.  I don't think
my proposed definition is adding any additional constraints.  

Oh . . . wait.  Maybe I'm now understanding your concern with adopting a
simpler definition of "information resource".  Are you saying that, even
though a definition of "information resource" as "a function from time
to data" may be simpler, adopting such a definition would prohibit URI
owners from claiming that their "information resources" are pieces of
data?  Well, yes, I guess it would be adding that constraint.

Is that a problem?  Hmm.  It's hard for me to evaluate that since: (a) I
don't have a clear enough understanding of your (or the TAG's)
definition of "information resource"; and (b) I have not seen a lot of
people claiming "this URI identifies both an information resource and a
piece of data".  More on this below.

> > > . . . I think the IETF has made it pretty
> > > clear that http://www.ietf.org/rfc/rfc822.txt has just
> > > one representation. And they haven't done anything to
> > > make the resource itself distinguishable from its
> > > representation, so if they said the 2 are identical, that 
> > > would be coherent.
> > > 
> > > Likewise, W3C has bound the URI
> > >   http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd
> > > to a particular sequence of bytes/characters.
> > > . . .
> > > > In fact, it is not even possible on the Web to create a URI that is
> > > > permanently bound to a single document instance that can never
> > > > change:
> > > 
> > > I gave 2 counter-examples above.
> > 
> > No, you gave examples of URIs that are bound to content that, by 
> > today's policy, is not *intended* to change.  The fact is, 
> > the content *can* be changed, even intentionally, by the owners.
> 
> I gave 2 examples where the statement that the resource *is* 
> a piece of data does not logically contradict any established norms.

Sure.  And the statement that "the resource is a function from time to
data" also does not logically contradict any established norms.  So?

> 
> Whether the resource _is_ a piece of data or is a 
> time-varying abstraction is not something we can observe from 
> HTTP interactions with the resource itself. 

True, not for those two examples, but for many examples (such as the
Oaxaca weather report) we can.

> But in both 
> cases, the resource owner has published information that 
> strongly suggests that the resource _is_ a piece of data. 

Whoa!  Where?  Can you please point me to that "published information"?
The only relevant evidence I have seen is that:

	- An HTTP GET on the URI returns a 2xx status with some data; and

	- The URI owner has stated that they will not change the content
	that is served.

and that evidence does *not* suggest that the resource is a piece of
data any more than it suggests that the resource is a constant
*function* from time to data.  Furthermore: (a) we *know* that the
content served from those URIs *could* in fact change if the URI owners
ever decide to do so; and (b) defining "information resource" as "a
function from time to data" provides a simpler explanation for the
observed evidence than defining an "information resource" as "either a
function from time to data or a piece of data".

Thus, it seems much more sensible to me to say that the resource is a
function, from time to data, which for the foreseeable future is
*likely* to be constant, but could in fact be non-constant.
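A minimal sketch of why the evidence cannot decide between the two readings, assuming a placeholder byte string `SNAPSHOT` and a hypothetical `CHANGE_DATE` on which an owner revises the content (neither comes from the IETF or W3C).  Every GET made so far agrees under both models, but only the function model can also describe a later change:

```python
from datetime import datetime, timezone

SNAPSHOT = b"placeholder for the bytes served today"  # hypothetical content

# Reading A: the resource *is* a piece of data.
piece_of_data: bytes = SNAPSHOT


# Reading B: the resource is a function from time to data that happens to be constant.
def constant_resource(t: datetime) -> bytes:
    return SNAPSHOT


def observe(resource, t: datetime) -> bytes:
    """What a 2xx GET at instant t would yield under either reading."""
    return resource if isinstance(resource, bytes) else resource(t)


past = [datetime(y, 6, 1, tzinfo=timezone.utc) for y in (2003, 2004, 2005, 2006)]
print(all(observe(piece_of_data, t) == observe(constant_resource, t) for t in past))  # True

# If the owner later decides to change the content, only reading B can express it.
CHANGE_DATE = datetime(2010, 1, 1, tzinfo=timezone.utc)  # hypothetical


def revised_resource(t: datetime) -> bytes:
    return b"revised content" if t >= CHANGE_DATE else SNAPSHOT


print(all(observe(revised_resource, t) == SNAPSHOT for t in past))  # True
```

Past observations are identical for `constant_resource` and `revised_resource`, so the "constant" label is a prediction about policy, not something the responses themselves establish.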

> Now 
> they haven't published those actual logical assertions, but 
> we can suppose that they did and explore the consequences. 
> And I don't find any contradictions when I do that exploration.
>  
> > > > it is *always* possible to change the server configuration or domain
> > > > IP mapping to cause a different document instance to be served.
> > > 
> > > That would be a bug, in the 2 cases above.
> > 
> > What I meant was, if the domain owners' policies change, then the 
> > documents may be changed *intentionally*.  That's a feature, not a 
> > bug.
> > 
> > > 
[[
> > > > In other words, an http URI on the real Web 
> > > > identifies a logical *location* whose
> > > > content *always* has the potential of changing.
]]
> > > 
> > > I don't agree.
> > 
> > I don't understand how this statement could be subject to dispute.  
> > Can you explain?
> 
> I explained by example above.

Are we in the same universe?  Help me out here.  The statement in
[[...]] above is a simple restatement of how HTTP works.  I am at a loss
to understand why you disagree with it.

David Booth

Received on Friday, 5 May 2006 04:12:19 UTC