RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Booth, David (HP Software - Boston) on 2006-05-12 (public-swbp-wg@w3.org from May 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Fri, 12 May 2006 01:54:33 -0400
To: "Dan Connolly" <connolly@w3.org>
Cc: <public-swbp-wg@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20B935F@tayexc19.americas.cpqcorp.net>
Dan,

Thanks for the very helpful explanations, and sorry it's taken a while
to respond.  

When the WebArch is unclear (or even conflicting), readers have no
choice but to guess the TAG's intent. You've given one interpretation,
which sounds like it boils down to:

	- Any resource r is an "information resource" if:

		a. there exists a URI that returns a 
		2xx status when dereferenced;

		b. the owner of that URI claims that
		the URI identifies r; and

		c. r is not a dog, person or physical book.

I've offered another interpretation:

	- Anything that returns a 2xx status *is* an 
	"information resource"

	- An "information resource" is a function from time to data
	and nothing else (ignoring orthogonal details about content
	negotiation, cookies, etc.).

I think we have established that both of these interpretations are
consistent with the TAG's writings (at least to the extent that the
TAG's writings themselves are consistent).
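
To make the contrast concrete, here is a rough Python sketch of the two
readings.  It is only an illustration: the names UriReport, reading_one,
reading_two and the kind labels are my own assumptions and appear
nowhere in the TAG's writings.

```python
from dataclasses import dataclass

# Kinds WebArch names as not being information resources; this set is
# illustrative only, not an exhaustive list from the TAG.
EXCLUDED_KINDS = {"dog", "person", "physical book"}


@dataclass
class UriReport:
    status: int         # HTTP status observed when dereferencing the URI
    claimed_kind: str   # what the URI owner says the URI identifies


def is_2xx(status: int) -> bool:
    return 200 <= status < 300


def reading_one(report: UriReport) -> bool:
    """2xx status, and the claimed resource is not an excluded kind."""
    return is_2xx(report.status) and report.claimed_kind not in EXCLUDED_KINDS


def reading_two(report: UriReport) -> bool:
    """A 2xx status alone decides; the owner's claim is not consulted."""
    return is_2xx(report.status)
```

The two functions agree on an ordinary Web page and differ only when the
owner claims that a URI returning 2xx identifies, say, a person.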

Your interpretation is conservative in the constraints it imposes on the
claims that URI owners can make; my interpretation is conservative in
the claims it encourages URI owners to make.  I'll explain further.

There is a spectrum of resource types. At one end of the spectrum are
some resources that we are certain are "information resources", i.e.,
the TAG's writings are clear about them.  At the other end we have
resources that we are certain are *not* "information resources".  In the
middle we have a no-man's land of resources whose status is ambiguous,
depending on one's interpretation of the TAG's writings:

	- resources that probably should be considered
	"information resources";

	- resources that may be considered "information resources"
	or not, at the owner's whim; and

	- resources that probably should not be considered
	"information resources".

It is quite conceivable that at some point in the future, the TAG will
issue more findings to help clarify the boundary between "information
resource" and non-"information resource". Therefore, to avoid running
afoul of any future TAG pronouncements about what is or is not an
"information resource", a *conservative* strategy for URI owners is to
only call those things "information resources" that we are *certain*
will not later be declared to be non-"information resources".  That is
one basis of the interpretation that I offered. There are two others:

	- It makes the boundary between "information resource"
	and non-"information resource" very clear.

	- It is limited to what is observable about an
	"information resource".

Much more below.

> From: Dan Connolly [mailto:connolly@w3.org]
>
> To be clear: In this thread, I'm trying to clarify the TAG position 
> and discuss which statements are consistent with it and which are not.
>
> The TAG's position is given by documents the TAG has published, no 
> more, and no less. In my IRW paper, I took the liberty of representing
> them in N3/turtle/OWL, and readers will have to use their own 
> judgement on whether I've done that faithfully. But after that, we can
> use mathematical logic to evaluate the arguments.
>
> I think I'll try the N3 approach some more in this message, since I 
> think it'll make my points more clear.
>
> I'm tempted to switch to HTML and use color like I did in my paper to 
> distinguish TAG positions from hypothetical positions of various 
> webmasters and such. But not just now...
>
> To re-iterate, I'm attributing 2 RDF/turtle formulas to the
> TAG:
>
> 1. w:representation rdfs:domain w:InformationResource.
>
> which I derive from
>
>   If an "http" resource responds to a GET request with a 2xx
>   response, then the resource identified by that URI is an
>   information resource.
>
>   -- http://www.w3.org/2001/tag/issues.html?type=1#httpRange-14
>
> and
>
> 2. w:InformationResource owl:disjointFrom foaf:Person.
>
> which I derive from...
>
> "Other things, such as cars and dogs (and, if you've printed this 
> document on physical sheets of paper, the artifact that you are 
> holding in your hand), are resources too. They are not information 
> resources, however, because their essence is not information."
>   -- http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
>
> And I take as background axioms the semantics of RDF, RDFS, and OWL, 
> and some stuff from the MIME and HTTP specs.

Fine.

>
> Now back to the discussion...
>
> On Fri, 2006-05-05 at 00:11 -0400, Booth, David (HP Software - Boston)
> wrote:
> > Hi Dan,
> >
> > > From: Dan Connolly [mailto:connolly@w3.org]
> > > On Thu, 2006-05-04 at 01:04 -0400, Booth, David (HP Software - 
> > > Boston)
> > > > > From: Dan Connolly [mailto:connolly@w3.org]
> > > > > > From: David Booth
> > > > > . . .
> > > > > > Because "information resources" can return different 
> > > > > > "representations" at different times (even if some happen to
> > > > > > return the same representation every time), it seems to me 
> > > > > > that "information resources" are by their very nature 
> > > > > > abstract.
> > > > >
> > > > > Please be careful with your quantifiers. Your argument seems 
> > > > > to go from:
> > > > >    There are some information resources that have more
> > > > >    than one representation and hence are abstract
> > > > > to
> > > > >    All information resources have more than one
> > > > >    representation.
> > > >
> > > > Almost. My argument goes from "Some information resources have 
> > > > more than one representation and hence are abstract" to "All 
> > > > information resources are abstract". Here is the justification. 
> > > > (For clarity, I'll avoid the term "abstract" below, and instead 
> > > > speak of "functions from time to data", since that is more 
> > > > precise.)
> > > >
> > > > 1. Given: A URI identifies a *single* resource.
> > > >
> > > > 2. Any "information resource" that is intended to be time 
> > > > varying (such as the "current weather report in Oaxaca") is 
> > > > obviously a function from time to data, as illustrated above. 
> > > > Thus, we know that some "information resources" are functions 
> > > > from time to data.
> > >
> > > Actually, in the general case, they may be functions of more than 
> > > just time: preferred media type, language, authentication 
> > > credentials, even user agent, in some cases.
> >
> > Yes, those are different inputs from the client. I omitted that 
> > detail because it is not relevant to this discussion. The 
> > time-varying nature of the "current weather report in Oaxaca" is 
> > independent of client input.
>
> Very well.
>
> Perhaps this part of the argument is orthogonal to the main point 
> about choosing URIs for wordnet words, . . . .
> . . .
> At this point, it's clearly *consistent* to say that some information 
> resources are functions from time to data, i.e.
>   _:someRes a w:InformationResource, util:FunctionFromTimeToData.
>
> but I do not think that it follows necessarily; i.e. it's not a 
> theorem that you can derive from the TAG's position.

Correct. The TAG's position is unclear, thus I am guessing.

> . . .
> > > > 3. For other "information resources" that are plain Web pages, 
> > > > if those Web pages ever change, then those "information 
> > > > resource" must also be functions from time to data.
> > >
> > > Well, they must have functions from time to data related to them. 
> > > I don't see how you conclude that they are necessarily identical 
> > > to those functions.
> >
> > Are you suggesting that http://example.org/doc.html might identify 
> > one thing, d, which is not a function from time to data, but d has a
> > function, fd, from time to data, associated with it, and fd 
> > determines what representation should be returned at what time?
>
> Yes.
>
> Formally: it's consistent to say that there's a function
> from time to data that agrees with the way the example.org web server 
> behaves...
>
>   :docFunc a util:FunctionFromTimeToData.
>   { ?MSG a http:OKResponse; http:about <http://example.org/doc.html>;
>      util:time ?T; mime:body ?B1.
>     ?T :docFunc ?B2.
>   } => { ?B1 = ?B2 }.
>
> and yet, this function is different from the document itself:
>
>   <http://example.org/doc.html> owl:differentFrom :docFunc.
>
> >   Unless fd were
> > also used for some other purpose, I don't see the utility in 
> > distinguishing d from fd. It seems to complicate the model. What 
> > value does it add?
>
> I'm not saying it adds value; I'm saying it's a coherent position that
> is consistent with the TAG's position.  I'm saying that you cannot 
> derive
>
>  <http://example.org/doc.html> a util:FunctionFromTimeToData.
>
> from the TAG's position.

The TAG's position is unclear.  I'm not logically deriving my
interpretation from the TAG's writings.  I'm trying to guess the TAG's
intent.  I doubt the TAG's intent would be a position that adds
complexity without adding value.
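
Here is a small Python sketch of why the extra layer seems to buy
nothing.  Everything in it (the Document class, the fd schedule, the
timestamps) is a hypothetical illustration, not something taken from
the TAG or from your paper.

```python
from dataclasses import dataclass
from typing import Callable

# A function from time (here an integer timestamp) to the data served.
TimeToData = Callable[[int], str]


def fd(t: int) -> str:
    """Hypothetical content schedule for http://example.org/doc.html."""
    return "draft" if t < 100 else "final"


@dataclass
class Document:
    """Model A: the URI identifies d, which merely has an associated fd."""
    schedule: TimeToData

    def get(self, t: int) -> str:
        return self.schedule(t)


def get_from_function(resource: TimeToData, t: int) -> str:
    """Model B: the URI identifies the function itself."""
    return resource(t)
```

For every time t, Document(fd).get(t) and get_from_function(fd, t)
return the same data, so no HTTP observation can tell the two models
apart.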

>
> > > > 4. The HTTP protocol and the URI resolution mechanism are such 
> > > > that the content associated with a URI *always* has the 
> > > > *potential* of changing. Thus, the content associated with a URI
> > > > is *inherently* changeable over time, even if by policy some Web
> > > > pages are intended to remain constant.
> > >
> > > I don't agree.
> >
> > Wow, I am really puzzled. I don't understand how paragraph 4 above 
> > could be disputable. If I register a domain, then for any URI under 
> > that domain, I can change the content that is served from that URI 
> > at any time. Right? I don't understand what is disputable about 
> > that.
>
> Let me try another counter-example. I just created a file on my web 
> server and did an HTTP round trip that demonstrates...
>
>  _:reply1 a http:OKResponse;
>    http:about <http://dm93.org/2006/05/05/abc>; mime:body "abc";
>    mime:content-type "text/plain".
>
> hence we have
>  <http://dm93.org/2006/05/05/abc> w:representation _:repr1.
>  _:repr1 mime:body "abc";
>    mime:content-type "text/plain".
>
> Further, I claim that
>   <http://dm93.org/2006geo/abc> owl:sameAs _:repr1.
>
> i.e. not only does it have a representation that is "abc", it is 
> _identical_ to "abc".
>
> This claim is (a) mine to make, as owner of dm93.org, and (b) 
> logically consistent with the position of the TAG.

You seem to be implying that it is true by virtue of the fact that the
URI owner has claimed that it is true.  This interpretation would mean
that the TAG has stated two rules that are potentially in conflict:

	Rule1: A URI identifies whatever resource its owner claims it
	identifies. (Implied by WebArch sec 2.2.2.1: "URI ownership 
	gives the relevant social entity certain rights, including:
	. . . 2. to associate a resource with an owned URI-URI . . . ."
	-- http://www.w3.org/TR/webarch/#uri-assignment )

	Rule2: The httpRange-14 decision: "If an 'http' resource 
	responds to a GET request with a 2xx response, then 
	the resource identified by that URI is an information resource".
	-- http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html

When Mark Baker claims that http://markbaker.ca/ identifies himself, and
dereferencing that URI returns a 2xx status, is Mark Baker's claim true?

If so, then either:

	- the TAG's claim that people are not "information resources"
	is wrong; or

	- the TAG's httpRange-14 rule is faulty, because it leads us 
	to a false conclusion.

If not, then it seems to me that either:

	- http://markbaker.ca/ does not in fact identify an 
	"information resource"; or

	- http://markbaker.ca/ does not in fact identify Mark Baker 
	the person.

If we presume that the TAG's httpRange-14 rule is not faulty, and hence
that http://markbaker.ca/ does in fact identify an "information
resource", then it must not identify Mark Baker the person.  Therefore,
the fact that a URI owner *claims* that a URI identifies a particular
resource is *not* reliable evidence to conclude that the URI really
*does* identify that resource.  Thus, the fact that you *claim* that
http://dm93.org/2006geo/abc identifies a resource that is the string
"abc" does not mean that it actually does.
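
The dilemma can be checked mechanically.  The following Python sketch
is only an illustration; the function names and the three boolean
flags are my own assumptions about how to encode Rule1, Rule2 and the
WebArch statement that people are not "information resources".

```python
from itertools import product


def consistent(claim_is_true: bool, rule2_holds: bool, people_not_ir: bool) -> bool:
    """Can the three assumptions hold together for a URI that returns 2xx
    and whose owner says it identifies a person?"""
    identifies_person = claim_is_true        # the owner's claim taken as fact
    is_information_resource = rule2_holds    # Rule2 applied to the 2xx reply
    # The only clash: one thing that is both a person and an information
    # resource while people are held not to be information resources.
    return not (identifies_person and is_information_resource and people_not_ir)


def consistent_combinations():
    """All (claim_is_true, rule2_holds, people_not_ir) triples that survive."""
    return [w for w in product([True, False], repeat=3) if consistent(*w)]
```

Only the combination in which all three hold at once is ruled out, so
at least one of them has to give way; I am arguing that it should be
the owner's claim.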

>
>
> > > If the IETF says http://www.ietf.org/rfc/rfc822.txt
> > > identifies a piece of text, and not a function from time to data, 
> > > that's not just a statement of policy; we have delegated to them 
> > > the right to say what the resource _is_.
> >
> > Well, not quite. When IETF registered ietf.org, what we
> > *really* delegated to them was the right to serve content from URIs 
> > under that domain. You are proposing that we *also* interpret this 
> > delegation as giving them the right to authoritatively declare what 
> > "resource" is associated with each URI under their domain.
>
> I'm not proposing that; I'm reading it out of webarch:
>
> "URI ownership gives the relevant social entity certain rights,
> including:
>      2. to associate a resource with an owned URI"
>   -- 2.2.2. URI allocation
>   http://www.w3.org/TR/2004/REC-webarch-20041215/#uri-assignment

Sorry, I should have said "The TAG is proposing" instead of "You are
proposing".  As I said, I *agree* with this view.  However, I don't
think this means that the owner's declaration is necessarily true.

>
>
> > That's fine too (and I support that proposal), but the httpRange-14 
> > decision says if a URI dereference yields a 2xx status, then the 
> > URI's resource *should* be an "information resource".
>
> (The TAG's position is not "should be" but "is"...)

The httpRange-14 decision is reported as being "advice to the
community"[16].  Since "advice" is normally something that one may
optionally follow, this sounds much more like a SHOULD than a MUST.

[16] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html

>
> >   So I think that gives
> > the TAG a responsibility to be clear about what it means for 
> > something to be an "information resource", which is what I am trying
> > to figure out.
>
> If you feel the TAG has not been sufficiently clear about what is and 
> what is not an Information Resource, I can sympathize.
>
> In this thread, I'm _only_ trying to clarify the current position of 
> the TAG to date.
>
>
> > > And I don't think they're contradicting any established norms when
> > > they say that it identifies one piece of text.
> >
> > That depends on the definition of "information resource".
>
> The only thing that the TAG has said about "Information Resource" so 
> far is that it includes the subjects of HTTP 200 responses and it 
> doesn't include cars nor books.
>
>
> > > > 5. I haven't a clue what utility there would be in calling 
> > > > something an "information resource" if that thing is never ever 
> > > > intended to return some data in a 2xx response to an HTTP GET.
> > > >
> > > > Therefore, by Occam's Razor I conclude:
> > > >
> > > > 	All "information resources" are functions from time to data.
> > >
> > > Occam's Razor isn't a valid logical inference.
> >
> > I'm not making a logical inference. I'm proposing a *definition*. 
> > That's exactly what Occam's Razor is for: When two explanations both
> > satisfy the observed phenomena, prefer the simpler one. I'm 
> > proposing a simpler one.
>
> Oh. I wasn't aware that swbp-wg was trying to define the term 
> "Information Resource".

I think the working group is trying to Do The Right Thing in selecting
URIs and serving metadata from them. When the TAG's writings are
unclear, we have no choice but to guess the TAG's intent. That's what
I'm doing. However, my guesses are mine; they don't represent the
working group as a whole.

>
> I thought you were trying to choose URIs for wordnet words/synsets, 
> and somebody had argued against certain URIs and tried to use the 
> TAG's position as their justification.

Yes, though it was their *interpretation* of the TAG's position.  The
TAG's position is not clear enough to know exactly what it is.

>
> > > It's sometimes appealing, but never compelling. In this case, I 
> > > don't find it even appealing.
> > >
> > > The more relevant principle is that of minimal constraint. If a 
> > > resource owner says their resource is a piece of data, then we 
> > > should not constrain them otherwise unless we have really 
> > > compelling reasons to do so.
> >
> > That's fine, I certainly agree with that principle also. I don't 
> > think my proposed definition is adding any additional constraints.
> >
> > Oh . . . wait. Maybe I'm now understanding your concern with 
> > adopting a simpler definition of "information resource". Are you 
> > saying that, even though a definition of "information resource" as 
> > "a function from time to data" may be simpler, adopting such a 
> > definition would prohibit URI owners from claiming that their 
> > "information resources" are pieces of data? Well, yes I guess it 
> > would be adding that constraint.
>
> Quite.
>
> > Is that a problem? Hmm. It's hard for me to evaluate that since:
> > (a) I don't have a clear enough understanding of your (or the
> > TAG's) definition of "information resource"; and (b) I have not seen
> > a lot of people claiming "this URI identifies both an information 
> > resource and a piece of data". More on this below.
> >
> > > > > . . . I think the IETF has made it pretty
> > > > > clear that http://www.ietf.org/rfc/rfc822.txt has just one 
> > > > > representation. And they haven't done anything to make the 
> > > > > resource itself distinguishable from its representation, so if
> > > > > they said the 2 are identical, that would be coherent.
> > > > >
> > > > > Likewise, W3C has bound the URI
> > > > >
> > > > > http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd
> > > > > to a particular sequence of bytes/characters.
> > > > > . . .
> > > > > > In fact, it is not even possible on the Web to create a URI 
> > > > > > that is permanently bound to a single document instance that
> > > > > > can never change:
> > > > >
> > > > > I gave 2 counter-examples above.
> > > >
> > > > No, you gave examples of URIs that are bound to content that, by
> > > > today's policy, is not *intended* to change. The fact is, the 
> > > > content *can* be changed, even intentionally, by the owners.
> > >
> > > I gave 2 examples where the statement that the resource
> > > *is* a piece of data does not logically contradict any established
> > > norms.
> >
> > Sure. And the statement that "the resource is a function from time 
> > to data" also does not logically contradict any established norms. 
> > So?
>
> So I thought some members of the SemWeb BP WG were using the TAG's 
> position as justification for claims that some URIs are not 
> appropriate for identifying wordnet words/synsets.

Yes, they were using their *interpretation* of the TAG's position, since
the TAG's position is unclear.

>
> WG members should feel free to argue any positions they like; but 
> please take care when attributing positions to the TAG that they are 
> clearly justified from materials published by the TAG.

I hope I have clearly justified my interpretations.  If you think I
haven't, please tell me what parts you think I have not justified. 

>
>
> > > Whether the resource _is_ a piece of data or is a time-varying 
> > > abstraction is not something we can observe from HTTP interactions
> > > with the resource itself.
> >
> > True, not for those two examples, but for many examples (such as the
> > Oaxaca weather report) we can.
> >
> > > But in both cases, the resource owner has published information 
> > > that strongly suggests that the resource _is_ a piece of data.
> >
> > Whoa! Where? Can you please point me to that "published 
> > information"? The only relevant evidence I have seen is that:
> >
> > 	- An HTTP GET on the URI returns a 2xx status with some data; and
> >
> > 	- The URI owner has stated that they will not change the content
> > 	that is served.
> >
> > and that evidence does *not* suggest that the resource is a piece of
> > data any more than it suggests that the resource is a constant 
> > *function* from time to data.
>
> Fair point. For reference, the "published information" for the IETF is
>
> [[
>    The full text of the specification is then available using the
>    following URL:
>
>       http://www.ietf.org/rfc/rfcNNNN.txt
>
>    where "NNNN" is the number of the RFC being submitted.
> ]]
>  -- section 3.4.2 Submitting IETF Documents to JTC1 
> http://www.ietf.org/rfc/rfc3563.txt
>
> and for the W3C HTML spec, it's
>
> [[
> The file DTD/xhtml1-strict.dtd is a normative part of this 
> specification.
> ]]
>  -- http://www.w3.org/TR/2002/REC-xhtml1-20020801/#dtds
>
> I think those do _suggest_ that the URIs denote pieces of data; i.e. 
> it's a reasonable reading.

So on the one hand you have pointed to evidence that *suggests* that the
URI owners may be implicitly claiming that these URIs denote pieces of
data, and on the other hand I have pointed to evidence of the physical
reality of the Web implementation that strongly *suggests* that the
resources identified by these URIs are in fact functions from time to
data, in spite of what you think the owners may implicitly claim and in
spite of the fact that the data served is not currently time varying.

> My point was that it's also a
> reading that is not logically contradicted by the TAG's position.

I agree; nor is the interpretation that an "information resource" is a
time varying function.

>
> >   Furthermore: (a) we *know* that the
> > content served from those URIs *could* in fact change if the URI 
> > owners ever decide to do so;
>
> Then the URI owners would contradict themselves, which takes us into 
> another ballpark altogether than the one in which I want to hold this 
> discussion.

Okay, so in your interpretation, this would represent a contradiction in
the assertions made about what the URI identifies, whereas in my
interpretation, this would not be a contradiction about what the URI
identifies -- it would still identify the same function from time to
data -- it would represent a violation of their pledge not to modify the
document at that URI.
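
A rough Python sketch of my reading, under the assumption that time is
an integer timestamp and that the history dictionary is a hypothetical
record of what the server was configured to serve (make_resource and
pledge_kept are illustrative names of my own):

```python
from typing import Callable, Dict, Iterable

# A resource modelled as a function from time to the data served.
TimeToData = Callable[[int], bytes]


def make_resource(history: Dict[int, bytes]) -> TimeToData:
    """Build a step function: the data served at time t is the entry whose
    start time is the latest one that is <= t."""
    starts = sorted(history)

    def serve(t: int) -> bytes:
        current = [s for s in starts if s <= t]
        if not current:
            raise LookupError("nothing was served at that time")
        return history[current[-1]]

    return serve


def pledge_kept(resource: TimeToData, times: Iterable[int]) -> bool:
    """True if the same data was observed at every sampled time."""
    return len({resource(t) for t in times}) <= 1
```

With make_resource({0: b"abc"}) the pledge is kept at every sampled
time; with make_resource({0: b"abc", 15: b"abd"}) it is broken from
time 15 onward.  In both cases the URI identifies a single function
from time to data; only the pledge about that function differs.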

>
> >  and (b) defining "information resource" as "a
> > function from time to data" provides a simpler explanation for the 
> > observed evidence than defining an "information resource" as "either
> > a function from time to data or a piece of data".
>
> Yes, though we agreed above that in the general case, it would have to
> be a function of other things too.

Yes.

>
> > Thus, it seems much more sensible to me to say that the resource is 
> > a function, from time to data, which for the foreseeable future is
> > *likely* to be constant, but could in fact be non-constant.
>
> Very well, I accept that as your position.
>
> Please don't attribute it to the TAG, though.

I don't claim to be speaking on behalf of the TAG.  Do you?  I am simply
trying to interpret the WebArch.  My interpretation may be wrong.  I
admit that I am strongly biased toward interpretations that seem to make
sense to me -- interpretations that presume that the TAG is attempting
to add value, for example.  Thus far I'm not convinced that your
interpretation makes more sense than my interpretation, but I'm
listening and trying to figure it out.

>
> > > Now they haven't published those actual logical assertions, but we
> > > can suppose that they did and explore the consequences. And I 
> > > don't find any contradictions when I do that exploration.
> > >
> > > > > > it is *always* possible to change the server configuration 
> > > > > > or domain IP mapping to cause a different document instance 
> > > > > > to be served.
> > > > >
> > > > > That would be a bug, in the 2 cases above.
> > > >
> > > > What I meant was, if the domain owners' policies change, then 
> > > > the documents may be changed *intentionally*. That's a feature, 
> > > > not a bug.
> > > >
> > > > >
> > [[
> > > > > > In other words, an http URI on the real Web identifies a 
> > > > > > logical *location* whose content
> > > > > > *always* has the potential of changing.
> > ]]
> > > > >
> > > > > I don't agree.
> > > >
> > > > I don't understand how this statement could be subject to 
> > > > dispute. Can you explain?
> > >
> > > I explained by example above.
> >
> > Are we in the same universe? Help me out here. The statement in
> > [[...]] above is a simple restatement of how HTTP works.  I am at a
> > loss to understand why you disagree with it.
>
> As discussed above, the URI owner has the right to associate a 
> resource that is not even _potentially_ changing with a URI that they 
> own.

Yes, however the actual resource associated with the URI may not be the
resource that the URI owner claims it is.  In short, I am arguing for an
interpretation of the WebArch that gives precedence to observable facts
over claims.

David Booth
Received on Friday, 12 May 2006 05:55:28 UTC