RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Dan Connolly on 2006-05-05 (public-swbp-wg@w3.org from May 2006)

From: Dan Connolly <connolly@w3.org>
Date: Fri, 05 May 2006 11:40:06 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: public-swbp-wg@w3.org
Message-Id: <1146847206.8804.91.camel@dirk.w3.org>
To be clear: In this thread, I'm trying to clarify
the TAG position and discuss which statements are consistent
with it and which are not.

The TAG's position is given by documents the TAG has
published, no more, and no less. In my IRW paper,
I took the liberty of representing them in N3/turtle/OWL,
and readers will have to use their own judgement on
whether I've done that faithfully. But after that,
we can use mathematical logic to evaluate the arguments.

I think I'll try the N3 approach some more in this message,
since I think it'll make my points more clear.

I'm tempted to switch to HTML and use color like I did in
my paper to distinguish TAG positions from hypothetical
positions of various webmasters and such. But not just now...

To re-iterate, I'm attributing 2 RDF/turtle formulas to the TAG:

1.  w:representation rdfs:domain w:InformationResource.

which I derive from

  If an "http" resource responds to a GET request with a 2xx
  response, then the resource identified by that URI is an
  information resource.

  -- http://www.w3.org/2001/tag/issues.html?type=1#httpRange-14

and

2.  w:InformationResource owl:disjointWith foaf:Person.

which I derive from...

"Other things, such as cars and dogs (and, if you've printed this
document on physical sheets of paper, the artifact that you are holding
in your hand), are resources too. They are not information resources,
however, because their essence is not information."
  -- http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource

And I take as background axioms the semantics of RDF, RDFS, and OWL,
and some stuff from the MIME and HTTP specs.
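To make the two formulas concrete, here is a minimal sketch that applies them to a handful of (subject, predicate, object) triples. The prefixed names are plain strings, the example URI is hypothetical, and only direct rdf:type assertions are checked. It is not an RDFS/OWL reasoner.

```python
# Hypothetical sketch: the two formulas attributed to the TAG, applied to a
# tiny set of (subject, predicate, object) triples. Only direct rdf:type
# assertions are considered; there is no subclass reasoning.
REPRESENTATION = "w:representation"
TYPE = "rdf:type"
INFO_RES = "w:InformationResource"
PERSON = "foaf:Person"


def infer_types(triples):
    """Formula 1 (rdfs:domain): whatever has a w:representation is typed
    as a w:InformationResource."""
    inferred = set(triples)
    for s, p, _o in triples:
        if p == REPRESENTATION:
            inferred.add((s, TYPE, INFO_RES))
    return inferred


def consistent(triples):
    """Formula 2 (disjointness): nothing is both an information resource
    and a foaf:Person."""
    closed = infer_types(triples)
    info = {s for s, p, o in closed if p == TYPE and o == INFO_RES}
    people = {s for s, p, o in closed if p == TYPE and o == PERSON}
    return info.isdisjoint(people)


doc = "<http://example.org/doc.html>"
page = {(doc, REPRESENTATION, "_:r1")}
bad = page | {(doc, TYPE, PERSON)}
print(consistent(page), consistent(bad))  # True False
```

In this toy encoding, anything that answers with a representation and is also asserted to be a person makes the set of statements inconsistent.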

Now back to the discussion...

On Fri, 2006-05-05 at 00:11 -0400, Booth, David (HP Software - Boston)
wrote:
> Hi Dan,
> 
> > From: Dan Connolly [mailto:connolly@w3.org] 
> > On Thu, 2006-05-04 at 01:04 -0400, Booth, David (HP Software - Boston)
> > > > From: Dan Connolly [mailto:connolly@w3.org]
> > > > > From: David Booth
> > > > . . .
> > > > > Because "information resources" can return different
> > > > > "representations" 
> > > > > at different times (even if some happen to return the same 
> > > > > representation every time), it seems to me that "information 
> > > > > resources" are by their very nature abstract.
> > > > 
> > > > Please be careful with your quantifiers. Your argument seems to go
> > > > from:
> > > >    There are some information resources that have more than one
> > > >    representation and hence are abstract to
> > > >    All information resources have more than one representation.
> > > 
> > > Almost.  My argument goes from "Some information resources have more
> > > than one representation and hence are abstract" to "All information
> > > resources are abstract".   Here is the justification.  (For clarity,
> > > I'll avoid the term "abstract" below, and instead speak of 
> > > "functions from time to data", since that is more precise.)
> > > 
> > > 1. Given: A URI identifies a *single* resource.
> > > 
> > > 2. Any "information resource" that is intended to be time varying 
> > > (such as the "current weather report in Oaxaca") is obviously a 
> > > function from time to data, as illustrated above.  Thus, we know
> > > that some "information resources" are functions from time to data.
> > 
> > Actually, in the general case, they may be functions of more 
> > than just time: preferred media type, language,
> > authentication credentials, even user agent, in some cases.
> 
> Yes, those are different inputs from the client.  I omitted that detail
> because it is not relevant to this discussion.  The time-varying nature
> of the "current weather report in Oaxaca" is independent of client
> input.

Very well.

Perhaps this part of the argument is orthogonal to the main
point about choosing URIs for wordnet words, but for fun,
I'm going to try to record it in N3 too.

In the Oaxaca example in webarch, we have:

  <http://weather.example.com/oaxaca> w:representation _:reportText1.

and it's pretty clear that the report changes from day to day:

  <http://weather.example.com/oaxaca>
    w:representation _:reportText1, _:reportText2.
  _:reportText1 owl:differentFrom _:reportText2.
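A small sketch of the Oaxaca observations may help here. The day labels and placeholder bodies are invented. The point is that the same data can be read in two ways: as a w:representation relation, where one URI has two distinct bodies, or as a day-indexed mapping, which is a function from time to data. Neither reading shows that the resource itself *is* that function.

```python
# Hypothetical sketch of the Oaxaca example: observed (URI, day, body)
# triples, with placeholder bodies standing in for the two report texts.
from collections import defaultdict

OAXACA = "http://weather.example.com/oaxaca"
observations = [
    (OAXACA, "day1", "reportText1"),
    (OAXACA, "day2", "reportText2"),
]

# w:representation as a relation: URI -> set of bodies seen.
representations = defaultdict(set)
# The same observations indexed by day: a function from time to data.
by_day = {}
for uri, day, body in observations:
    representations[uri].add(body)
    by_day[day] = body

print(len(representations[OAXACA]))  # 2
print(by_day["day2"])                 # reportText2
```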


I think we can agree that "A representation is data that encodes
information about resource state" means at least:

  w:Representation rdfs:subClassOf util:Data.

where...
  w:representation rdfs:range w:Representation.

and where, util: is, say...
  @prefix util: <http://example/vocab/util#>.

Now we can define "function from time to data" a la:

  util:FunctionFromTimeToData rdfs:subClassOf owl:FunctionalProperty,
      [ owl:onProperty rdfs:domain; owl:hasValue util:Time ],
      [ owl:onProperty rdfs:range; owl:hasValue util:Data ].
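Read operationally, the definition says two things: the pairs stay within util:Time and util:Data, and no time maps to two different data. The sketch below checks exactly those two conditions. Modelling the function as a set of (time, datum) pairs is an assumption made for this sketch, and the sample times and data are placeholders.

```python
# Minimal sketch, assuming a "function from time to data" can be modelled as a
# set of (time, datum) pairs whose times come from TIMES, whose data come from
# DATA, and in which no time is paired with two different data.
def is_function_from_time_to_data(pairs, times, data):
    chosen = {}
    for t, d in pairs:
        if t not in times or d not in data:
            return False  # violates the rdfs:domain / rdfs:range restriction
        if chosen.setdefault(t, d) != d:
            return False  # one time, two data: not an owl:FunctionalProperty
    return True


TIMES = {"t1", "t2"}
DATA = {"abc", "xyz"}
print(is_function_from_time_to_data({("t1", "abc"), ("t2", "xyz")}, TIMES, DATA))  # True
print(is_function_from_time_to_data({("t1", "abc"), ("t1", "xyz")}, TIMES, DATA))  # False
```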

At this point, it's clearly *consistent* to say that some
information resources are functions from time to data, i.e.
  _:someRes a w:InformationResource, util:FunctionFromTimeToData.

but I do not think that it follows necessarily; i.e. it's
not a theorem that you can derive from the TAG's position.

(I'm tempted to try it with Euler... Euler is not complete,
of course... I'm using some stuff beyond OWL-DL... I wonder
how pellet would do... perhaps vampire or otter...
Hmm... I should be able to use Alloy to find a counter-example.
Bonus points to anybody who beats me to it.)
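In the spirit of that Alloy idea, here is a rough stand-in: a brute-force Python search over a two-element domain, not Alloy itself. It treats util:FunctionFromTimeToData as an uninterpreted class. It then looks for an interpretation that satisfies both TAG formulas and contains a resource with a representation that lies outside that class. If such a model exists, the class membership is not a theorem of those two formulas.

```python
from itertools import product

# Hypothetical brute-force model search over a two-element domain, standing in
# for an Alloy run (this is not Alloy). util:FunctionFromTimeToData is treated
# as an uninterpreted class; we look for an interpretation that satisfies both
# TAG formulas yet contains an information resource outside that class.
DOMAIN = ("r1", "r2")
PAIRS = [(a, b) for a in DOMAIN for b in DOMAIN]


def subsets(items):
    """Yield every subset of items as a frozenset."""
    for bits in product((False, True), repeat=len(items)):
        yield frozenset(x for x, keep in zip(items, bits) if keep)


def find_counterexample():
    for info in subsets(DOMAIN):              # w:InformationResource
        for person in subsets(DOMAIN):        # foaf:Person
            for func in subsets(DOMAIN):      # util:FunctionFromTimeToData
                for rep in subsets(PAIRS):    # w:representation
                    if any(s not in info for s, _ in rep):
                        continue  # violates formula 1 (rdfs:domain)
                    if info & person:
                        continue  # violates formula 2 (disjointness)
                    if (info - func) & {s for s, _ in rep}:
                        return info, person, func, rep
    return None


print(find_counterexample())
```

The search space here is only 4 × 4 × 4 × 16 interpretations. A real Alloy model would also give the function class some internal structure.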



> > > 3. For other "information resources" that are plain Web pages, if 
> > > those Web pages ever change, then those "information resources" must
> > > also be functions from time to data.
> > 
> > Well, they must have functions from time to data related to 
> > them. I don't see how you conclude that they are necessarily 
> > identical to those functions.
> 
> Are you suggesting that http://example.org/doc.html might identify one
> thing, d, which is not a function from time to data, but d has a
> function, fd, from time to data, associated with it, and fd determines
> what representation should be returned at what time?

Yes.

Formally: it's consistent to say that there's a function
from time to data that agrees with the way the example.org
web server behaves...

  :docFunc a util:FunctionFromTimeToData.
  { ?MSG a http:OKResponse; http:about <http://example.org/doc.html>;
     util:time ?T; mime:body ?B1.
    ?T :docFunc ?B2.
  } => { ?B1 = ?B2 }.

and yet, this function is different from the document itself:

  <http://example.org/doc.html> owl:differentFrom :docFunc.
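One way to see why the rule above doesn't pin the document down is to note that a response log only constrains a function at the times actually observed. The sketch uses an invented log and two hypothetical functions, `doc_func` and `alt_func`, that both satisfy the rule. Nothing in the log picks out one function as *the* document, let alone identifies the document with it.

```python
from dataclasses import dataclass

# Hypothetical sketch: a log of 200 OK responses about one URI (times and
# bodies invented), and two candidate functions from time to data.
@dataclass(frozen=True)
class OkResponse:
    about: str
    time: int
    body: str


DOC = "http://example.org/doc.html"
LOG = [OkResponse(DOC, 1, "v1"), OkResponse(DOC, 2, "v1")]


def doc_func(t: int) -> str:
    return "v1"                       # constant at every time


def alt_func(t: int) -> str:
    return "v1" if t <= 2 else "v2"   # agrees so far, differs later


def agrees(func) -> bool:
    """The N3 rule: each OK response about DOC at time T has body func(T)."""
    return all(func(m.time) == m.body for m in LOG if m.about == DOC)


print(agrees(doc_func), agrees(alt_func))  # True True
```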

>   Unless fd were
> also used for some other purpose, I don't see the utility in
> distinguishing d from fd.  It seems to complicate the model.  What value
> does it add?

I'm not saying it adds value; I'm saying it's a coherent position
that is consistent with the TAG's position. I'm saying that you
cannot derive

 <http://example.org/doc.html> a util:FunctionFromTimeToData.

from the TAG's position.

 
> > > 4. The HTTP protocol and the URI resolution mechanism are such that 
> > > the content associated with a URI *always* has the *potential* of 
> > > changing. Thus, the content associated with a URI is *inherently* 
> > > changeable over time, even if by policy some Web pages are 
> > > intended to remain constant.
> > 
> > I don't agree.
> 
> Wow, I am really puzzled.  I don't understand how paragraph 4 above
> could be disputable.  If I register a domain, then for any URI under
> that domain, I can change the content that is served from that URI at
> any time.  Right? I don't understand what is disputable about that.

Let me try another counter-example. I just created a file
on my web server and did an HTTP round trip that demonstrates...

 _:reply1 a http:OKResponse;
   http:about <http://dm93.org/2006/05/05/abc>;
   mime:body "abc";
   mime:content-type "text/plain".

hence we have
 <http://dm93.org/2006/05/05/abc> w:representation _:repr1.
 _:repr1 mime:body "abc";
   mime:content-type "text/plain".

Further, I claim that
  <http://dm93.org/2006/05/05/abc> owl:sameAs _:repr1.

i.e. not only does it have a representation that is "abc",
it is _identical_ to "abc".

This claim is (a) mine to make, as owner of dm93.org,
and (b) logically consistent with the position of the TAG.
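As a rough check of claim (b), the sketch below derives the representation triples from the reply and then applies owl:sameAs by naively renaming `_:repr1` to the URI. It then looks for a clash with the two TAG formulas. The dict layout of the reply is an assumption made for illustration, and no real HTTP request is made.

```python
# Hypothetical sketch of the dm93.org example: derive a representation from a
# 200 OK reply, apply the owl:sameAs claim by merging the two nodes, and check
# the merged graph against the two TAG formulas.
URI = "<http://dm93.org/2006/05/05/abc>"
reply = {"status": 200, "about": URI, "body": "abc", "content_type": "text/plain"}

triples = set()
if 200 <= reply["status"] < 300:
    triples.add((URI, "w:representation", "_:repr1"))
    triples.add(("_:repr1", "mime:body", reply["body"]))
    triples.add(("_:repr1", "mime:content-type", reply["content_type"]))

# owl:sameAs: rename _:repr1 to URI everywhere (a naive merge).
merged = {tuple(URI if term == "_:repr1" else term for term in t) for t in triples}

# Formula 1 makes URI an information resource; formula 2 only forbids it
# from also being a foaf:Person, which nothing here asserts.
info = {s for s, p, _ in merged if p == "w:representation"}
people = {s for s, p, o in merged if p == "rdf:type" and o == "foaf:Person"}
print(info.isdisjoint(people))  # True
```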


> > If the IETF says http://www.ietf.org/rfc/rfc822.txt 
> > identifies a piece of text, and not a function from time to 
> > data, that's not just a statement of policy; we have 
> > delegated to them the right to say what the resource _is_. 
> 
> Well, not quite.  When IETF registered ietf.org, what we *really*
> delegated to them was the right to serve content from URIs under that
> domain.  You are proposing that we *also* interpret this delegation as
> giving them the right to authoritatively declare what "resource" is
> associated with each URI under their domain.  

I'm not proposing that; I'm reading it out of webarch:

"URI ownership gives the relevant social entity certain rights,
including:
     2. to associate a resource with an owned URI"
  -- 2.2.2. URI allocation
  http://www.w3.org/TR/2004/REC-webarch-20041215/#uri-assignment


> That's fine too (and I support that proposal), but the httpRange-14
> decision says if a URI dereference yields a 2xx status, then the URI's
> resource *should* be an "information resource".

(The TAG's position is not "should be" but "is"...)

>   So I think that gives
> the TAG a responsibility to be clear about what it means for something
> to be an "information resource", which is what I am trying to figure
> out.  

If you feel the TAG has not been sufficiently clear about what
is and what is not an Information Resource, I can sympathize.

In this thread, I'm _only_ trying to clarify the current position
of the TAG to date.


> > And I don't think they're contradicting any established norms 
> > when they say that it identifies one piece of text.
> 
> That depends on the definition of "information resource".  

The only thing that the TAG has said about "Information Resource"
so far is that it includes the subjects of HTTP 200 responses
and that it includes neither cars nor books.
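Put as a decision procedure, those two statements leave most cases open. The sketch below returns a three-valued verdict. The boolean inputs are hypothetical simplifications, and deciding them in practice is outside the sketch.

```python
from enum import Enum

# Hypothetical sketch: the two published TAG statements used as a classifier.
# The boolean inputs are illustrative; deciding them is outside the sketch.
class Verdict(Enum):
    INFORMATION_RESOURCE = "information resource"
    NOT_INFORMATION_RESOURCE = "not an information resource"
    UNDETERMINED = "not settled by the TAG's position"


def classify(answered_2xx: bool, is_car_dog_or_person: bool) -> Verdict:
    if answered_2xx and is_car_dog_or_person:
        raise ValueError("inconsistent with the TAG's position")
    if answered_2xx:
        return Verdict.INFORMATION_RESOURCE       # httpRange-14
    if is_car_dog_or_person:
        return Verdict.NOT_INFORMATION_RESOURCE   # webarch definition
    return Verdict.UNDETERMINED


print(classify(False, False).value)  # not settled by the TAG's position
```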


> > > 5. I haven't a clue what utility there would be in calling 
> > > something 
> > > an "information resource" if that thing is never ever intended to 
> > > return some data in a 2xx response to an HTTP GET.
> > > 
> > > Therefore, by Occam's Razor I conclude:
> > > 
> > > 	All "information resources" are functions from time to data.
> > 
> > Occam's Razor isn't a valid logical inference. 
> 
> I'm not making a logical inference.  I'm proposing a *definition*.
> That's exactly what Occam's Razor is for: When two explanations both
> satisfy the observed phenomena, prefer the simpler one.  I'm proposing a
> simpler one.

Oh. I wasn't aware that swbp-wg was trying to define the term
"Information Resource".

I thought you were trying to choose URIs for wordnet words/synsets,
and somebody had argued against certain URIs and tried to use
the TAG's position as their justification.


> > It's sometimes 
> > appealing, but never compelling. In this case, I don't find 
> > it even appealing.  
> > 
> > The more relevant principle is that of minimal constraint. If 
> > a resource owner says their resource is a piece of data, then 
> > we should not constrain them otherwise unless we have really 
> > compelling reasons to do so.
> 
> That's fine, I certainly agree with that principle also.  I don't think
> my proposed definition is adding any additional constraints.  
> 
> Oh . . . wait.  Maybe I'm now understanding your concern with adopting a
> simpler definition of "information resource".  Are you saying that, even
> though a definition of "information resource" as "a function from time
> to data" may be simpler, adopting such a definition would prohibit URI
> owners  from claiming that their "information resources" are pieces of
> data?  Well, yes I guess it would be adding that constraint.  

Quite.
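The added constraint can be seen in a small sketch that compares one resource's asserted types under the TAG's position alone and under the proposed definition. For this sketch only, it assumes that util:FunctionFromTimeToData and util:Data are disjoint, since that is what turns the definition into a prohibition. The class names follow the N3 above.

```python
# Hypothetical sketch: compare the TAG's position with the proposed definition
# "every information resource is a function from time to data", assuming
# (for this sketch only) that functions and pieces of data are disjoint.
def consistent_with(types, proposed_definition):
    """types: set of class names asserted for one resource."""
    types = set(types)
    if proposed_definition and "w:InformationResource" in types:
        types.add("util:FunctionFromTimeToData")
    if {"w:InformationResource", "foaf:Person"} <= types:
        return False  # TAG formula 2
    if {"util:FunctionFromTimeToData", "util:Data"} <= types:
        return False  # assumed disjointness of functions and data
    return True


# The dm93.org claim: the resource is an information resource and a piece of data.
claim = {"w:InformationResource", "util:Data"}
print(consistent_with(claim, proposed_definition=False))  # True
print(consistent_with(claim, proposed_definition=True))   # False
```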

> Is that a problem?  Hmm.  It's hard for me to evaluate that since: (a) I
> don't have a clear enough understanding of your (or the TAG's)
> definition of "information resource"; and (b) I have not seen a lot of
> people claiming "this URI identifies both an information resource and a
> piece of data".  More on this below.
> 
> > > > . . . I think the IETF has made it pretty
> > > > clear that http://www.ietf.org/rfc/rfc822.txt has just
> > > > one representation. And they haven't done anything to
> > > > make the resource itself distinguishable from its
> > > > representation, so if they said the 2 are identical, that 
> > > > would be coherent.
> > > > 
> > > > Likewise, W3C has bound the URI
> > > > 
> > > >   http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd
> > > > to a particular sequence of bytes/characters.
> > > > . . .
> > > > > In fact, it is not even possible on the Web to create a URI that is
> > > > > permanently bound to a single document instance that can never change:
> > > > 
> > > > I gave 2 counter-examples above.
> > > 
> > > No, you gave examples of URIs that are bound to content that, by 
> > > today's policy, is not *intended* to change.  The fact is, 
> > > the content *can* be changed, even intentionally, by the owners.
> > 
> > I gave 2 examples where the statement that the resource *is* 
> > a piece of data does not logically contradict any established norms.
> 
> Sure.  And the statement that "the resource is a function from time to
> data" also does not logically contradict any established norms.  So?

So I thought some members of the SemWeb BP WG were using the TAG's
position as justification for claims that some URIs are not appropriate
for identifying wordnet words/synsets.

WG members should feel free to argue any positions they like; but
please take care when attributing positions to the TAG that
they are clearly justified from materials published by the TAG.

 
> > Whether the resource _is_ a piece of data or is a 
> > time-varying abstraction is not something we can observe from 
> > HTTP interactions with the resource itself. 
> 
> True, not for those two examples, but for many examples (such as the
> Oaxaca weather report) we can.
> 
> > But in both 
> > cases, the resource owner has published information that 
> > strongly suggests that the resource _is_ a piece of data. 
> 
> Whoa!  Where?  Can you please point me to that "published information"?
> The only relevant evidence I have seen is that:
> 
> 	- An HTTP GET on the URI returns a 2xx status with some data;
> and
> 
> 	- The URI owner has stated that they will not change the content
> 	that is served.
> 
> and that evidence does *not* suggest that the resource is a piece of
> data any more than it suggests that the resource is a constant
> *function* from time to data.

Fair point. For reference, the "published information" for the IETF is

[[
The full
   text of the specification is then available using the following URL:

      http://www.ietf.org/rfc/rfcNNNN.txt

   where "NNNN" is the number of the RFC being submitted.
]]
 -- section 3.4.2 Submitting IETF Documents to JTC1
http://www.ietf.org/rfc/rfc3563.txt

and for the W3C HTML spec, it's

[[
The file DTD/xhtml1-strict.dtd is a normative part of this specification.
]]
 -- http://www.w3.org/TR/2002/REC-xhtml1-20020801/#dtds

I think those do _suggest_ that the URIs denote pieces of data;
i.e. it's a reasonable reading. My point was that it's also
a reading that is not logically contradicted by the TAG's position.
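A small sketch shows why HTTP interactions alone can't choose between the two readings. A resource modelled as a fixed piece of data and one modelled as a constant function from time to data return the same body at every probed time. The classes, the body text and the probe times are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: two readings of a URI whose owner promises not to change
# what it serves. The body text and probe times are placeholders.
@dataclass(frozen=True)
class PieceOfData:
    body: str

    def get(self, t: int) -> str:
        return self.body          # the resource *is* this data


@dataclass(frozen=True)
class TimeFunction:
    f: Callable[[int], str]

    def get(self, t: int) -> str:
        return self.f(t)          # the resource *maps* time to data


BODY = "placeholder RFC text"
a = PieceOfData(BODY)
b = TimeFunction(lambda t: BODY)  # a constant function from time to data
probe_times = range(0, 1000, 7)
same = all(a.get(t) == b.get(t) for t in probe_times)
print(same)  # True
```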

>   Furthermore: (a) we *know* that the
> content served from those URIs *could* in fact change if the URI owners
> ever decide to do so;

Then the URI owners would contradict themselves, which takes us
into another ballpark altogether than the one in which I want to
hold this discussion.

>  and (b) defining "information resource" as "a
> function from time to data" provides a simpler explanation for the
> observed evidence than defining an "information resource" as "either a
> function from time to data or a piece of data".

Yes, though we agreed above that in the general case, it would
have to be a function of other things too.
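For that general case, the input to the function becomes a bundle of request properties rather than a bare time. The sketch covers only time, media type and language. The `Request` fields and the report text are hypothetical, and credentials and user agent are omitted.

```python
from typing import NamedTuple

# Hypothetical sketch: what a server returns can depend on more than time.
# Only time, media type and language are modelled; the text is invented.
class Request(NamedTuple):
    time: int          # seconds since some epoch
    media_type: str
    language: str


def weather_report(req: Request) -> str:
    day = req.time // 86400
    text = f"[{req.language}] report for day {day}"
    if req.media_type == "text/html":
        return f"<p>{text}</p>"
    return text


print(weather_report(Request(0, "text/plain", "en")))     # [en] report for day 0
print(weather_report(Request(86400, "text/html", "es")))  # <p>[es] report for day 1</p>
```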

> Thus, it seems much more sensible to me to say that the resource is a
> function, from time to data, which for the foreseeable future is
> *likely* to be constant, but could in fact be non-constant.

Very well, I accept that as your position.

Please don't attribute it to the TAG, though.

> > Now 
> > they haven't published those actual logical assertions, but 
> > we can suppose that they did and explore the consequences. 
> > And I don't find any contradictions when I do that exploration.
> >  
> > > > > it is *always* possible to change the server configuration or domain
> > > > > IP mapping to cause a different document instance to be served.
> > > > 
> > > > That would be a bug, in the 2 cases above.
> > > 
> > > What I meant was, if the domain owners' policies change, then the 
> > > documents may be changed *intentionally*.  That's a feature, not a 
> > > bug.
> > > 
> > > > 
> [[
> > > > > In other words, an http URI on the real Web 
> > > > > identifies a logical *location* whose
> > > > > content *always* has the potential of changing.
> ]]
> > > > 
> > > > I don't agree.
> > > 
> > > I don't understand how this statement could be subject to dispute.  
> > > Can you explain?
> > 
> > I explained by example above.
> 
> Are we in the same universe?  Help me out here.  The statement in
> [[...]] above is a simple restatement of how HTTP works.  I am at a loss
> to understand why you disagree with it.

As discussed above, the URI owner has the right to associate
a resource that is not even _potentially_ changing with a URI that
they own.

> 
> David Booth
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Friday, 5 May 2006 16:40:21 UTC