RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-05-01 (public-swbp-wg@w3.org from May 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 1 May 2006 13:08:48 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: <public-swbp-wg@w3.org>, "Frank Manola" <fmanola@acm.org>
Message-Id: <p06230900c07bdbab253e@[10.100.0.24]>
>  > From: Pat Hayes [mailto:phayes@ihmc.us]
>>  >> From: David Booth
>>  . . .
>>  >>  3. The Web page itself: a document, consisting of 
>>  >> characters, which
>>  >> conform to XHTML syntactic rules.
>>  >
>>  >If it conforms to XHTML syntactic rules it
>>  >sounds like you are talking about a particular
>>  >instance of a document rather than a document in
>>  >the abstract sense (which may change over time)
>>
>>  No, a document does not change over time, in
>>  either the abstract or concrete sense. To refer
>>  to documents changing over time is simply an
>>  ontological error.
>
>It's not an ontological error, it's a different definition of
>"document".

Well, I was trying to use the word in its normal 
sense, insofar as any of these words have normal 
senses. But OK, I concede that there is a usage 
of 'document' in which it can be treated as a 
continuant, i.e. something that has a lifetime 
and can change over that lifetime, while still 
being the 'same thing'. Editors' drafts of W3C 
specs are a good example, I guess. I think this 
usage is unusual, unnecessary (since we already 
have a terminology of 'versions' for handling it) 
and potentially misleading, but let's agree to 
differ on that. However, I will note that with 
this notion of document, a lot of properties that 
are normally attributed to documents no longer 
apply. Documents in this sense are not made up of 
characters, cannot be parsed, etc. This has a 
lot of knock-on consequences for other 
discussions. For example, an RDF graph cannot be 
a document in this sense, neither can most 
logical texts. It seems to me to be a lot less 
misleading to use 'document' for entities which 
have all the attributes of a document, and then 
be explicit about the more elaborate notions. So, 
we might call these things document resources or 
living documents, each being a function from 
times to documents, which are called its versions. Note 
that a function is not a document :-).
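To make the "living document" reading concrete, here is a minimal Python sketch, assuming times can be modeled as plain integers. The class names `FixedDocument` and `LivingDocument` are hypothetical illustrations, not terms from the WebArch or REST. The point it tries to capture is that each version is a fixed character sequence that can be parsed, while the living document is only a mapping from times to such versions and has no text of its own.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FixedDocument:
    """A static document: a fixed sequence of characters that cannot be edited."""
    text: str


@dataclass
class LivingDocument:
    """Hypothetical 'living document': a function from times to fixed documents.

    It deliberately has no `text` attribute; only its versions do.
    """
    times: list[int] = field(default_factory=list)
    versions: list[FixedDocument] = field(default_factory=list)

    def publish(self, time: int, text: str) -> None:
        # "Changing" the living document never alters an existing version;
        # it adds a new, different fixed document that applies from `time` on.
        if self.times and time <= self.times[-1]:
            raise ValueError("versions must be published in increasing time order")
        self.times.append(time)
        self.versions.append(FixedDocument(text))

    def version_at(self, time: int) -> FixedDocument | None:
        # The value of the function at `time`: the latest version published
        # at or before that time, or None if nothing had been published yet.
        i = bisect.bisect_right(self.times, time)
        return self.versions[i - 1] if i > 0 else None


draft = LivingDocument()
draft.publish(1, "<p>First draft</p>")
draft.publish(5, "<p>Second draft</p>")
print(draft.version_at(3).text)  # <p>First draft</p>
print(draft.version_at(7).text)  # <p>Second draft</p>
```

In this sketch "editing" the draft never mutates a `FixedDocument` (the dataclass is frozen); it only extends the function's domain with a new value.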

However, a document's being changeable in this 
sense has nothing to do with its being abstract. 
The static/labile distinction (I don't want to 
say 'dynamic', which makes it sound like a movie, 
I mean labile in the sense of changeable, having 
different versions at different times while still 
being identified as the 'same' document) is 
completely different from the concrete/abstract 
distinction. To confuse these dimensions really 
is an ontological error.

>  Your notion of "document" is static, which is fine, that's
>one meaning of the word.  It's what I would call a "document instance"
>-- a particular chunk of information

Which it seems to me is something else again 
(since the same information can be conveyed by 
different concrete documents), but let's leave 
that aside.

>  -- if I were trying to
>disambiguate.  (Incidentally, I'm restricting my attention to
>information objects -- not physical, paper documents.) 
>
>But the word "document" is also often used to refer to an abstraction of
>information whose content may change over time
>-- what I would call an
>"abstract document" to disambiguate.  Whenever someone speaks of
>"modifying a document" they are necessarily talking about an abstract
>document, since it is not possible to modify a particular chunk of
>information.  (If you try, you simply get a different chunk of
>information.)

Please don't (mis)use 'abstract' in this way. 
That word has a very clear meaning, it is 
contrasted with 'concrete'. Nobody has ever 
edited or updated an abstract document, just as 
nobody has ever performed surgery on a number. 
One can't edit an abstraction. What you edit is 
some kind of concrete object in the actual 
physical world, a token of the abstraction.

>The "Previous version" link at the top of the WebArch document[10] is
>implicitly saying that there is an abstract document and the information
>content of the abstract document changed from the previous version to
>the current version.

But it seems to me that this W3C discipline 
conveys exactly the opposite message. The reason 
why the previous version must be given its own 
URI, and indeed why every version must have its 
own URI, is that each change is understood to 
produce a different resource, precisely because 
the result of the change is a new, different, 
document. If the previous version were the same 
document, it could be (an older state of) the 
same resource, and have the same URI.

Hmm, on reflection, I guess that the 'current 
version' is a living document, while the 
'previous version' and 'this version' are dead 
(fixed) documents.
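The versioning discipline described above can be sketched as a small URI-resolution model. This is an illustrative assumption, loosely modeled on the W3C practice of giving each publication a dated "this version" URI plus an undated "latest version" URI; the `example.org` URIs, dates, texts and the `resolve` helper are all hypothetical. A dated URI always yields one fixed document, while the undated URI behaves like a living document whose value depends on the date of the request.

```python
from __future__ import annotations

# Hypothetical URIs, loosely modeled on W3C's dated "this version" URIs
# plus a single undated "latest version" URI.
LATEST = "http://example.org/TR/spec/"
FIRST = "http://example.org/TR/2004/spec-20040101/"
SECOND = "http://example.org/TR/2004/spec-20041215/"

# Each dated URI names exactly one fixed document: (publication date, text).
DATED: dict[str, tuple[int, str]] = {
    FIRST: (20040101, "Spec, first edition"),
    SECOND: (20041215, "Spec, second edition"),
}


def resolve(uri: str, today: int) -> str | None:
    """Return the text served for `uri` on `today` (a YYYYMMDD integer)."""
    if uri in DATED:
        published, text = DATED[uri]
        # A dated URI always yields the same document once it has been published.
        return text if published <= today else None
    if uri == LATEST:
        # The undated URI acts like a living document: which fixed
        # document it yields depends on the date of the request.
        released = [(date, text) for date, text in DATED.values() if date <= today]
        return max(released)[1] if released else None
    return None


print(resolve(LATEST, 20040601))  # Spec, first edition
print(resolve(LATEST, 20050101))  # Spec, second edition
print(resolve(FIRST, 20050101))   # Spec, first edition
```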

>  > There is nothing in the XML
>>  spec that refers to documents changing over time.
>
>True.  Specs like that often use the word "document" to mean "document
>instance" rather than "abstract document".

Well, so does the rest of the world :-)  But I digress.

>  > Literary documents, legal documents and other
>>  documents do not change over time.
>
>Well, yes and no.  Document instances don't change; abstract documents
>do.  For example: "My lawyer and I just finished several iterations of
>editing and updating an important legal document".  That's referring to
>an abstract document.

OK, and here's a better example for you: "I just 
changed my will." The use of 'my will' in the 
singular makes it one of your sort of changing 
documents.

>  > In many cases,
>>  it is part of the very reason for having the
>>  document that it does not change over time. RDF
>>  graphs do not change over time. According to the
>>  TAG and REST, resources are defined to be able to
>>  change over time (more properly, to be functions
>>  from times to representations) but that does not
>>  imply that documents are resources: this is in
>>  fact one of the issues that we need to get clear.
>>  It seems that they cannot be, in fact, for this
>>  very reason: the only way to describe this
>>  situation coherently is to say that a resource
>>  can be a function from times to documents (the
>>  'version' at that time).
>
>Exactly.  (Assuming you mean document *instances*: ". . . a function
>from times to document *instances*".)  That's why I believe the TAG
>necessarily means "abstract document"

It does not mean abstract document, for sure. 
Maybe we need a better way to say this.

>when the WebArch says: "This
>document is an example of an information resource."[10]
>
>>  . . . Something is classified
>>  as a 'representation' simply by virtue of it not
>>  changing over time?
>
>"Not changing over time" is a necessary but not sufficient condition of
>something being a "representation", because a "representation" is a
>particular chunk of information, whereas an information resource is an
>abstraction.

You are using these words "representation", 
"information", "abstraction" in very 
idiosyncratic ways. Please don't do this without 
supplying a lot of explanation and background, 
preferably with examples, of what you mean by 
them. Whatever it is, it's not what many of your 
readers will think you mean when they read them. 
I don't think a representation is a chunk of 
information (I don't know what a chunk of 
information is) and I know that representations 
do not have to be static, so not changing over 
time is not a necessary condition for being a 
representation. And by the way, it has been 
understood since Plato that one of the most 
salient aspects of abstractions is that they do 
not change over time, since abstractions by their 
very nature are not temporal. Notational 
conventions for numerals may change, but the 
numbers themselves cannot change. So to be told 
that something is abstract because it is liable 
to change reads like it is totally off the wall.

>
>>  This is the most
>>  extraordinary idea, and bears absolutely no
>>  relationship to the normal uses of this
>>  terminology. How can an abstract document be a
>>  representation of one of its own instances or
>>  tokens?
>
>It's the other way around: a "representation" (in the WebArch sense) is
>an instance (i.e., a snapshot) of an abstract document.

You cannot take a snapshot of an abstraction. 
Abstractions do not partake in any physical 
process.

>
>>  . . .
>>  >True, but if one is discussing the TAG's WebArch
>>  >document ( at http://www.w3.org/TR/webarch/ ),
>>  >it is essential to make this distinction,
>>  >because the difference between a
>>  >"representation" and an "information resource"
>>  >is essential to the WebArch.
>>
>>  I repeat, the WebArch is incomprehensible. The
>>  point, for me, of this entire discussion is to
>>  try to make sense of it. I know that the WebArch
>>  makes this distinction between "representation"
>>  and "information resource", but it never defines
>>  either of these terms, so I have no idea WHAT
>>  distinction this is supposed to actually BE. To
>>  be told that an incomprehensible distinction is
>>  'essential' is not very much help.
>
>Sorry!  I'm trying to make sense of them too!  :)
>
>>  >>  . . .
>>  >>  An RDF ontology, at any rate, is either an RDF
>>  >>  graph or an RDF/XML XML document. Either way, it
>>  >>  is not an HTTP endpoint or an abstraction of an
>>  >>  HTTP endpoint. So it cannot be an information
>>  >>  resource in David's sense, seems to me.
>>  >
>>  >Yes, it can be if instances of it are intended to be served via HTTP.
>>
>>  No, I am sorry, it cannot. The fact is that an
>>  HTTP endpoint, given your answer above to my
>>  question, is not even in the same category as an
>>  RDF ontology: it is not the same KIND of thing. So
>>  if an information resource is an HTTP endpoint,
>>  then it cannot possibly be an RDF ontology. If
>>  you want an RDF ontology to be an information
>>  resource, then you must change your definition.
>>  This has got nothing to do with the transfer
>>  protocol.
>
>Again, it depends on what you mean by "RDF ontology".

I mean RDF graph as defined in the RDF specs.

>  If you are
>talking about an RDF ontology as a particular chunk of information then
>I agree it cannot be a logical HTTP endpoint.  But if someone is talking
>about an RDF ontology as an abstract thing

Well, an RDF graph is an abstraction, strictly speaking...

>  (which may have various
>versions at different times)

...but (in fact, therefore) it cannot have 
different versions at different times. The RDF 
specs talk about RDF graphs which are an abstract 
syntax, and RDF/XML which is an interchange 
syntax. XML, in turn, talks about documents as 
things that can be sent over a network and are 
encoded in character streams, and have a syntax. 
None of this mentions things changing with time, 
and all of it seems to be incompatible with any 
such reading.

>, then you can think of that as a logical
>HTTP endpoint

No, really, you cannot. That is a glaring 
category error. It is logically impossible to put 
an abstraction onto a communication network, or 
for an abstraction to respond to messages with 
other messages. This is like saying that a prime 
number might have a mass or a personality. I 
don't think you really should be saying 'abstract 
thing' here.

>: it is a function from times to instances a/k/a
>"representations".

Well, let me ignore the 'abstract' issue, and 
focus on intended meanings. So, whatever we call 
this thing, it is (an implementation of) a 
function from times to (let us say) RDF/XML 
character streams. That is a coherent notion, I 
agree. But what I now find totally puzzling is 
what it means to say that these streams, that 
this thing outputs from time to time, are 
"representations" of it. What kind of 
representation of it are they? They do not 
describe it, in most cases. They do not depict 
it, in most cases. They are sometimes called 
'snapshots', but it is not clear in what sense 
there is anything about an HTTP endpoint for them 
to be a 'snapshot' of. And what are they 
representations of? It seems kind of obvious that 
they do not represent the *function*; one does 
not normally say that the value of a function for 
one argument is a representation of the function: 
that would make every number a representation of 
every arithmetic function. They certainly cannot 
be snapshots of a function. So what is it that 
they do represent? (The state of the function?? 
But functions don't have states.) So maybe we 
shouldn't say that the resource here *is* a 
function, but rather that it is some 
computational entity - an endpoint? - which emits 
character streams from time to time, and they are 
representations of the state of this entity. 
Well, we could say this, I guess, but there seems 
to be no utility in introducing this entity into 
the discussion. It is of interest only insofar as 
it spits out character streams and HTTP codes 
when suitably prodded. That is all we really need 
to say about it, in fact: The streams themselves 
might describe or denote or indicate all kinds of 
things which have nothing to do with this entity 
itself (in fact, the cases where they do have any 
important representational relationship to the 
entity are so rare that it is hard to think of any), 
and we can describe this entire process without 
even mentioning this entity other than as an 
emitter of outputs, i.e. as a function; which 
would of course be the way it would be described 
in any kind of formal theory of computation. And 
to repeat, the outputs of an information resource 
certainly do not represent the function that it 
is defined to be in the REST model.
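The "emitter of outputs" view above can be written down directly as a function from request times to (HTTP status code, character stream) pairs. The following is a minimal sketch under that assumption only; `make_endpoint`, the schedule format and the `example.org` person URI are hypothetical. It illustrates the point being argued: the RDF/XML bodies describe a person, not the endpoint that emits them, and nothing in the model needs the endpoint to be anything more than this function.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical model: the resource is described only by its observable
# behaviour, a function from request times to (HTTP status, body).
Response = tuple[int, str]
Endpoint = Callable[[int], Response]

# An illustrative RDF/XML body. Note that it describes a person, not the endpoint.
ALICE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/people#alice">
    <foaf:name>Alice</foaf:name>
  </foaf:Person>
</rdf:RDF>
"""
ALICE_RDF_V2 = ALICE_RDF.replace("<foaf:name>Alice</foaf:name>",
                                 "<foaf:name>Alice Smith</foaf:name>")


def make_endpoint(schedule: list[tuple[int, str]]) -> Endpoint:
    """Build an endpoint from (start_time, body) pairs sorted by start_time.

    Before the first start_time it answers 404; afterwards it answers 200
    with the most recently scheduled body.
    """
    def respond(time: int) -> Response:
        body = None
        for start, text in schedule:
            if start > time:
                break
            body = text
        return (200, body) if body is not None else (404, "")
    return respond


endpoint = make_endpoint([(1, ALICE_RDF), (10, ALICE_RDF_V2)])
print(endpoint(0)[0])  # 404
print(endpoint(5)[0])  # 200
```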

Pat

>
>[1] DBooth proposed definition of "information resource":
>http://lists.w3.org/Archives/Public/public-swbp-wg/2006Apr/0053.html
>
>[10] WebArch definition of "information resource":
>http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
>
>David Booth


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 1 May 2006 18:09:08 UTC