RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-04-21 (public-swbp-wg@w3.org from April 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 21 Apr 2006 13:00:54 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>, "Steve Pepper" <pepper@ontopia.net>, "Mark van Assem" <mark@cs.vu.nl>, "Ralph R. Swick" <swick@w3.org>
Message-Id: <p06230901c06ea5227cc0@[192.168.2.3]>
>Pat,
>
>It sounds like you are mainly disagreeing with the TAG's guidance.

Indeed.

>  I
>think it's important that the working group conform to the TAG's
>guidance, even if that guidance isn't entirely baked.

Well, I disagree, when the guidance is this un-baked. I think that 
conforming at this point is like obeying clearly insane orders, and 
that we have a duty to disobey. But I understand that others may not 
agree with this attitude. (I wonder, why do I think of the Charge of 
the Light Brigade, at this point?)

>  However, I
>applaud your efforts to help straighten out our thinking around these
>issues.  More below.
>
>>  From:  Pat Hayes
>>
>>  It might be best to start with a definition of what you consider an
>>  information resource to be. Since the TAG do not define this critical
>>  term, yet base important engineering decisions on it, any
>>  authoritative exposition would be of immense value. My current
>>  understanding is that an information resource is some thing that can
>>  be transmitted over a network by a transfer protocol. On this
>>  understanding, one could argue that a word was an information
>>  resource.
>
>Definitely not.  That would be a "representation", not an "information
>resource".  The information resource is the *source* of
>"representations" that can be transmitted over a network.

Ah, I see. Thanks for that clarification. So for example an RDF 
ontology and an HTML web page are not information resources, either, 
I take it.

That is interesting, as this is the most restrictive notion of 
'resource' that I have yet encountered. On this understanding, 
'resources' are an *extremely* small subset of the things that make 
up the observable universe (not to mention the universe of all 
things, real or imaginary, which can be referred to by a name.) This 
is in remarkable contrast to the idea that a resource is 'anything 
with an identity' or the notion, incorporated into the RDF and OWL 
semantics at the behest of W3C members who apparently should have 
known better, that a 'resource' is anything that can be said to be in 
a logical universe.  The contrast is so extreme that the division 
into 'information' and 'non-information' seems almost ludicrous, like 
starting a global taxonomy with a division between 'very small red 
insects' and 'all other things'.

I will try to use your notion in the rest of this message. BTW, with 
this notion of information resource, I don't think that information 
resources play ANY role at all in the RDF/RDFS/OWL specs. The 
denotation of the URI in the body of an owl:imports is not an 
information resource in this sense.

>I have also been struggling with trying to guess what the TAG meant an
>"information resource" to be, or more notably, what the TAG meant it to
>exclude.  FWIW, here is my proposed working definition of the day:
>
>	An information resource is all and only a logical HTTP
>	endpoint that is intended to serve representations with
>	a 2xx response code.

I see where you are coming from, but doesn't that last clause smack 
of both circularity and vacuity?  It seems to me that the only reason 
for mentioning response codes in such a basic definition would be to 
try to make some kind of after-the-fact sense of this irrational TAG 
decision. Nobody, and I really mean nobody, would have even thought 
of doing this (i.e. mentioning response codes in an ontological 
classification) before the TAG started us along this road. And one 
could argue that *all* internet nodes with a URI are *intended* to 
serve 2xx response codes, since the other codes are classified as 
errors. Finally, how does it help? Here I am with my URI and I ping 
it, and I get a 303. Ah, but was that 303 *intentional*? Maybe it was 
just, well, you know, a common or garden re-direct, because they 
moved the server. I have no way to know. Maybe we need a new addition 
to the HTTP protocols called an intentionality bit.

But there is a broader issue. Suppose I coin a URI intended to refer 
to something entirely not to do with the Internet, such as Dan's car:

http://www.ihmc.us/users/phayes/examples/Dans_Car

Now, what are the resources here? Following your guidance, the 
information resource is the http endpoint that is intended to serve 
representations with a 2xx response code. Well, as I understand http 
(admittedly I'm not a maven), the above would involve sending

"users/phayes/examples/Dans_Car"

to

www.ihmc.us

which is what sends back the codes. Right now, for example, it sends 
back a 404 and a note about a 505. So, if I were to obey the TAG 
decision then www.ihmc.us would send back a 303 rather than a 404 in 
this case. So, does that make www.ihmc.us into a non-information 
resource? That seems wrong, since it will send back 2xx codes in many 
cases, being a respectable web server. OK, so let's assume that 
www.ihmc.us is in fact an information resource, but some other 
'endpoint' inside it somewhere, a piece of Javascript in the folder 
/users/phayes/examples on the server, perhaps, is a different 
endpoint whose sole purpose is to emit a 303, and therefore is not 
an information resource. Well, OK, but whatever that thing is, it's 
certainly not Dan's car, which is what this URI was created to denote 
in the first place, right? Now we have three things to consider: an 
information resource, a car, and some weird thing in my computer 
which serves no discernable utility but isn't an information 
resource. In fact, it has absolutely nothing whatever to do with any 
car, Dan's or otherwise. And in fact, it surely *seems* like an 
information resource in the usual sense: it's an endpoint on an HTTP 
network, it emits codes in response to http requests, it's a 
computational-architectural-network-transfer-protocol kind of thing, 
in marked contrast to what most of the universe consists of, which 
has absolutely nothing whatever to do with networks, protocols, HTTP 
or anything remotely within the purview of network software 
engineering.

Remember, I want that URI to denote a car. How can ANYTHING involving 
transfer protocol codes have ANY relevance to whether or not a name 
refers to a car? This just does not make any sense.
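
To make the rule under discussion concrete, here is a minimal Python sketch of the inference a client would apparently draw from a response code, assuming the reading of [8] given in this thread. The function name and the returned labels are illustrative assumptions and do not come from any spec.

```python
def classify_by_status(status: int) -> str:
    """Label what a client could infer about a URI's referent from the
    HTTP status code alone, under the reading of [8] used in this thread."""
    if 200 <= status <= 299:
        # A 2xx response is taken to mean "information resource".
        return "information resource"
    if status == 303:
        # A 303 leaves the question open: the referent could be anything.
        return "any kind of resource"
    # Errors and other codes tell the client nothing about the referent.
    return "unknown"


for code in (200, 303, 404):
    print(code, "->", classify_by_status(code))
```

Note that the only input is the status code. Nothing in the function consults what the URI was coined to denote, which is the gap the Dan's car example points at.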

>Note that:
>
>	- If something is never intended to return a 2xx response
>	code then it is not an information resource.
>
>	- It is "logical", not physical.
>
>	- It is "all and only" because it does NOT include anything
>	else that might be attached to that information resource.
>
>By this definition, a resource that is an "information resource" cannot
>also be any other kind of resource.  This means, for example, that an
>information resource cannot also be a person or Dan's car.

And, I presume, Dan's car can be a resource, but not an information 
resource. It is the former because everything is a resource, but it's 
not an information resource because... because it isn't intended to 
return a 2xx response code? Really?? Is that the kind of question one 
might ask a used-car salesman? In any case, how do you know? I have 
absolutely no idea what kind of response code one might get from, 
say, the great nebula in Orion.

>  However, it
>could be a part of Dan's car, or it could be associated with a person.
>
>>  . . .
>>  >To be specific, [8] tells us that the URIs we choose for each of the
>>  >WordNet synsets, word senses, and words MUST be served with
>>  >a 303 See Other response.
>>
>>  [8] does not use the word MUST, and again, I suggest that it would be
>>  a serious, indeed disastrous, error, to interpret it this strongly.
>
>I think you're quibbling here.  The difference between RFC2119 terms
>"SHOULD" and "MUST" is that "SHOULD" permits exceptions to a general
>rule in particular circumstances, whereas "MUST" does not.  But you seem
>to be arguing against the general rule -- not for an exception.

I am saying that 'advice' is not a prohibition. I don't see anything in 
the idea of the TAG giving 'advice' which requires the use of ANY 
language from RFC 2119.

>
>>  The 303-indirect mechanism suggested by [8] is ill-thought-out (it is
>>  based, erroneously, on a distinction between types of resource,
>>  rather than on a distinction between types of relationship
>>  between names and resources),
>
>I would very much like to understand better what you mean by "types of
>relationship between names and resources".  Can you explain further?

Sure. I have been trying to explain this to the TAG since I first 
communicated with them. Consider two ideas, both describing 
relationships between a name/identifier/URI and a thing.

The first is simply the relationship of 'being a name of', a.k.a. 
denoting or referring to. This is the idea which is given 
mathematical flesh in logical (Tarskian) model-theoretic semantics and 
which is discussed at length in linguistics, semiotics and 
philosophical logic. It can be subdivided into various special cases, 
analyzed into philosophical oblivion, tinkered with in various ways, 
etc., but remains basically the same idea. It underlies the use of 
names in natural language to refer to things. It is a very general 
(and therefore weak) notion, applying to any kind of naming and any 
kind of thing named.

The second arises only in computer science and IT more generally. 
This is the idea  of an identifier being somehow used to locate or 
identify a piece of information, or maybe a computational entity 
which can emit information. The paradigmatic ur-form of this 
relationship is a RAM address, or maybe a number on a Turing machine 
tape being used to identify a tape location; but it has grown more 
elaborate and become more sophisticated over the years, particularly 
with reference to the Internet. But again, it retains the basic core 
of meaning: the relationship between an identifier and the 
essentially computational entity that it identifies is mediated by an 
intervening architecture connecting the point of use with the point 
identified, so as to support some transfer of information between 
them. It has essentially to do with information transfer, and indeed 
with *computationally supported* information transfer. Without an 
assumed architecture to support the transfer process, this notion is 
vacuous or meaningless.

OK, there are these two ideas. Now, my main point is that they are 
DIFFERENT. They are in fact so different that they have almost 
nothing to do with one another, and can (and IMO should) be 
considered independently. (I could list the reasons why they are 
different at great length, and once did so in a submission to the 
TAG.) The TAG is talking about, and using ideas relevant to, the 
second, but mistakenly applying these ideas to the first. 
RDF/RDFS/OWL use URIs as names, not as identifiers. HTTP uses URIs as 
identifiers, not as names. Only confusion can result from trying to 
assimilate either of these two uses to the other. Using a single URI 
to be both an identifier and a name is not 'ambiguity', as the TAG 
apparently decided it must be, and is not harmful or dangerous to 
interoperation, as the TAG seems to have erroneously concluded it is. 
For example: that identifiers (second sense) should not be ambiguous 
in what they identify, is a reasonable architectural principle which 
the TAG has consistently maintained and reiterated. Which is fine; 
but to conclude that all names (first sense) must or should *refer* 
unambiguously, is not only wrong, it is provably wrong. One can 
prove, quite rigorously, that names in any reasonably expressive 
language *cannot* be unambiguous in this sense. This is basically 
Goedel's famous theorem. But more to the point, there is absolutely 
nothing in any network-architectural or linguistic-semantic theory 
which suggests that they should be, that anything is harmed by their 
having multiple interpretations when used to refer, or that their 
referents should be related to whatever it is that they identify. So 
by making this unfortunate confusion between identifying and naming, 
the TAG's laudable defense of a cherished architectural principle has 
degenerated into an off-the-wall idea which attempts to shore up a 
completely indefensible position in semantic theory.
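
A toy Python illustration of the two relationships may help. It is only a sketch under stated assumptions: the three-element universe, the single "is a car" assertion, and the dictionary standing in for the network architecture are all hypothetical.

```python
DANS_CAR = "http://www.ihmc.us/users/phayes/examples/Dans_Car"

# Naming (first sense): a name denotes whatever an interpretation assigns
# to it, and any interpretation satisfying the assertions is admissible.
universe = ["red_sedan", "blue_hatchback", "orion_nebula"]  # hypothetical
is_a_car = {"red_sedan", "blue_hatchback"}                  # hypothetical


def possible_referents(universe, constraints):
    """Return every thing the name could denote in some interpretation
    that satisfies all of the given constraints."""
    return [thing for thing in universe if all(c(thing) for c in constraints)]


# The only assertion made about the named thing: it is a car.
constraints = [lambda thing: thing in is_a_car]
candidates = possible_referents(universe, constraints)

# Identifying (second sense): an architecture maps the identifier to
# exactly one thing it can return. A dict stands in for that architecture.
store = {DANS_CAR: b"some bytes served for this identifier"}


def dereference(identifier):
    """Return the one piece of information the architecture locates."""
    return store[identifier]


print(candidates)             # more than one admissible referent
print(dereference(DANS_CAR))  # exactly one thing identified
```

The same string appears in both roles without conflict: the lookup is single-valued, while the assertions leave the referent underdetermined.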

>>  critically underspecified (no
>>  definition is given of "information resource") pointless (it does not
>>  in fact do any disambiguation) and potentially harmful (it imposes a
>>  needless implementation burden on semantic applications, to
>>  absolutely no useful purpose), and should not be followed by any
>>  responsible semantic web practitioner. [8] is a BAD DECISION,
>>  possibly the worst bad decision ever made by a standards body since
>>  the 8-track tape. It is based on a failure to grasp the basic issues,
>>  it achieves nothing, and it will seriously hamper the development of
>>  the semantic web. The only responsible attitude to take to this
>>  decision is to ignore it.
>
>Even if the httpRange-14 decision is wrong-headed (and personally I am
>not yet convinced either way, though I *am* convinced that it is
>problematic as is, and I have very much appreciated reading your
>perspective) I don't see a huge harm in causing some extra network
>accesses while we further straighten out these issues.

If only it were the case that this could be treated as an interim 
suggestion while we straighten out these issues, I might agree with 
you. But that is not how this is widely seen: it is seen as the TAG 
finally getting a tar-pit nicely frozen and paved over. Nobody is 
going to want to re-melt this. The harm arises from the fact that 
once this damn silly idea is incorporated into a number of specs 
which are treated as authoritative (which is already happening), it 
will acquire the inertia of Received Practice and then become 
virtually immovable. And what gets me particularly incensed is that 
there really is no actual problem here at all: there is no need for 
this solution to a non-problem. Nobody except the TAG has felt any 
need to have the TAG 'resolve' this 'issue'. I feel like I'm in a 
church where the High Priests have finally decided how many angels 
dance on the head of a pin, which would be fine except that they now 
are telling us that when we use pins we have to shake them three times 
first to get the angels off, which really interferes with the 
needlework.

>  (And the extra
>network accesses can even be optimized away if URIs are minted with a
>303-forwarding prefix such as "http://thing-described-by.org?" , as
>described in
>http://lists.w3.org/Archives/Public/public-swbp-wg/2005Aug/0057 .)

Yes, true, I could. But there is already a far larger body of 
received practice which simply treats URIs as constructed from a 
local server address. I maintain a  number of URIs, and - nothing 
personal, but - I am damned if I want to give them all away to any 
303-redirecting service, no matter how benign its author's 
intentions. And why should I, in any case? I feel like I would be 
cooperating with lunacy, and I see no reason to do that. I am 
confident that nothing will break if I don't cooperate, but continue 
to behave rationally. Until someone explains the reason why I should 
do this in a more coherent way than I have read so far, I will 
continue to use URIs and HTTP codes rationally, in conformance with 
their actual specs (which in the case of HTTP, do not mention 
'information resources'.)
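
For reference, the prefix scheme quoted above can be sketched in a few lines of Python. This assumes that the service 303-redirects to whatever URL follows the "?", which is how the quoted description reads. The function names and the Dans_Car.rdf description URL are hypothetical.

```python
PREFIX = "http://thing-described-by.org?"


def mint(description_url: str) -> str:
    """Mint a URI for the thing described by the document at description_url."""
    return PREFIX + description_url


def description_of(uri: str):
    """Recover the description URL locally, without a network round trip.
    Returns None when the URI does not use the forwarding prefix."""
    if not uri.startswith(PREFIX):
        return None
    return uri[len(PREFIX):]


# Hypothetical description document for Dan's car.
doc = "http://www.ihmc.us/users/phayes/examples/Dans_Car.rdf"
thing_uri = mint(doc)
print(thing_uri)
print(description_of(thing_uri))
```

A client that recognizes the prefix can compute the redirect target itself, which is the sense in which the extra access is optimized away. The cost is the one objected to above: every URI minted this way lives under someone else's domain.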

Pat

>
>
>>  . . .
>>  >    [8] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
>>  . . . .
>
>David Booth


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 21 April 2006 18:01:02 UTC