Re: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-04-20 (public-swbp-wg@w3.org from April 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 20 Apr 2006 12:37:39 -0500
To: "Ralph R. Swick" <swick@w3.org>
Cc: <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>, Steve Pepper <pepper@ontopia.net>, Mark van Assem <mark@cs.vu.nl>
Message-Id: <p06230916c06c2f5ea856@[10.100.0.24]>
>Thanks, Steve, for giving me the perfect opening for a thread
>on distinguishing "documents" from "words" -- i.e. "information
>resources" from terms in WordNet.

It might be best to start with a definition of what you consider an 
information resource to be. Since the TAG do not define this critical 
term, yet base important engineering decisions on it, any 
authoritative exposition would be of immense value. My current 
understanding is that an information resource is some thing that can 
be transmitted over a network by a transfer protocol. On this 
understanding, one could argue that a word was an information 
resource.

>This is really what Best Practice Recipes for Publishing RDF
>Vocabularies [5] is all about

As far as I can see, that document is simply an exposition of how to 
implement the idea proposed in [8]. It provides no extra explanation 
of what this idea (of a 303-redirect being mandatory) is based on, 
what purpose it achieves, or why anyone would want to do such a damn 
silly thing, other than from a general desire to Obey the Commands of 
the Tag.

>, so it's good to consider it in
>light of the specific example of WordNet.
>
>    [5] http://www.w3.org/TR/swbp-vocab-pub/
>
>At 08:04 PM 4/18/2006 +0200, Steve Pepper wrote:
>...
>>  [1] http://wordnet.princeton.edu/wn20/synset-bank-noun-1
>>  [2] http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
>>  [3] http://wordnet.princeton.edu/wn20/word-bank
>>  [4] http://wordnet.princeton.edu/wn20/schema-participleOf
>
>...
>
>>I'm interested to know what these URLs will resolve to. I
>>would like to see them resolve to the *human-readable* content
>>of WordNet
>
>I agree -- but I want *both* human-readable *and* machine-
>interpretable content to be served in response to requests
>for those URIs.  [5] tells us how to do this in a way that is
>consistent with our best current understanding of Web Architecture.

Nonsense. [5] tells us nothing about this at all. How does a 
303-redirect allow content to be served in more than one way?

>
>>Why human-readable content and not a CBD [1]?
>
>I'll rephrase that as "why human-readable content for
>humans and CBD for machines?" :)
>
>The WordNet database [6] provides a system in which
>
>    "English nouns, verbs, adjectives and adverbs are organized into
>     synonym sets, each representing one underlying lexical concept."
>    -- [6]
>
>These items -- the nouns, verbs, adjectives, and adverbs --
>are the resources we want to describe.  The names we
>give to them for purposes of describing them are important
>but somewhat arbitrary (see previous threads, most recent
>at [7]).  The names could, as you wrote, just all be numeric.
>
>The important distinction that Web Architecture makes [8]
>is that the items hereby named in our WordNet vocabulary
>are *not* themselves what the Web now calls "information
>resources" [9].

I really think that you are making a serious mistake here. These ARE 
information resources, because they can be stored in digital form and 
transmitted over a network. The meanings of these words might not be 
information resources, but the words themselves are, because words 
are representations. Words, of course, represent their meanings. 
That is what they are FOR.

>What we want to accomplish -- in particular by choosing names
>that begin with "http:" -- is to leverage the deployed Web to provide
>us exactly what you are asking for: human-readable content
>that (we hope) describes these items *as well as* other content
>types that are optimized for non-humans.  There has been a long
>and arduous discussion (see [10]) on how to use http: URIs
>to accomplish this.  The result of that discussion has informed
>[5] and shows us now a way to get the Web to resolve a name
>for a WordNet item to either human-readable content or
>machine-interpretable content according to preferences set
>by the client issuing the HTTP GET.
>
>To be specific, [8] tells us that the URIs we choose for each of
>the WordNet synsets, word senses, and words MUST be served
>with a 303 See Other response.

[8] does not use the word MUST, and again, I suggest that it would be 
a serious, indeed disastrous, error, to interpret it this strongly. 
The 303-indirect mechanism suggested by [8] is ill-thought-out (it is 
based, erroneously, on a distinction between types of resource, 
rather than on a distinction between types of relationship 
between names and resources), critically underspecified (no 
definition is given of "information resource"), pointless (it does not 
in fact do any disambiguation) and potentially harmful (it imposes a 
needless implementation burden on semantic applications, to 
absolutely no useful purpose), and should not be followed by any 
responsible semantic web practitioner. [8] is a BAD DECISION, 
possibly the worst bad decision ever made by a standards body since 
the 8-track tape. It is based on a failure to grasp the basic issues, 
it achieves nothing, and it will seriously hamper the development of 
the semantic web. The only responsible attitude to take to this 
decision is to ignore it.

>  The server implementor then
>gets to choose *different* URIs to name the content that will
>describe the WordNet item.  The choice of these other URIs,
>which *do* name "information resources" is somewhat
>arbitrary -- and does not, I think, need to be specified in our
>Working Draft.
>
>    [6] http://wordnet.princeton.edu/
>    [7] 
><http://lists.w3.org/Archives/Public/public-swbp-wg/2006Apr/thread.html#msg40>
>    [8] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
>    [9] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377.html
>    [10] http://www.w3.org/2001/tag/issues.html#httpRange-14
>
>
>So, for example, an HTTP GET on
>
>http://wordnet.princeton.edu/wn20/word/bank
>
>MUST NOT

[8] does not use this strong language. It refers to 'advice' being 
provided to the community. It is a serious misrepresentation to 
describe 'advice' using keywords from RFC 2119. I quote:

"MUST NOT   This phrase, or the phrase "SHALL NOT", mean that the 
definition is an absolute prohibition of the specification."

"Imperatives of the type defined in this memo must be used with care 
and sparingly.  In particular, they MUST only be used where it is 
actually required for interoperation or to limit behavior which has 
potential for causing harm (e.g., limiting retransmisssions)  For 
example, they must not be used to try to impose a particular method 
on implementors where the method is not required for 
interoperability."

Mandatory 303 redirection of URIs used to denote real things, as 
advised in [8], is NOT required for interoperability. In fact it is 
not required for anything. It causes harm in exactly the way cited by 
RFC 2119, by causing needless retransmissions.

>return 200 OK but rather something like
>
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank
>
>The client receiving 303 See Other is free to ask again for
>the "redirected" URI and there the response can (SHOULD)
>be 200 OK.

OK, let us examine this. What has been achieved by this 303-shuffle 
being performed? Look at it from any perspective you like: 
human-oriented or machine-oriented. From the human point of view, all 
that matters is what the person eventually gets to see on their 
browser screen. The route to it is not only irrelevant, it is usually 
invisible. So the fact that a 303 redirect was done is about as 
relevant as the fact the mailman danced a gavotte while delivering 
your letter. Unless you were looking out of the window at the time, 
you wouldn't even know about it. [8] talks about 'information' being 
'supplied via the 303 redirect', but this is nonsense. Information 
being supplied to whom, about what? This is not 'disambiguation'. 
Nothing in any actual TEXT is rendered any more or less ambiguous by 
a 303 redirect. Even if some devious way were proposed to record the 
indirection, it could not possibly be used to distinguish between 
kinds of resource, since information resources can also issue 303 
redirects. (This whole discussion is surreal, however, since the very 
idea of using a transfer protocol to achieve a semantic 
disambiguation is brain-damaged, a multiple category error.) So, 
perhaps the point is that the redirection helps machine processing of 
URIs. But how, exactly? Any use of URIs to *refer* - as in RDF, RDFS 
and OWL - does not even involve the HTTP protocol. The only purpose 
of using URIs in these languages (with a few exceptions, e.g. 
owl:imports, where this issue does not arise since the URIs involved 
do refer to information resources) is to act as logical names with a 
globally unique scope. Inferences do not invoke transfer protocols: 
and conversely, what happens when a transfer protocol is invoked is 
completely invisible to inference and reasoners; at most, what is 
visible can only be the result of that transfer, as in rdfs:seeAlso 
and owl:imports. The entire Internet could go down, and RDF, RDFS and 
OWL inferences using URIs would not be changed one whit. So for 
machine processing (at least, processing of the kind sanctioned and 
assumed by the W3C specs defining RDF, RDFS, OWL and SPARQL), the 
mechanism proposed in [8] is also completely irrelevant.
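
To make the point concrete, here is a minimal sketch of RDFS-style 
inference in which URIs are nothing but opaque strings. It assumes a 
toy in-memory triple set rather than any real RDF library, and the 
wn20/schema/ class URIs are invented for illustration. It applies only 
the two RDFS subclass rules (rdfs9 and rdfs11), and nothing in it opens 
a network connection.

```python
# Toy illustration only: URIs are plain strings used as logical names.
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical class URIs (the wn20/schema/ path is an assumption).
WORD = "http://wordnet.princeton.edu/wn20/schema/Word"
NOUN = "http://wordnet.princeton.edu/wn20/schema/Noun"
BANK = "http://wordnet.princeton.edu/wn20/word/bank"


def rdfs_closure(triples):
    """Apply the RDFS subclass rules (rdfs9, rdfs11) to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if p2 == SUBCLASS and o == s2:
                    if p == TYPE:
                        # rdfs9: x type C, C subClassOf D => x type D
                        new.add((s, TYPE, o2))
                    elif p == SUBCLASS:
                        # rdfs11: subClassOf is transitive
                        new.add((s, SUBCLASS, o2))
        if new <= inferred:
            return inferred
        inferred |= new


graph = {(BANK, TYPE, NOUN), (NOUN, SUBCLASS, WORD)}
closure = rdfs_closure(graph)
print((BANK, TYPE, WORD) in closure)  # the entailed triple is present
```

Whether a GET on any of these URIs would return 200, 303 or nothing at 
all makes no difference to the result.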

>This actually fits quite transparently into current deployed
>Web infrastructure; most (all?) browsers currently treat a
>303 response as a redirection and proceed to issue another
>request for the "redirected" resource, displaying the final result.

This is like saying that to walk on one's hands is normal because 
gloves are made of leather. The point is, WHY did the browser have to 
be redirected AT ALL? What is achieved by requiring this shuffle to 
be performed? Suppose that we simply ignore [8], do no indirection, 
and have the original URI deliver the final response, being sensitive 
if you like to the preferences of the GET. What would break? Nothing.
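
A rough way to see this is to count the requests a client has to make 
in each arrangement. The sketch below does not touch the network: each 
"server" is a hypothetical lookup table from URI to (status, Location, 
body), and the page body is a placeholder.

```python
def fetch(server, url, max_hops=5):
    """Follow 303 responses; return (final_body, number_of_requests)."""
    requests_made = 0
    for _ in range(max_hops):
        status, location, body = server[url]
        requests_made += 1
        if status != 303:
            return body, requests_made
        url = location
    raise RuntimeError("too many redirects")


PAGE = "<html>...</html>"  # placeholder body, not real WordNet content
URL = "http://wordnet.princeton.edu/wn20/word/bank"

# Hypothetical server tables: url -> (status, Location, body).
with_303 = {
    URL: (303, "http://www.w3.org/2006/03/wn/wn20/word/bank.html", ""),
    "http://www.w3.org/2006/03/wn/wn20/word/bank.html": (200, None, PAGE),
}
without_303 = {
    URL: (200, None, PAGE),
}

print(fetch(with_303, URL))     # same body, two requests
print(fetch(without_303, URL))  # same body, one request
```

The client ends up with the same body either way; the only observable 
difference is the extra exchange.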

>
>Note that now that we have this additional level of indirection,
>we are free to respond with
>
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html
>or
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf
>
>at our option.  And we inform our choice of response based on
>what the client has put in its Accept: header on the original
>HTTP GET.  Once again, most deployed browsers will indicate
>that users prefer human-readable forms and so the "right thing"
>can be made to happen for a human clicking around in a
>"Web page" browser.
>
>e.g.:
>
>Case 1: client prefers (human-readable) HTML
>
>->
>GET /wn20/word/bank HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: text/html, text/xml
>
><-
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html
>
>->
>GET /2006/03/wn/wn20/word/bank.html HTTP/1.1
>Host: www.w3.org
>Accept: */*
>
><-
>200 OK
>Vary: negotiate,accept
>Content-Type: text/html; charset=utf-8
>
>versus
>
>Case 2: client prefers (machine-interpretable) RDF/XML:
>
>->
>GET /wn20/word/bank HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: application/rdf+xml
>
><-
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf
>
>->
>GET /2006/03/wn/wn20/word/bank.rdf HTTP/1.1
>Host: www.w3.org
>Accept: */*
>
><-
>200 OK
>Content-Type: application/rdf+xml

So, again, what is achieved by the indirection, in this example? BOTH 
kinds of query are redirected, and the selection is done by the 
Accept: line, and performed by the host at www.w3.org. Why could this 
not have been achieved by the original host at wordnet.princeton.edu, 
without the redirection? It could. The 303 achieves nothing except to 
waste transmission time.
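
As a minimal sketch of what I mean, assuming a hypothetical table of 
representations held by the original host (the bodies are placeholders) 
and a deliberately simplified Accept parser that ignores type wildcards 
such as text/*, the selection can be made in a single exchange:

```python
# Hypothetical representations held by the original host.
REPRESENTATIONS = {
    "/wn20/word/bank": {
        "text/html": "<html>...</html>",
        "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",
    }
}


def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if fields[0]:
            entries.append((fields[0], q))
    # sorted() is stable, so equal q-values keep the client's order.
    return [m for m, q in sorted(entries, key=lambda e: -e[1]) if q > 0]


def respond(path, accept="*/*"):
    """Answer a GET directly: 200, 404 or 406, never 303."""
    available = REPRESENTATIONS.get(path)
    if available is None:
        return 404, {}, ""
    for wanted in parse_accept(accept):
        for media_type, body in available.items():
            if wanted in (media_type, "*/*"):
                headers = {"Content-Type": media_type, "Vary": "Accept"}
                return 200, headers, body
    return 406, {}, ""


print(respond("/wn20/word/bank", "text/html, text/xml")[1]["Content-Type"])
print(respond("/wn20/word/bank", "application/rdf+xml")[1]["Content-Type"])
```

Both of the cases quoted above are served from the one URI, and the 
Vary header tells caches that the response depends on Accept.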

In any case, the issue raised here, involving the use of Accept: to 
control the GET, is not what [8] refers to. It is claimed in [8] that 
this mechanism is supposed to be used to distinguish between 
information resources and other, non-information resources: but your 
example makes no such distinction. RDF and HTML are both information 
resources. Anything that you can transmit over a network is an 
information resource. No such disambiguation of resource type is done 
in your example.

>
>Now, the 3 April editors' draft [11] suggests that Case 2 can be
>implemented with a SPARQL query.  That's plausibly a fine thing
>to do but it is entirely at the server's discretion *how* to implement
>a response to the request for (an RDF representation of) information
>about one of our published WordNet item URIs.
>
>    [11] <http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060403>
>
>Note, too, that it is just fine for a GET on the namespace URI, e.g.
>for synsets, to return a document that describes all the synsets in
>the 2.0 version of WordNet:
>
>->
>GET /wn20/synset/ HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: application/rdf+xml
>
><-
>200 OK
>Content-Location: /wn20/synset/index.rdf
>Content-Type: application/rdf+xml
>
><rdf:RDF xmlns:rdf="..." xmlns:wn20="...">
>  <wn20:Synset
>    rdf:about="http://wordnet.princeton.edu/wn20/synset/bank-noun-1">
>    <wn20:synsetContainsWordSense
>      rdf:resource="http://wordnet.princeton.edu/wn20/word/bank-noun-1"/>
>    ...
>  </wn20:Synset>
>  ...
></rdf:RDF>
>
>In this case no 303 redirect is needed because it is acceptable to
>say that one representation of a namespace *is* an information
>resource (i.e. a document).

Why is it not acceptable to say that ANY representation is an 
information resource? And why is a word not a representation? Or, if 
you want to be a little more careful, why is a token of a word - 
which could actually be a document - not a representation of the 
Platonic word itself? Why, for that matter, is a URI with a word 
embedded into it in some systematic way - which also could be a 
document, as well as identify a document - not a representation of 
the word?

>
>This is how I suggest that we implement "WordNet Basic" -- no
>need for publishing additional URIs; we just use an obvious
>URI that already makes some "sense" in our vocabulary structure.
>
>We can name lots of documents that return information about
>our WordNet items; e.g. we could support a "query" for all the
>known word senses of "bank" used as a noun by supporting
>another set of URI patterns that are similar to the names of
>the word senses themselves:
>
>->
>GET /wn20/word/bank-sense-n
>Host: wordnet.princeton.edu
>Accept: text/html
>
><-
>200 OK
>
><?xml version="1.0" encoding="utf-8"?>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
><body>
><h1>About noun senses of "Bank"</h1>
>...
>
>In this case, the URI http://wordnet.princeton.edu/wn20/word/bank-sense-n
>is not naming a WordNet item but rather is naming a document
>that describes a WordNet item.
>
>Whether to support such "convenience" URIs (queries) rather
>than an explicit SPARQL service is largely up to the service
>provider to decide.  But it is important that our document be
>clear that any such convenience URIs are naming documents
>and not items in the WordNet vocabulary.

WHY is this important? And more centrally, do you really mean NAMING 
here, or do you mean IDENTIFYING in the sense used by the TAG? These 
are not the same notion.

Pat

>
>-Ralph


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 20 April 2006 17:37:55 UTC