- From: Pat Hayes <phayes@ihmc.us>
- Date: Thu, 20 Apr 2006 12:37:39 -0500
- To: "Ralph R. Swick" <swick@w3.org>
- Cc: <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>, Steve Pepper <pepper@ontopia.net>, Mark van Assem <mark@cs.vu.nl>
>Thanks, Steve, for giving me the perfect opening for a thread
>on distinguishing "documents" from "words" -- i.e. "information
>resources" from terms in WordNet.

It might be best to start with a definition of what you consider an
information resource to be. Since the TAG do not define this critical
term, yet base important engineering decisions on it, any
authoritative exposition would be of immense value. My current
understanding is that an information resource is some thing that can
be transmitted over a network by a transfer protocol. On this
understanding, one could argue that a word was an information
resource.

>This is really what Best Practice Recipes for Publishing RDF
>Vocabularies [5] is all about

As far as I can see, that document is simply an exposition of how to
implement the idea proposed in [8]. It provides no extra explanation
of what this idea (of a 303-redirect being mandatory) is based on,
what purpose it achieves, or why anyone would want to do such a damn
silly thing, other than from a general desire to Obey the Commands of
the Tag.

>, so it's good to consider it in
>light of the specific example of WordNet.
>
>  [5] http://www.w3.org/TR/swbp-vocab-pub/
>
>At 08:04 PM 4/18/2006 +0200, Steve Pepper wrote:
>...
>>  [1] http://wordnet.princeton.edu/wn20/synset-bank-noun-1
>>  [2] http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
>>  [3] http://wordnet.princeton.edu/wn20/word-bank
>>  [4] http://wordnet.princeton.edu/wn20/schema-participleOf
>
>...
>
>>I'm interested to know what these URLs will resolve to. I
>>would like to see them resolve to the *human-readable* content
>>of WordNet
>
>I agree -- but I want *both* human-readable *and* machine-
>interpretable content to be served in response to requests
>for those URIs. [5] tells us how to do this in a way that is
>consistent with our best current understanding of Web Architecture.

Nonsense. [5] tells us nothing about this at all.
How does a 303-redirect allow content to be served in more than one
way?

>
>>Why human-readable content and not a CBD [1]?
>
>I'll rephrase that as "why human-readable content for
>humans and CBD for machines?" :)
>
>The WordNet database [6] provides a system in which
>
>  "English nouns, verbs, adjectives and adverbs are organized into
>  synonym sets, each representing one underlying lexical concept."
>  -- [6]
>
>These items -- the nouns, verbs, adjectives, and adverbs --
>are the resources we want to describe. The names we
>give to them for purposes of describing them are important
>but somewhat arbitrary (see previous threads, most recent
>at [7]). The names could, as you wrote, just all be numeric.
>
>The important distinction that Web Architecture makes [8]
>is that the items hereby named in our WordNet vocabulary
>are *not* themselves what the Web now calls "information
>resources" [9].

I really think that you are making a serious mistake here. These ARE
information resources, because they can be stored in digital form and
transmitted over a network. The meanings of these words might not be
information resources, but the words themselves are, because words
are representations. Words, of course, represent their meanings. That
is what they are FOR.

>What we want to accomplish -- in particular by choosing names
>that begin with "http:" -- is to leverage the deployed Web to provide
>us exactly what you are asking for: human-readable content
>that (we hope) describes these items *as well as* other content
>types that are optimized for non-humans. There has been a long
>and arduous discussion (see [10]) on how to use http: URIs
>to accomplish this. The result of that discussion has informed
>[5] and shows us now a way to get the Web to resolve a name
>for a WordNet item to either human-readable content or
>machine-interpretable content according to preferences set
>by the client issuing the HTTP GET.
>
>To be specific, [8] tells us that the URIs we choose for each of
>the WordNet synsets, word senses, and words MUST be served
>with a 303 See Other response.

[8] does not use the word MUST, and again, I suggest that it would be
a serious, indeed disastrous, error, to interpret it this strongly.
The 303-indirect mechanism suggested by [8] is ill-thought-out (it is
based, erroneously, on a distinction between types of resource,
rather than on a distinction between types of relationship between
names and resources), critically underspecified (no definition is
given of "information resource"), pointless (it does not in fact do
any disambiguation) and potentially harmful (it imposes a needless
implementation burden on semantic applications, to absolutely no
useful purpose), and should not be followed by any responsible
semantic web practitioner.

[8] is a BAD DECISION, possibly the worst bad decision ever made by a
standards body since the 8-track tape. It is based on a failure to
grasp the basic issues, it achieves nothing, and it will seriously
hamper the development of the semantic web. The only responsible
attitude to take to this decision is to ignore it.

> The server implementor then
>gets to choose *different* URIs to name the content that will
>describe the WordNet item. The choice of these other URIs,
>which *do* name "information resources" is somewhat
>arbitrary -- and does not, I think, need to be specified in our
>Working Draft.
>
>  [6] http://wordnet.princeton.edu/
>  [7]
><http://lists.w3.org/Archives/Public/public-swbp-wg/2006Apr/thread.html#msg40>
>  [8] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
>  [9] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377.html
>  [10] http://www.w3.org/2001/tag/issues.html#httpRange-14
>
>
>So, for example, an HTTP GET on
>
>http://wordnet.princeton.edu/wn20/word/bank
>
>MUST NOT

[8] does not use this strong language. It refers to 'advice' being
provided to the community.
It is a serious misrepresentation to describe 'advice' using keywords
from RFC 2119. I quote:

"MUST NOT This phrase, or the phrase "SHALL NOT", mean that the
definition is an absolute prohibition of the specification."

"Imperatives of the type defined in this memo must be used with care
and sparingly. In particular, they MUST only be used where it is
actually required for interoperation or to limit behavior which has
potential for causing harm (e.g., limiting retransmisssions) For
example, they must not be used to try to impose a particular method
on implementors where the method is not required for
interoperability."

Mandatory 303 redirection of URIs used to denote real things, as
advised in [8], is NOT required for interoperability. In fact it is
not required for anything. It causes harm in exactly the way cited by
RFC 2119, by causing needless retransmissions.

>return 200 OK but rather something like
>
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank
>
>The client receiving 303 See Other is free to ask again for
>the "redirected" URI and there the response can (SHOULD)
>be 200 OK.

OK, let us examine this. What has been achieved by this 303-shuffle
being performed? Look at it from any perspective you like:
human-oriented or machine-oriented.

From the human point of view, all that matters is what the person
eventually gets to see on their browser screen. The route to it is
not only irrelevant, it is usually invisible. So the fact that a 303
redirect was done is about as relevant as the fact the mailman danced
a gavotte while delivering your letter. Unless you were looking out
of the window at the time, you wouldn't even know about it. [8] talks
about 'information' being 'supplied via the 303 redirect', but this
is nonsense. Information being supplied to whom, about what? This is
not 'disambiguation'. Nothing in any actual TEXT is rendered any more
or less ambiguous by a 303 redirect.
Even if some devious way were proposed to record the indirection, it
could not possibly be used to distinguish between kinds of resource,
since information resources can also issue 303 redirects. (This whole
discussion is surreal, however, since the very idea of using a
transfer protocol to achieve a semantic disambiguation is
brain-damaged, a multiple category error.)

So, perhaps the point is that the redirection helps machine
processing of URIs. But how, exactly? Any use of URIs to *refer* - as
in RDF, RDFS and OWL - does not even involve the http protocols. The
only purpose of using URIs in these languages (with a few exceptions,
e.g. owl:imports, where this issue does not arise since the URIs
involved do refer to information resources) is to act as logical
names with a globally unique scope. Inferences do not invoke transfer
protocols: and conversely, what happens when a transfer protocol is
invoked is completely invisible to inference and reasoners; at most,
what is visible can only be the result of that transfer, as in
rdfs:seeAlso and owl:imports. The entire Internet could go down, and
RDF, RDFS and OWL inferences using URIs would not be changed one
whit. So for machine processing (at least, processing of the kind
sanctioned and assumed by the W3C specs defining RDF, RDFS, OWL and
SPARQL), the mechanism proposed in [8] is also completely irrelevant.

>This actually fits quite transparently into current deployed
>Web infrastructure; most (all?) browsers currently treat a
>303 response as a redirection and proceed to issue another
>request for the "redirected" resource, displaying the final result.

This is like saying that to walk on one's hands is normal because
gloves are made of leather. The point is, WHY did the browser have to
be redirected AT ALL? What is achieved by requiring this shuffle to
be performed?
Suppose that we simply ignore [8], do no indirection, and have the
original URI deliver the final response, being sensitive if you like
to the preferences of the GET. What would break? Nothing.

>
>Note that now that we have this additional level of indirection,
>we are free to respond with
>
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html
>or
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf
>
>at our option. And we inform our choice of response based on
>what the client has put in its Accept: header on the original
>HTTP GET. Once again, most deployed browsers will indicate
>that users prefer human-readable forms and so the "right thing"
>can be made to happen for a human clicking around in a
>"Web page" browser.
>
>e.g.:
>
>Case 1: client prefers (human-readable) HTML
>
>->
>GET /wn20/word/bank HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: text/html, text/xml
>
><-
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html
>
>->
>GET /2006/03/wn/wn20/word/bank.html HTTP/1.1
>Host: www.w3.org
>Accept: */*
>
><-
>200 OK
>Vary: negotiate,accept
>Content-Type: text/html; charset=utf-8
>
>versus
>
>Case 2: client prefers (machine-interpretable) RDF/XML:
>
>->
>GET /wn20/word/bank HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: application/rdf+xml
>
><-
>303 See Other
>Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf
>
>->
>GET /2006/03/wn/wn20/word/bank.rdf HTTP/1.1
>Host: www.w3.org
>Accept: */*
>
><-
>200 OK
>Content-Type: application/rdf+xml

So, again, what is achieved by the indirection, in this example? BOTH
kinds of query are redirected, and the selection is done by the
Accept: line, and performed by the host at www.w3.org. Why could this
not have been achieved by the original host at
wordnet.princeton.edu, without the redirection? It could. The 303
achieves nothing except to waste transmission time.

In any case, this issue raised here, involving the use of Accept: to
control the GET, is not what [8] refers to.
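
The no-redirect alternative argued for above can be sketched roughly
as follows: the original URI answers 200 directly and chooses the
representation from the Accept: header. This is only an illustrative
sketch in Python; the function names, the REPRESENTATIONS table and
its placeholder bodies are assumptions, not anything taken from [5],
[8] or the WordNet draft, and the Accept parsing is deliberately
simplified (q-values and */* only, no type/* wildcards).

```python
# Hypothetical representations of one WordNet item, keyed by media type.
# The bodies are placeholders, not real WordNet content.
REPRESENTATIONS = {
    "text/html": "<h1>bank</h1>",
    "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",
}
DEFAULT_TYPE = "text/html"  # assumption: HTML is served when anything goes


def parse_accept(header):
    """Return acceptable media types, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def respond(accept_header):
    """Answer a GET with 200 directly; no 303 round trip is involved."""
    for media_type in parse_accept(accept_header or "*/*"):
        if media_type == "*/*":
            media_type = DEFAULT_TYPE
        if media_type in REPRESENTATIONS:
            headers = {"Content-Type": media_type, "Vary": "Accept"}
            return 200, headers, REPRESENTATIONS[media_type]
    return 406, {}, ""  # nothing the client accepts is available


if __name__ == "__main__":
    print(respond("text/html, text/xml")[:2])
    print(respond("application/rdf+xml")[:2])
```

Under these assumptions, the two Accept: headers from Case 1 and
Case 2 select the HTML and RDF/XML bodies in a single exchange each.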
It is claimed in [8] that this mechanism is supposed to be used to
distinguish between information resources and other, non-information
resources: but your example makes no such distinction. RDF and HTML
are both information resources. Anything that you can transmit over a
network is an information resource. No such disambiguation of
resource type is done in your example.

>
>Now, the 3 April editors' draft [11] suggests that Case 2 can be
>implemented with a SPARQL query. That's plausibly a fine thing
>to do but it is entirely at the server's discretion *how* to implement
>a response to the request for (an RDF representation of) information
>about one of our published WordNet item URIs.
>
>  [11] <http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060403>
>
>Note, too, that it is just fine for a GET on the namespace URI, e.g.
>for synsets, to return a document that describes all the synsets in
>the 2.0 version of WordNet:
>
>->
>GET /wn20/synset/ HTTP/1.1
>Host: wordnet.princeton.edu
>Accept: application/rdf+xml
>
><-
>200 OK
>Content-Location: /wn20/synset/index.rdf
>Accept: application/rdf+xml
>
><rdf:RDF xmlns:rdf="..." xmlns:wn20="...">
>  <wn20:Synset
>    rdf:about="http://wordnet.princeton.edu/wn20/synset/bank-noun-1">
>    <wn20:synsetContainsWordSense
>      rdf:resource="http://wordnet.princeton.edu/wn20/word/bank-noun-1"/>
>    ...
>  </wn20:Synset>
>  ...
></rdf:RDF>
>
>In this case no 303 redirect is needed because it is acceptable to
>say that one representation of a namespace *is* an information
>resource (i.e. a document).

Why is it not acceptable to say that ANY representation is an
information resource? And why is a word not a representation? Or, if
you want to be a little more careful, why is a token of a word -
which could actually be a document - not a representation of the
Platonic word itself?
Why, for that matter, is a URI with a word embedded into it in some
systematic way - which also could be a document, as well as identify
a document - not a representation of the word?

>
>This is how I suggest that we implement "WordNet Basic" -- no
>need for publishing additional URIs; we just use an obvious
>URI that already makes some "sense" in our vocabulary structure.
>
>We can name lots of documents that return information about
>our WordNet items; e.g. we could support a "query" for all the
>known word senses of "bank" used as a noun by supporting
>another set of URI patterns that are similar to the names of
>the word senses themselves:
>
>->
>GET /wn20/word/bank-sense-n
>Host: wordnet.princeton.edu
>Accept: text/html
>
><-
>200 OK
>
><?xml version="1.0" encoding="utf-8"?>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
><body>
><h1>About noun senses of "Bank"</h1>
>...
>
>In this case, the URI http://wordnet.princeton.edu/wn20/word/bank-sense-n
>is not naming a WordNet item but rather is naming a document
>that describes a WordNet item.
>
>Whether to support such "convenience" URIs (queries) rather
>than an explicit SPARQL service is largely up to the service
>provider to decide. But it is important that our document be
>clear that any such convenience URIs are naming documents
>and not items in the WordNet vocabulary.

WHY is this important? And more centrally, do you really mean NAMING
here, or do you mean IDENTIFYING in the sense used by the TAG? These
are not the same notion.

Pat

>
>-Ralph

--
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   cell
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Thursday, 20 April 2006 17:37:55 UTC