- From: Ralph R. Swick <swick@w3.org>
- Date: Wed, 19 Apr 2006 13:40:17 -0400
- To: Steve Pepper <pepper@ontopia.net>, Mark van Assem <mark@cs.vu.nl>
- Cc: <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>
Thanks, Steve, for giving me the perfect opening for a thread on distinguishing "documents" from "words" -- i.e. "information resources" from terms in WordNet. This is really what Best Practice Recipes for Publishing RDF Vocabularies [5] is all about, so it's good to consider it in light of the specific example of WordNet.

[5] http://www.w3.org/TR/swbp-vocab-pub/

At 08:04 PM 4/18/2006 +0200, Steve Pepper wrote:
...
> [1] http://wordnet.princeton.edu/wn20/synset-bank-noun-1
> [2] http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
> [3] http://wordnet.princeton.edu/wn20/word-bank
> [4] http://wordnet.princeton.edu/wn20/schema-participleOf
...
>I'm interested to know what these URLs will resolve to. I
>would like to see them resolve to the *human-readable* content
>of WordNet

I agree -- but I want *both* human-readable *and* machine-interpretable content to be served in response to requests for those URIs. [5] tells us how to do this in a way that is consistent with our best current understanding of Web Architecture.

>Why human-readable content and not a CBD [1]?

I'll rephrase that as "why human-readable content for humans and CBD for machines?" :)

The WordNet database [6] provides a system in which

  "English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept." -- [6]

These items -- the nouns, verbs, adjectives, and adverbs -- are the resources we want to describe. The names we give to them for purposes of describing them are important but somewhat arbitrary (see previous threads, most recent at [7]). The names could, as you wrote, just all be numeric. The important distinction that Web Architecture makes [8] is that the items hereby named in our WordNet vocabulary are *not* themselves what the Web now calls "information resources" [9].
What we want to accomplish -- in particular by choosing names that begin with "http:" -- is to leverage the deployed Web to provide us exactly what you are asking for: human-readable content that (we hope) describes these items *as well as* other content types that are optimized for non-humans.

There has been a long and arduous discussion (see [10]) on how to use http: URIs to accomplish this. The result of that discussion has informed [5] and shows us now a way to get the Web to resolve a name for a WordNet item to either human-readable content or machine-interpretable content according to preferences set by the client issuing the HTTP GET.

To be specific, [8] tells us that the URIs we choose for each of the WordNet synsets, word senses, and words MUST be served with a 303 See Other response. The server implementor then gets to choose *different* URIs to name the content that will describe the WordNet item. The choice of these other URIs, which *do* name "information resources", is somewhat arbitrary -- and does not, I think, need to be specified in our Working Draft.

[6] http://wordnet.princeton.edu/
[7] <http://lists.w3.org/Archives/Public/public-swbp-wg/2006Apr/thread.html#msg40>
[8] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
[9] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377.html
[10] http://www.w3.org/2001/tag/issues.html#httpRange-14

So, for example, an HTTP GET on

  http://wordnet.princeton.edu/wn20/word/bank

MUST NOT return 200 OK but rather something like

  303 See Other
  Location: http://www.w3.org/2006/03/wn/wn20/word/bank

The client receiving 303 See Other is free to ask again for the "redirected" URI and there the response can (SHOULD) be 200 OK. This actually fits quite transparently into current deployed Web infrastructure; most (all?) browsers currently treat a 303 response as a redirection and proceed to issue another request for the "redirected" resource, displaying the final result.
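To make the redirect step concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration under stated assumptions: the two base URIs are taken from the example in this message, while the function names `respond` and `follow` and the idea of mapping item URIs to document URIs by prefix substitution are hypothetical, not something the Working Draft or [5] prescribes.

```python
# Base URIs copied from the example above. The prefix-substitution
# mapping between them is an assumption made for this sketch.
ITEM_BASE = "http://wordnet.princeton.edu/wn20/"
DOC_BASE = "http://www.w3.org/2006/03/wn/wn20/"


def respond(uri):
    """Return (status, headers) for an HTTP GET on `uri` (hypothetical server)."""
    if uri.startswith(DOC_BASE):
        # This URI names a document (an information resource): 200 is fine.
        return 200, {}
    if uri.startswith(ITEM_BASE):
        # This URI names a WordNet item, not a document: never answer 200,
        # point the client at a document that describes the item instead.
        return 303, {"Location": DOC_BASE + uri[len(ITEM_BASE):]}
    return 404, {}


def follow(uri, max_hops=5):
    """Mimic a browser: re-issue the GET for each 303 until a final answer."""
    for _ in range(max_hops):
        status, headers = respond(uri)
        if status != 303:
            return status, uri
        uri = headers["Location"]
    raise RuntimeError("too many redirects")


status, final_uri = follow("http://wordnet.princeton.edu/wn20/word/bank")
print(status, final_uri)
```

The point of the sketch is only that the item URI and the document URI stay distinct: the first is never answered with 200, and the client ends up at the second.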
Note that now that we have this additional level of indirection, we are free to respond with

  Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html

or

  Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf

at our option. We base our choice of response on what the client has put in its Accept: header on the original HTTP GET. Once again, most deployed browsers will indicate that users prefer human-readable forms and so the "right thing" can be made to happen for a human clicking around in a "Web page" browser. e.g.:

Case 1: client prefers (human-readable) HTML

  -> GET /wn20/word/bank HTTP/1.1
     Host: wordnet.princeton.edu
     Accept: text/html, text/xml

  <- 303 See Other
     Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html

  -> GET /2006/03/wn/wn20/word/bank.html HTTP/1.1
     Host: www.w3.org
     Accept: */*

  <- 200 OK
     Vary: negotiate,accept
     Content-Type: text/html; charset=utf-8

versus

Case 2: client prefers (machine-interpretable) RDF/XML:

  -> GET /wn20/word/bank HTTP/1.1
     Host: wordnet.princeton.edu
     Accept: application/rdf+xml

  <- 303 See Other
     Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf

  -> GET /2006/03/wn/wn20/word/bank.rdf HTTP/1.1
     Host: www.w3.org
     Accept: */*

  <- 200 OK
     Content-Type: application/rdf+xml

Now, the 3 April editors' draft [11] suggests that Case 2 can be implemented with a SPARQL query. That's plausibly a fine thing to do but it is entirely at the server's discretion *how* to implement a response to the request for (an RDF representation of) information about one of our published WordNet item URIs.

[11] <http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060403>

Note, too, that it is just fine for a GET on the namespace URI, e.g. for synsets, to return a document that describes all the synsets in the 2.0 version of WordNet:

  -> GET /wn20/synset/ HTTP/1.1
     Host: wordnet.princeton.edu
     Accept: application/rdf+xml

  <- 200 OK
     Content-Location: /wn20/synset/index.rdf
     Content-Type: application/rdf+xml

     <rdf:RDF xmlns:rdf="..."
              xmlns:wn20="...">
       <wn20:Synset rdf:about="http://wordnet.princeton.edu/wn20/synset/bank-noun-1">
         <wn20:synsetContainsWordSense rdf:resource="http://wordnet.princeton.edu/wn20/word/bank-noun-1"/>
         ...
       </wn20:Synset>
       ...
     </rdf:RDF>

In this case no 303 redirect is needed because it is acceptable to say that one representation of a namespace *is* an information resource (i.e. a document). This is how I suggest that we implement "WordNet Basic" -- no need for publishing additional URIs; we just use an obvious URI that already makes some "sense" in our vocabulary structure.

We can name lots of documents that return information about our WordNet items; e.g. we could support a "query" for all the known word senses of "bank" used as a noun by supporting another set of URI patterns that are similar to the names of the word senses themselves:

  -> GET /wn20/word/bank-sense-n HTTP/1.1
     Host: wordnet.princeton.edu
     Accept: text/html

  <- 200 OK

     <?xml version="1.0" encoding="utf-8"?>
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
     <body>
     <h1>About noun senses of "Bank"</h1>
     ...

In this case, the URI http://wordnet.princeton.edu/wn20/word/bank-sense-n is not naming a WordNet item but rather is naming a document that describes a WordNet item.

Whether to support such "convenience" URIs (queries) rather than an explicit SPARQL service is largely up to the service provider to decide. But it is important that our document be clear that any such convenience URIs are naming documents and not items in the WordNet vocabulary.

-Ralph
Received on Wednesday, 19 April 2006 17:40:38 UTC