on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues]

Thanks, Steve, for giving me the perfect opening for a thread
on distinguishing "documents" from "words" -- i.e. "information
resources" from terms in WordNet.

This is really what Best Practice Recipes for Publishing RDF
Vocabularies [5] is all about, so it's good to consider it in
light of the specific example of WordNet.

   [5] http://www.w3.org/TR/swbp-vocab-pub/

At 08:04 PM 4/18/2006 +0200, Steve Pepper wrote:
...
> [1] http://wordnet.princeton.edu/wn20/synset-bank-noun-1
> [2] http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
> [3] http://wordnet.princeton.edu/wn20/word-bank
> [4] http://wordnet.princeton.edu/wn20/schema-participleOf

...

>I'm interested to know what these URLs will resolve to. I
>would like to see them resolve to the *human-readable* content
>of WordNet

I agree -- but I want *both* human-readable *and* machine-
interpretable content to be served in response to requests
for those URIs.  [5] tells us how to do this in a way that is
consistent with our best current understanding of Web Architecture.

>Why human-readable content and not a CBD [1]?

I'll rephrase that as "why human-readable content for
humans and CBD for machines?" :)

The WordNet database [6] provides a system in which

   "English nouns, verbs, adjectives and adverbs are organized into
    synonym sets, each representing one underlying lexical concept."
   -- [6]

These items -- the nouns, verbs, adjectives, and adverbs --
are the resources we want to describe.  The names we
give to them for purposes of describing them are important
but somewhat arbitrary (see previous threads, most recent
at [7]).  The names could, as you wrote, just all be numeric.

The important distinction that Web Architecture makes [8]
is that the items hereby named in our WordNet vocabulary
are *not* themselves what the Web now calls "information
resources" [9].

What we want to accomplish -- in particular by choosing names
that begin with "http:" -- is to leverage the deployed Web to provide
us exactly what you are asking for: human-readable content
that (we hope) describes these items *as well as* other content
types that are optimized for non-humans.  There has been a long
and arduous discussion (see [10]) on how to use http: URIs
to accomplish this.  The result of that discussion has informed
[5] and shows us now a way to get the Web to resolve a name
for a WordNet item to either human-readable content or
machine-interpretable content according to preferences set
by the client issuing the HTTP GET.

To be specific, [8] tells us that the URIs we choose for each of
the WordNet synsets, word senses, and words MUST be served
with a 303 See Other response.  The server implementor then
gets to choose *different* URIs to name the content that will
describe the WordNet item.  The choice of these other URIs,
which *do* name "information resources", is somewhat
arbitrary -- and does not, I think, need to be specified in our
Working Draft.

   [6] http://wordnet.princeton.edu/
   [7] <http://lists.w3.org/Archives/Public/public-swbp-wg/2006Apr/thread.html#msg40>
   [8] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
   [9] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377.html
   [10] http://www.w3.org/2001/tag/issues.html#httpRange-14


So, for example, an HTTP GET on

http://wordnet.princeton.edu/wn20/word/bank

MUST NOT return 200 OK but rather something like

303 See Other
Location: http://www.w3.org/2006/03/wn/wn20/word/bank

The client receiving 303 See Other is free to ask again for
the "redirected" URI and there the response can (SHOULD)
be 200 OK.

This actually fits quite transparently into current deployed
Web infrastructure; most (all?) browsers currently treat a
303 response as a redirection and proceed to issue another
request for the "redirected" resource, displaying the final result.
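
As a rough illustration of what a client does with the 303 (a sketch
only: fake_fetch and its canned table are hypothetical stand-ins for
real HTTP requests, reusing the example URIs above), the redirect
chain can be followed like this:

```python
# Canned responses standing in for the two servers in the example.
# A real client would issue HTTP GETs instead of a dictionary lookup.
CANNED = {
    "http://wordnet.princeton.edu/wn20/word/bank": (
        303,
        {"Location": "http://www.w3.org/2006/03/wn/wn20/word/bank.html"},
    ),
    "http://www.w3.org/2006/03/wn/wn20/word/bank.html": (
        200,
        {"Content-Type": "text/html; charset=utf-8"},
    ),
}


def fake_fetch(uri):
    """Return (status, headers) for a URI; hypothetical transport."""
    return CANNED.get(uri, (404, {}))


def resolve(uri, fetch=fake_fetch, max_hops=5):
    """Follow 303 See Other responses; return (final status, URIs visited)."""
    hops = [uri]
    for _ in range(max_hops):
        status, headers = fetch(uri)
        if status != 303:
            return status, hops
        # The Location header names a *different* resource: a document
        # describing the item, not the item itself.
        uri = headers["Location"]
        hops.append(uri)
    raise RuntimeError("too many redirects")


status, hops = resolve("http://wordnet.princeton.edu/wn20/word/bank")
print(status, hops)
```

The first URI in hops is the name of the WordNet item; the last one
is the name of the document that was actually retrieved.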

Note that now that we have this additional level of indirection,
we are free to respond with

Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html
or
Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf

at our option.  We base that choice on what the client put in
the Accept: header of its original HTTP GET.  Once again, most
deployed browsers indicate a preference for human-readable
forms, so the "right thing" can be made to happen for a human
clicking around in a "Web page" browser.

e.g.:

Case 1: client prefers (human-readable) HTML

->
GET /wn20/word/bank HTTP/1.1
Host: wordnet.princeton.edu
Accept: text/html, text/xml

<-
303 See Other
Location: http://www.w3.org/2006/03/wn/wn20/word/bank.html

->
GET /2006/03/wn/wn20/word/bank.html HTTP/1.1
Host: www.w3.org
Accept: */*

<-
200 OK
Vary: negotiate,accept
Content-Type: text/html; charset=utf-8

versus

Case 2: client prefers (machine-interpretable) RDF/XML:

->
GET /wn20/word/bank HTTP/1.1
Host: wordnet.princeton.edu
Accept: application/rdf+xml

<-
303 See Other
Location: http://www.w3.org/2006/03/wn/wn20/word/bank.rdf

->
GET /2006/03/wn/wn20/word/bank.rdf HTTP/1.1
Host: www.w3.org
Accept: */*

<-
200 OK
Content-Type: application/rdf+xml
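
The server-side choice in Cases 1 and 2 could be sketched roughly as
follows.  This is one possible approach, not a prescribed one: the
helper names (parse_accept, quality, see_other), the variant table,
the 406 fallback, and the Vary: Accept header on the 303 are all my
assumptions; only the URIs come from the examples above.

```python
# Sketch only: names and the variant table are illustrative assumptions.
DOC_ROOT = "http://www.w3.org/2006/03/wn"

# Document variants the server can offer, in server preference order.
VARIANTS = [("text/html", ".html"), ("application/rdf+xml", ".rdf")]


def parse_accept(header):
    """Parse an Accept header value into a list of (media_range, q)."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """q-value of the most specific range that matches media_type."""
    major = media_type.split("/")[0]
    best = None  # (specificity, q)
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == major + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best[1] if best else 0.0


def see_other(item_path, accept="*/*"):
    """Build (status, headers) for a GET on an item URI path."""
    ranges = parse_accept(accept)
    # max() returns the first maximum, so ties follow VARIANTS order.
    media_type, suffix = max(VARIANTS, key=lambda v: quality(v[0], ranges))
    if quality(media_type, ranges) <= 0:
        return 406, {}  # nothing acceptable; an assumed fallback
    return 303, {"Location": DOC_ROOT + item_path + suffix, "Vary": "Accept"}


print(see_other("/wn20/word/bank", "text/html, text/xml"))
print(see_other("/wn20/word/bank", "application/rdf+xml"))
```

The two calls correspond to Case 1 and Case 2 respectively; a client
sending Accept: */* would get the HTML document here only because
HTML is listed first in the (assumed) variant table.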

Now, the 3 April editors' draft [11] suggests that Case 2 can be
implemented with a SPARQL query.  That's plausibly a fine thing
to do but it is entirely at the server's discretion *how* to implement
a response to the request for (an RDF representation of) information
about one of our published WordNet item URIs.

   [11] <http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060403>

Note, too, that it is just fine for a GET on the namespace URI, e.g.
for synsets, to return a document that describes all the synsets in
the 2.0 version of WordNet:

->
GET /wn20/synset/ HTTP/1.1
Host: wordnet.princeton.edu
Accept: application/rdf+xml

<-
200 OK
Content-Location: /wn20/synset/index.rdf
Content-Type: application/rdf+xml

<rdf:RDF xmlns:rdf="..." xmlns:wn20="...">
 <wn20:Synset
   rdf:about="http://wordnet.princeton.edu/wn20/synset/bank-noun-1">
   <wn20:synsetContainsWordSense
     rdf:resource="http://wordnet.princeton.edu/wn20/word/bank-noun-1"/>
   ...
 </wn20:Synset>
 ...
</rdf:RDF>

In this case no 303 redirect is needed because it is acceptable to
say that the namespace URI itself names an information
resource (i.e. a document).

This is how I suggest that we implement "WordNet Basic" -- no
need for publishing additional URIs; we just use an obvious
URI that already makes some "sense" in our vocabulary structure.

We can name lots of documents that return information about
our WordNet items; e.g. we could support a "query" for all the
known word senses of "bank" used as a noun by supporting
another set of URI patterns that are similar to the names of
the word senses themselves:

->
GET /wn20/word/bank-sense-n HTTP/1.1
Host: wordnet.princeton.edu
Accept: text/html

<-
200 OK

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
<body>
<h1>About noun senses of "Bank"</h1>
...

In this case, the URI http://wordnet.princeton.edu/wn20/word/bank-sense-n
is not naming a WordNet item but rather is naming a document
that describes WordNet items (the noun senses of "bank").

Whether to support such "convenience" URIs (queries) rather
than an explicit SPARQL service is largely up to the service
provider to decide.  But it is important that our document be
clear that any such convenience URIs are naming documents
and not items in the WordNet vocabulary.
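
To make the item/document distinction concrete, here is a minimal
sketch of how a server might decide between 303 and 200 from the
request path alone.  The regular expressions are my assumptions,
inferred from the example URIs in this message; in particular I am
assuming that no real item name ends in "-sense-" plus a single
letter.

```python
import re

# Assumed URI patterns, based only on the examples in this message.
NAMESPACE_DOC = re.compile(r"^/wn20/(synset|word)/$")
CONVENIENCE_DOC = re.compile(r"^/wn20/word/[^/]+-sense-[a-z]$")
ITEM = re.compile(r"^/wn20/(synset|word)/[^/]+$")


def kind_of(path):
    """Classify a request path as 'document', 'item', or 'unknown'."""
    # Documents are checked first because the convenience pattern is
    # deliberately similar to the item pattern.
    if NAMESPACE_DOC.match(path) or CONVENIENCE_DOC.match(path):
        return "document"
    if ITEM.match(path):
        return "item"
    return "unknown"


def status_for(path):
    """Documents are served directly; items get 303 See Other."""
    return {"document": 200, "item": 303, "unknown": 404}[kind_of(path)]


for path in ("/wn20/word/bank", "/wn20/synset/", "/wn20/word/bank-sense-n"):
    print(path, kind_of(path), status_for(path))
```

Because the convenience pattern overlaps the item pattern, the order
of the checks matters; a deployment that wanted to avoid that
ambiguity could put convenience documents under a separate path.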

-Ralph

Received on Wednesday, 19 April 2006 17:40:38 UTC