Re: grounding wordnet in the web

On Wed, 2006-02-22 at 15:20 +0100, Mark van Assem wrote:
> Hi Dan,
> 
> > But many of the URIs in the wn-conversion document don't work. I get 
> > 404 @ http://wordnet.princeton.edu/wn20/bank-noun-1/
> 
> The URIs don't work yet because we do not have a place yet to host WN 
> RDF/OWL. The idea is that Princeton implements server rewrite rules so 
> that HTTP GETs are redirected to a server with the actual data run by 
> an institute from our community to take that burden off of Princeton. 
> Also because the correct statements to be returned (our proposal is 
> the Concise Bounded Description) should be computed. Additionally, it 
> allows us to introduce a "latest version URL":
> 
> http://wordnet.princeton.edu/wn
> 
> that always redirects to the newest version (at another server), much 
> like the latest version URLs of W3C documents.
> 
> Before we seek assistance to actually implement this we would like 
> some feedback on this approach. What is your opinion on this?

Hmm... it seems fairly reasonable; but I don't recommend computing
the CBD on-demand.  I expect "baking" will be more manageable
than "frying".

cf http://www.aaronsw.com/weblog/000404
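
To make the baking idea concrete, here is a minimal sketch, assuming a toy in-memory list of triples. The identifiers, the `ex:` property names, the file naming and the simplified CBD (subject triples plus blank-node closure, with no reification handling) are all illustrative assumptions, not anything from the actual conversion.

```python
import os
import tempfile
from urllib.parse import quote

# Hypothetical toy data: (subject, predicate, object) triples; names starting
# with "_:" stand for blank nodes. None of these identifiers come from the
# actual WordNet conversion.
TRIPLES = [
    ("wn:bank-noun-1", "rdfs:label", '"bank"'),
    ("wn:bank-noun-1", "ex:inSynset", "_:b1"),
    ("_:b1", "ex:gloss", '"a financial institution"'),
    ("wn:river-noun-1", "rdfs:label", '"river"'),
]

def cbd(resource, triples):
    """Simplified Concise Bounded Description: every triple whose subject is
    `resource`, plus, recursively, the triples of any blank-node objects.
    (Reification handling from the full CBD definition is omitted.)"""
    result, todo, seen = [], [resource], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                if o.startswith("_:"):
                    todo.append(o)
    return result

def bake(triples, out_dir):
    """Write one static file per named subject, ahead of any request."""
    subjects = sorted({s for s, _, _ in triples if not s.startswith("_:")})
    for subject in subjects:
        # Percent-encode the subject so it is safe as a file name.
        path = os.path.join(out_dir, quote(subject, safe="") + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            for s, p, o in cbd(subject, triples):
                f.write(f"{s} {p} {o} .\n")
    return subjects

if __name__ == "__main__":
    out = tempfile.mkdtemp()
    print(bake(TRIPLES, out), "baked into", out)
```

Once the files exist, a plain web server can hand them out without computing anything per request; the baking step is rerun only when the data changes.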

I wonder if the demand will really be so high that a simple web
server with a pile of static files won't be able to handle it easily.

Rather than putting big iron behind this service, I suggest you
throttle it. Aggressively advocate that tools which know in
advance that they will rely on wordnet data cache what they
need. If anybody is making, say, more than 100 requests per
hour, start returning "401 unauthorized; get a cache", and if
the server gets busy, just return "5xx I'm too busy; you
might try the _bittorrent bulk download_".
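
A rough sketch of that throttling policy, assuming a sliding one-hour window per client. The `Throttle` class and its method names are hypothetical, and 503 is just one choice for the "5xx" case; the 401 mirrors the message suggested above.

```python
import time
from collections import defaultdict, deque

LIMIT = 100       # requests allowed per client per window (figure from the text)
WINDOW = 3600.0   # one hour, in seconds

class Throttle:
    """Hypothetical per-client sliding-window request counter."""

    def __init__(self, limit=LIMIT, window=WINDOW, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                 # injectable so it can be tested
        self.hits = defaultdict(deque)     # client -> timestamps of accepted requests

    def check(self, client, server_busy=False):
        """Return (status, message) for one incoming request."""
        if server_busy:
            # 503 is an assumption; the text only says "5xx".
            return 503, "I'm too busy; you might try the bulk download"
        now = self.clock()
        recent = self.hits[client]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return 401, "unauthorized; get a cache"
        recent.append(now)
        return 200, "OK"

if __name__ == "__main__":
    throttle = Throttle(limit=2)
    print([throttle.check("some-client")[0] for _ in range(3)])
```

Keying on something like the client address is the simplest option; anything finer-grained would need a notion of identity that this sketch does not have.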

If you have big iron to throw at the problem, you might as
well do a full SPARQL service, and not just CBDs.
See http://esw.w3.org/topic/DawgShows for several examples.

In the cwm-related research work, we have been working on
structures for navigating big databases; when you GET the
database resource, the idea is that it comes back with
"this is a summary of the database, not the whole contents;
you can query it with SPARQL at <endpoint-xyz>".
I can't advocate that as a tried-and-true best practice yet,
but I hope to see something like it standardized eventually.


> > and I'm quite surprised to see:
> > 
> >   The first step in using this conversion is selecting the
> >   appropriate version to download.
> > 
> > Download? Can't I just use it there in the web?
> 
> As for the download statement: in that introductory "primer" part of 
> the document we would like to describe as straightforward and simple 
> as possible how one could start to work with WordNet RDF/OWL (the 
> minimum amount of text for people already familiar with WN and 
> RDF/OWL). To keep it simple we only tell the story there for offline 
> use.  We could add something along the lines of "one can also query 
> WordNet online..." and provide a reference to the more elaborate 
> online/offline section [1]. Would that be satisfactory?

Well, no, not really. The simplest use doesn't
involve downloading anything. I don't think the "best practices"
WG should give the impression that downloading is the simplest case.
The simplest case is that I just dereference the URIs of whatever
terms I'm interested in.
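
For illustration, the simplest case might look like the sketch below, which only builds the GET request for a term URI. The helper name is hypothetical, and asking for `application/rdf+xml` is an assumption about what the server would offer; the URI is the one from earlier in this thread, which does not resolve yet.

```python
from urllib.request import Request

def build_lookup_request(term_uri):
    """Build a plain HTTP GET for a term URI, asking for RDF/XML.
    The Accept value is an assumption about what the server would serve."""
    return Request(term_uri, headers={"Accept": "application/rdf+xml"})

if __name__ == "__main__":
    req = build_lookup_request("http://wordnet.princeton.edu/wn20/bank-noun-1/")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; no download step, no version selection.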

If you're only going to document usage that involves downloading,
please say that it's due to some sort of limitation, a la:

  Ordinary lookup[webarch 3.1] of wordnet terms in the Web is in
  progress but not yet available; for now, we suggest you download
  the data in bulk ...

  [webarch 3.1] http://www.w3.org/TR/2004/REC-webarch-20041215/#dereference-uri

> 
> Cheers,
> Mark.
> 
> [1]http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html#querying
> 
> 
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Thursday, 23 February 2006 15:46:39 UTC