Re: [WN] Fwd: WordNet Namespace

Mark, Aldo,

Based on Peter's reply and on Jeremy's and my previous postings, there 
are two distinct size-related issues.
I was only talking about the size of the total RDF if it is to be loaded 
into a single triple store.  I think this problem is mainly an editorial 
one: the draft should contain some argument along the lines of  "yes, it 
is bigger than the other conversions, but a) it is complete and b) it 
uses URIs for word senses, which make them first class citizens, which 
is important because ...."
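
To make that trade-off concrete, here is a minimal Python sketch using 
rdflib, assuming the complete conversion is available as a single 
RDF/XML file named wordnet-full.rdf (a hypothetical filename; the actual 
split of the files is discussed below):

    from rdflib import Graph, URIRef

    g = Graph()
    # Parse the entire dump into memory: for a file on the order of
    # 150MB this is the up-front cost of a single triple store.
    g.parse("wordnet-full.rdf", format="xml")
    print(len(g), "triples loaded")

    # The payoff: word senses are first-class resources, so their
    # triples can be looked up directly by URI.
    sense = URIRef("http://wordnet.princeton.edu/rdf/entity")
    for s, p, o in g.triples((sense, None, None)):
        print(p, o)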

The second size-related issue is the one brought up by Jeremy, and it 
relates to the size of the chunk of data that is returned by the 
Princeton webserver when one of the URIs is resolved. This is directly 
related to the hash vs slash URI discussion (see Jeremy's post), but 
also to the question of how you want Princeton to map the different RDF 
files into the URI namespace you propose.  Neither is currently 
discussed in the document (editorial), but I think there is also a 
technical issue here: given the way the files are currently split up, 
even with Alistair's Apache cookbook it is not trivial to make sure that 
resolving http://wordnet.princeton.edu/rdf/entity actually returns all 
triples concerning that resource (and only those triples, not the >1M 
other triples).
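
That check can be made mechanical. A minimal sketch, assuming the 
server answers a slash URI with an RDF document (rdflib handles both 
the HTTP fetch and the parsing; nothing here depends on the eventual 
Apache configuration):

    from rdflib import Graph, URIRef

    uri = URIRef("http://wordnet.princeton.edu/rdf/entity")
    g = Graph()
    g.parse(str(uri))  # dereference the URI, parse the RDF that comes back

    # Triples "concerning" the resource: those with it as subject or object.
    about = set(g.triples((uri, None, None))) | set(g.triples((None, None, uri)))
    extra = set(g) - about
    print(len(about), "triples about the resource,", len(extra), "others")
    # A correct mapping should leave `extra` empty, rather than returning
    # a slice of the >1M remaining triples.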

Jacco

Jeremy Carroll wrote:

> Let's suppose we have 150MB of data.
>
> If we have http://wordnet.princeton.edu/rdf#entity then we have one 
> file of 150MB. If we want to look up this URI, we have to download 
> 150MB from http://wordnet.princeton.edu/rdf and then parse it and find 
> the triples concerning http://wordnet.princeton.edu/rdf#entity
>
> If we have http://wordnet.princeton.edu/rdf/entity then we may have 
> say 50,000 files each of 4KB (notice this is somewhat more in total, 
> say 200MB).
>
> If we want to look up http://wordnet.princeton.edu/rdf/entity then we 
> download 4KB.
>
> The latter is much more practical.
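
The mechanics behind Jeremy's comparison are fixed by HTTP itself: a 
client strips the fragment before issuing the request, so a hash URI 
can only ever fetch the one document it hangs off. A small Python 
illustration (standard library only):

    from urllib.parse import urldefrag

    hash_uri = "http://wordnet.princeton.edu/rdf#entity"
    slash_uri = "http://wordnet.princeton.edu/rdf/entity"

    # The fragment never reaches the server: every hash URI in the dataset
    # resolves to the same document, which is filtered client-side afterwards.
    doc, frag = urldefrag(hash_uri)
    print(doc)   # http://wordnet.princeton.edu/rdf  (the whole 150MB file)
    print(frag)  # entity  (selected only after download and parse)

    # A slash URI is a document in its own right, so the server can answer
    # with just the ~4KB description of that one resource.
    doc, frag = urldefrag(slash_uri)
    print(doc)   # http://wordnet.princeton.edu/rdf/entity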

Received on Wednesday, 14 December 2005 12:12:55 UTC