Re: WNet review

On Wednesday 22 February 2006 17:43, Mark van Assem wrote:
> Hi Kjetil,
>
> > I believe it would be customary to not use a trailing slash on the
> > fragment identifier, so you might find it preferable to use
> > http://wordnet.princeton.edu/wn#107909067-bank-n
>
> Ok, will change that in the next version!

Good! :-)

>
> > Why not e.g.:
> > http://wordnet.princeton.edu/wn/107909067/bank/n/
> > or another example, including a wn version:
> > http://wordnet.princeton.edu/wn/2.0/bank/noun/1/
>
> I personally feel that the currently proposed scheme is more clear
> because it typographically separates the "local ID" from the
> "namespace" part of the URI, which helps humans to read the URIs. Is
> this way of composing slash URIs considered counter-intuitive by most
> people?

Hmmm, I don't know. I think many people have an intuitive "file system" 
image, and would decompose the URI
http://wordnet.princeton.edu/wn/2.0/bank/noun#1
to mean "the word senses of bank", then "just the noun senses of bank", 
and then "the first noun sense". That's what I would do. 

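To make that "file system" reading concrete, here is a minimal Python sketch using only the standard library's urllib.parse. It splits the URI into its hierarchical path segments plus the fragment. The helper name decompose is hypothetical, and nothing in it assigns meaning to the segments; that interpretation is exactly what a human (not an agent) would add.

```python
from urllib.parse import urlsplit


def decompose(uri):
    """Split a URI into host, hierarchical path segments and fragment.

    Hypothetical helper for illustration only: it does not interpret the
    segments (version, lemma, part of speech), it just exposes the
    hierarchy a human reader tends to see in them.
    """
    parts = urlsplit(uri)
    # Drop the empty strings produced by the leading (or a trailing) slash.
    segments = [s for s in parts.path.split("/") if s]
    return {"host": parts.netloc, "segments": segments, "fragment": parts.fragment}


info = decompose("http://wordnet.princeton.edu/wn/2.0/bank/noun#1")
print(info["segments"])  # ['wn', '2.0', 'bank', 'noun']
print(info["fragment"])  # 1
```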

> > Or perhaps you'd like a single retrievable resource for that term,
> > it intuitively makes sense to use a fragment identifier for
> > different meanings, so perhaps reintroduce the hash again:
> > http://wordnet.princeton.edu/wn/2.0/bank/noun#1
>
> If I'm not mistaken that would partially re-introduce the problem
> we're trying to circumvent by not using hashes at all: all the
> different noun senses of bank would be returned on an HTTP GET, not
> just #1. 

Ah, yes it would. 

However, your argument in the document is that: "The disadvantage of 
hash URIs is that when a HTTP GET is done [...] the browser will return 
the whole document", which is a valid argument for not returning the 
whole WordNet database on that GET. It is not, however, a complete 
argument against returning a set of senses.
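The reason a hash URI pulls in more than one sense is that the client strips the fragment before dereferencing (per RFC 3986, the fragment is resolved by the user agent, never sent to the server). The short Python sketch below, using urllib.parse.urldefrag on the example URIs from this thread, shows which URL would actually be requested in each case; request_targets is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def request_targets(uris):
    """Map each URI to the URL a client would actually GET (fragment removed)."""
    return {uri: urldefrag(uri).url for uri in uris}


# Hash URIs on a per-lemma/per-POS document: all senses share one request.
senses = [
    "http://wordnet.princeton.edu/wn/2.0/bank/noun#1",
    "http://wordnet.princeton.edu/wn/2.0/bank/noun#2",
]
print(set(request_targets(senses).values()))
# {'http://wordnet.princeton.edu/wn/2.0/bank/noun'}

# Hash URIs directly on /wn: every sense would dereference the whole database.
print(urldefrag("http://wordnet.princeton.edu/wn#107909067-bank-n").url)
# http://wordnet.princeton.edu/wn
```

The first case returns a set of senses per GET; the second returns everything, which is the case the document's argument actually rules out.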

> The benefit of the current proposal is that you can ask for 
> both a specific WordSense (bank-noun-1) or a set of senses (query for
> NounWordSense which have a Word with wn:lexicalForm "bank"). The
> former query is not possible with the hash URIs.
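The set-of-senses query described above can be sketched without an RDF library by filtering an in-memory list of (subject, predicate, object) triples. The class wn:NounWordSense and property wn:lexicalForm come from the thread; the wn:word property linking a WordSense to its Word, and all resource names below, are assumptions made up for illustration. The docstring gives the roughly equivalent SPARQL pattern.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("wn:wordsense-bank-noun-1", "rdf:type", "wn:NounWordSense"),
    ("wn:wordsense-bank-noun-1", "wn:word", "wn:word-bank"),  # wn:word is assumed
    ("wn:wordsense-bank-noun-2", "rdf:type", "wn:NounWordSense"),
    ("wn:wordsense-bank-noun-2", "wn:word", "wn:word-bank"),
    ("wn:wordsense-bank-verb-1", "rdf:type", "wn:VerbWordSense"),
    ("wn:wordsense-bank-verb-1", "wn:word", "wn:word-bank"),
    ("wn:word-bank", "wn:lexicalForm", "bank"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def noun_senses(triples, lexical_form):
    """Roughly: SELECT ?s WHERE { ?s a wn:NounWordSense ; wn:word ?w .
                                  ?w wn:lexicalForm "bank" }"""
    candidates = {s for s, p, o in triples
                  if p == "rdf:type" and o == "wn:NounWordSense"}
    return sorted(
        s for s in candidates
        if any(lexical_form in objects(triples, w, "wn:lexicalForm")
               for w in objects(triples, s, "wn:word"))
    )


print(noun_senses(TRIPLES, "bank"))
# ['wn:wordsense-bank-noun-1', 'wn:wordsense-bank-noun-2']
```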

Right! I'm all for URIs that are easy for humans to parse, but I would 
not put too much into them. An agent would have to be aware of this 
specification to know that it would be getting a single WordSense, and 
if I understand the "URI Opacity" practice of Webarch correctly, that's 
not necessarily a good thing.

I'm much more inclined to focus on returning a "reasonable chunk" for 
any HTTP GET. Unintended retrievals of the whole WN db should be 
avoided, but a single WordSense seems like an unreasonably small chunk 
to me. 

It boils down to what should be available through a simple GET and what 
should require a (SPARQL) query. I don't think agents should infer what 
will be returned from the URI itself.

For example, my RDF::Scutter's default is what's recommended for robots, 
i.e. to send only one HTTP request to any one server per minute, so 
getting all the noun senses of bank would take ten minutes. Of course, 
one might not want the scutter to suck up something as big as WN 
unattended, but if WN were used in a context where different senses of 
words were discussed, having to visit each word sense separately would 
slow down scuttering immensely. There is not that much data in a set of 
senses, so it seems worth getting all of it once you send a GET; that 
seems like a reasonable chunk to me.
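The ten-minute figure is just (number of requests) × (delay per request). A hypothetical Python helper, crawl_minutes (not part of RDF::Scutter, which is Perl), makes the trade-off between one WordSense per GET and a whole set per GET explicit. It counts each request as occupying one full delay slot, which is how the ten-minute estimate above is reached.

```python
import math


def crawl_minutes(n_senses, senses_per_get, delay_minutes=1.0):
    """Estimated time for a polite robot to fetch n_senses word senses.

    Hypothetical helper: assumes each GET returns up to senses_per_get
    senses and that each request occupies one delay slot of delay_minutes.
    """
    requests = math.ceil(n_senses / senses_per_get)
    return requests * delay_minutes


# Ten noun senses of "bank", as in the paragraph above.
print(crawl_minutes(10, senses_per_get=1))   # 10.0  (one WordSense per GET)
print(crawl_minutes(10, senses_per_get=10))  # 1.0   (the whole set in one GET)
```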

Best,

Kjetil
-- 
Kjetil Kjernsmo
Information Systems Developer
Opera Software ASA

Received on Monday, 27 February 2006 16:37:15 UTC