Re: [WNET] URIs as primitive queries? from Mark van Assem on 2006-03-29 (public-swbp-wg@w3.org from March 2006)

From: Mark van Assem <mark@cs.vu.nl>
Date: Wed, 29 Mar 2006 11:45:30 +0200
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
CC: SWBPD list <public-swbp-wg@w3.org>
Message-ID: <442A573A.4040300@cs.vu.nl>
Hi David,

Thanks for looking into this.

> may be handy as an easy query mechanism.  However, I don't know whether
> applications would already know that they want only noun usages of
> "bank".  So perhaps something like
> 
>     http://wordnet.princeton.edu/wn/wordsense/bank/
> 

I see what you mean. There's one problem left: this approach would 
give a URI clash for all the sub-types of wordsenses:

      http://wordnet.princeton.edu/wn/wordsense/noun/

can refer to all NounWordSenses, or to the WordSenses with a Word with 
the lexical label "noun". Some kind of prefix could be introduced to 
solve this, e.g. "type-noun".
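
To make the clash concrete, here is a minimal Python sketch of how a resolver might read the path segment that follows /wordsense/. The function name classify_segment and the set of type names are assumptions for illustration only; nothing like this exists in the draft. It just shows how a "type-" prefix separates a sub-type filter from a lexical label.

```python
# Hypothetical helper (not part of any draft): decide whether the path
# segment after /wordsense/ names a sub-type or a word's lexical label.
TYPE_PREFIX = "type-"
KNOWN_TYPES = {"noun", "verb", "adjective", "adverb"}  # assumed set of types

def classify_segment(segment):
    """Return ("type", name) or ("word", label) for one path segment."""
    if segment.startswith(TYPE_PREFIX):
        name = segment[len(TYPE_PREFIX):]
        if name in KNOWN_TYPES:
            return ("type", name)
    # Without the prefix the segment is always read as a lexical label,
    # so the word "noun" no longer collides with the NounWordSense type.
    return ("word", segment)

print(classify_segment("type-noun"))  # ('type', 'noun')
print(classify_segment("noun"))       # ('word', 'noun')
print(classify_segment("bank"))       # ('word', 'bank')
```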

> would be better.  Also, the language may need to be indicated, so
> perhaps something like the following would work better:  
> 
>     http://wordnet.princeton.edu/wn/wordsense/en/bank/

Why is that necessary? We already know that WN is in "en-US". URI 
clashes with other WordNets can easily be avoided by using a different 
base URI.

> 2. Starting from a particular URI that is an RDF node in a triple, look
> up related information.  In this case, I don't think the application
> would (or should) know to deconstruct the URI in order to do a broader
> query, so I don't think the above mechanism would be appropriate for
> this usage.  (But please correct me if you think I'm wrong.)

Well, I think an application should not rely on this. But in practice 
it would probably be programmed to do so if it gets the job done. I 
think this is slippery terrain, but I can't decide either way.

> BTW, one thing I notice in looking over the WordNet document[1] that you
> mentioned: It seems a little odd that there are different lexical
> conventions used for forming the different kinds of URIs that are used.

I am revising this proposal in response to points raised by Kjetil 
[2]. The current idea is heading towards something like:

	http://wordnet.princeton.edu/wn20/synset/type-noun/107909067-bank/
	http://wordnet.princeton.edu/wn20/wordsense/type-noun/bank/1
	http://wordnet.princeton.edu/wn20/word/bank/
	http://wordnet.princeton.edu/wn20/schema/participleOf


> For example, the document shows the following NounSynset, WordSense and
> Word URIs (respectively):
> 
> 	http://wordnet.princeton.edu/wn20/107909067-bank-n/
> 	http://wordnet.princeton.edu/wn20/bank-noun-1/
> 	http://wordnet.princeton.edu/wn20/word-bank/

<snip>

> This seems odd for a few reasons:
> 
> 1. If I've understood correctly, in a URI like
> 
> 	http://wordnet.princeton.edu/wn20/107909067-bank-n/
> 
> the "-bank-n" that is tacked on after the synset ID is redundant,
> presumably provided as a convenient reminder to a human reader.  I think

Correct.

> providing this human convenience is a good idea (very helpful in
> debugging), but I'm also wondering: shouldn't a particular word sense be
> enough to unambiguously identify a particular synset?  Is the synset ID
> really needed?  Couldn't the above URI be more conveniently constructed
> as:
> 
> 	http://wordnet.princeton.edu/wn20/synset/bank-noun-1/

You mean that here you use the information of one of a synset's 
wordsenses to unambiguously identify the synset? This is possible 
because a WordSense belongs to exactly one Synset. However, I'd change 
it to:

	http://wordnet.princeton.edu/wn20/synset/bank/noun/1/

because this allows more flexibility in the possible URIs that can be 
used as queries, e.g.

	http://wordnet.princeton.edu/wn20/synset/bank/noun/

returns all noun synsets for "bank", and:

	http://wordnet.princeton.edu/wn20/synset/bank/

returns all synsets with a wordsense that has the word "bank".
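
As a rough illustration of this "URI as primitive query" idea, the Python sketch below treats a shorter URI as a path prefix over a list of full synset URIs. The list contents (words and sense numbers) are made up for the example and are not real WordNet data, and the query function is a hypothetical name, not something the draft defines.

```python
# Illustrative only: a tiny made-up set of synset URIs following the
# word/type/sense-number pattern. The entries are NOT real WordNet data.
BASE = "http://wordnet.princeton.edu/wn20/synset/"

SYNSET_URIS = [
    BASE + "bank/noun/1/",
    BASE + "bank/noun/2/",
    BASE + "bank/verb/1/",
    BASE + "river/noun/1/",
]

def query(uri):
    """Return every known synset URI that has the given URI as a path prefix."""
    # Force a trailing slash so that ".../bank" cannot match ".../banker/".
    if not uri.endswith("/"):
        uri += "/"
    return [u for u in SYNSET_URIS if u.startswith(uri)]

print(len(query(BASE + "bank/noun/1/")))  # 1
print(len(query(BASE + "bank/noun/")))    # 2
print(len(query(BASE + "bank/")))         # 3
```

The more specific the path, the fewer results: the full URI names one synset, while each shorter prefix names a set of them.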

> i.e., "synset/bank-noun-1" acts as a unique synset identifier.  Would
> that work or did I misunderstand something?  It would be nice to get rid
> of the arbitrary synset ID numbers if they are not needed.

Good point. I'm not sure, though: some users might want the ID to stay 
because they rely on it. Note that such a use would imply parsing the 
URI and inferring information from it.

> 2. Sometimes "noun", "verb", etc., are abbreviated as "n", "v", etc. and
> sometimes they are spelled out.

Yep, I agree. That bothered me also, so the new proposal always spells 
them out.

OK, so the new proposal would be:

	http://wordnet.princeton.edu/wn20/synset/bank/type-noun/1/
	http://wordnet.princeton.edu/wn20/wordsense/type-noun/bank/1/
	http://wordnet.princeton.edu/wn20/word/bank/
	http://wordnet.princeton.edu/wn20/schema/participleOf/

(the type-prefix is only required for wordsenses, but it is probably 
more consistent to also use it in the synset URIs).
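
For reference, the four patterns listed above could be generated along the following lines. This is only a sketch: the function names are hypothetical, and percent-encoding the word with urllib.parse.quote is my own assumption, since nothing has been decided yet about words containing spaces or other special characters.

```python
from urllib.parse import quote

BASE = "http://wordnet.princeton.edu/wn20/"

def _seg(text):
    # Assumption: percent-encode anything that is not URI-safe.
    return quote(str(text), safe="")

def synset_uri(word, pos, sense):
    """synset/<word>/type-<pos>/<sense>/"""
    return f"{BASE}synset/{_seg(word)}/type-{_seg(pos)}/{_seg(sense)}/"

def wordsense_uri(word, pos, sense):
    """wordsense/type-<pos>/<word>/<sense>/"""
    return f"{BASE}wordsense/type-{_seg(pos)}/{_seg(word)}/{_seg(sense)}/"

def word_uri(word):
    """word/<word>/"""
    return f"{BASE}word/{_seg(word)}/"

def schema_uri(name):
    """schema/<name>/"""
    return f"{BASE}schema/{_seg(name)}/"

print(synset_uri("bank", "noun", 1))
print(wordsense_uri("bank", "noun", 1))
print(word_uri("bank"))
print(schema_uri("participleOf"))
```

Called with "bank", "noun" and 1, these should reproduce the four example URIs given above.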

I will implement this proposal for URIs for RDF nodes in the new 
draft (should be done Monday before the telecon).

I'd like to leave the other discussion (the mapping between URIs that 
do not correspond to concrete RDF nodes but rather to sets of them) 
for later. The reason is that I'd like to get a new version out by 
Monday so that the draft may reach First WG Draft status before the 
end of the charter. If I am correct, the issue is orthogonal to the 
main focus of the Note, namely a correct conversion of WN to RDF/OWL. 
I will describe this as a discussion issue in the new version of the 
draft [1]. Could you live with that?

With regards,
Mark.

[1]http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html
[2]http://lists.w3.org/Archives/Public/public-swbp-wg/2006Mar/0076

-- 
  Mark F.J. van Assem - Vrije Universiteit Amsterdam
        markREMOVE@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Wednesday, 29 March 2006 09:45:42 UTC