RE: [WNET] URIs as primitive queries? from Booth, David (HP Software - Boston) on 2006-03-28 (public-swbp-wg@w3.org from March 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 28 Mar 2006 18:26:40 -0500
To: "Mark van Assem" <mark@cs.vu.nl>
Cc: "SWBPD list" <public-swbp-wg@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20B9242@tayexc19.americas.cpqcorp.net>
Mark,

> From: Mark van Assem [mailto:mark@cs.vu.nl] 
> . . .
> > I don't see it as naming sets of nodes.  I see it as naming a Web
> > location where you can get some useful RDF data.  And that URI does
> > not happen to also be an RDF node in a triple that you previously
> > encountered.  From this point of view, it is no different in principle
> > from most regular Web pages that just serve data.
> 
> So in principle there is nothing against this approach?

Correct.  I do not see any problem from a Web architectural point of
view.

> > I think the question is how users (and particularly software agents)
> > would know that they could use such a URI, given that it is not an RDF
> > node in a published triple.  URIs should normally be treated as
> > opaque, so you normally can't assume that you can just chop off the
> > "1/" from the end of the URI in order to get related data.
> 
> Right, that is something that an agent would need to know.  But such an
> agent would also need to know about certain classes and properties to
> achieve the same result using SPARQL.
>
> What would your advice be: describe this possibility in the coming new
> WN draft [1] if it is something that is/might be appropriate or keep it
> out (either as something "evil" or because it still needs discussion)?

I would assume that the two main ways of using WordNet would be:

1. Starting from a word (not a URI), look up information related to that
word, which will give you URIs that are RDF nodes in triples.  In this
usage, a URI query like

    http://wordnet.princeton.edu/wn/wordsense/noun/bank/

may be handy as an easy query mechanism.  However, I don't know whether
applications would already know that they want only noun usages of
"bank".  So perhaps something like

    http://wordnet.princeton.edu/wn/wordsense/bank/

would be better.  Also, the language may need to be indicated, so
perhaps something like the following would work better:  

    http://wordnet.princeton.edu/wn/wordsense/en/bank/

2. Starting from a particular URI that is an RDF node in a triple, look
up related information.  In this case, I don't think the application
would (or should) know to deconstruct the URI in order to do a broader
query, so I don't think the above mechanism would be appropriate for
this usage.  (But please correct me if you think I'm wrong.)
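
To make usage 1 concrete, here is a minimal Python sketch of how a client might build such a query URI from a plain word. The base URI is copied from the examples above and is only illustrative; the function name, its parameters, and the segment order used when both a language and a lexical group are supplied are assumptions, not part of any published WordNet service.

```python
from urllib.parse import quote

# Illustrative base copied from the example URIs in this message.
# Whether any server actually answers at this address is an assumption.
BASE = "http://wordnet.princeton.edu/wn/wordsense/"


def wordsense_query_uri(word, lang=None, pos=None):
    """Build a lookup URI for a plain word (hypothetical helper).

    lang and pos are optional path segments. Putting lang before pos
    when both are given is an assumption; the message only shows each
    one on its own.
    """
    segments = [s for s in (lang, pos, word) if s]
    # Percent-encode each segment so that unusual characters in a word
    # cannot introduce extra path separators.
    return BASE + "/".join(quote(s, safe="") for s in segments) + "/"


print(wordsense_query_uri("bank"))
print(wordsense_query_uri("bank", pos="noun"))
print(wordsense_query_uri("bank", lang="en"))
```

The client only ever composes a URI from a word it already has; it never takes apart a URI it received, which is the distinction drawn in usage 2.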

BTW, one thing I notice in looking over the WordNet document[1] that you
mentioned: It seems a little odd that there are different lexical
conventions used for forming the different kinds of URIs that are used.
For example, the document shows the following NounSynset, WordSense and
Word URIs (respectively):

	http://wordnet.princeton.edu/wn20/107909067-bank-n/
	http://wordnet.princeton.edu/wn20/bank-noun-1/
	http://wordnet.princeton.edu/wn20/word-bank/

Aside from the http://wordnet.princeton.edu/wn20/ prefix and the
trailing slash, the lexical patterns for the three seem to be (in perl):

	($synsetID)-($word)-($lexGroupLetter)
	($word)-($lexGroupName)-($n)
	word-($word)

where
	$synsetID = synsetID pattern = [0-9]+
	$word = word pattern = [a-zA-Z_]+
	$lexGroupLetter = lexical group letter pattern = [nvasr]
	$lexGroupName = lexical group name = {noun,verb,adjective,...}
	$n = word sense number = [0-9]+
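
The three patterns above can be checked mechanically. The following Python sketch transcribes them into regular expressions and classifies a URI by the pattern it matches. Because the list of lexical group names is elided above, the sketch accepts any lowercase word there, which is an assumption; the function and group names are likewise only illustrative.

```python
import re

PREFIX = re.escape("http://wordnet.princeton.edu/wn20/")
WORD = r"[a-zA-Z_]+"

# One pattern per URI kind, transcribed from the patterns listed above.
# [a-z]+ for the lexical group name is an assumption (the full list of
# names is not given in the message).
PATTERNS = [
    ("synset", re.compile(
        PREFIX + r"(?P<synset_id>[0-9]+)-(?P<word>" + WORD + r")-(?P<letter>[nvasr])/")),
    ("wordsense", re.compile(
        PREFIX + r"(?P<word>" + WORD + r")-(?P<group>[a-z]+)-(?P<n>[0-9]+)/")),
    ("word", re.compile(
        PREFIX + r"word-(?P<word>" + WORD + r")/")),
]


def classify(uri):
    """Return (kind, components) for a recognized URI, else None."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(uri)
        if match:
            return kind, match.groupdict()
    return None


for uri in (
    "http://wordnet.princeton.edu/wn20/107909067-bank-n/",
    "http://wordnet.princeton.edu/wn20/bank-noun-1/",
    "http://wordnet.princeton.edu/wn20/word-bank/",
):
    print(classify(uri))
```

Note that the kind of URI can only be recovered by trying each pattern in turn, since nothing in a fixed position says which kind it is.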

This seems odd for a few reasons:

1. If I've understood correctly, in a URI like

	http://wordnet.princeton.edu/wn20/107909067-bank-n/

the "-bank-n" that is tacked on after the synset ID is redundant,
presumably provided as a convenient reminder to a human reader.  I think
providing this human convenience is a good idea (very helpful in
debugging), but I'm also wondering: shouldn't a particular word sense be
enough to unambiguously identify a particular synset?  Is the synset ID
really needed?  Couldn't the above URI be more conveniently constructed
as:

	http://wordnet.princeton.edu/wn20/synset/bank-noun-1/

i.e., "synset/bank-noun-1" acts as a unique synset identifier.  Would
that work or did I misunderstand something?  It would be nice to get rid
of the arbitrary synset ID numbers if they are not needed.

2. Sometimes "noun", "verb", etc., are abbreviated as "n", "v", etc. and
sometimes they are spelled out.

3. The lexical components are not always in the same place.  I would
have expected something like:

	http://wordnet.princeton.edu/wn20/synset/107909067-bank-n/
	(or http://wordnet.princeton.edu/wn20/synset/bank-noun-1/ )
	http://wordnet.princeton.edu/wn20/wordsense/bank-noun-1/
	http://wordnet.princeton.edu/wn20/word/bank/
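
As a rough illustration of point 3, the Python sketch below maps each of the current URI forms onto the layout suggested above, in which the kind of resource always occupies its own leading path segment. It is only a sketch under the same assumptions as the patterns listed earlier (lowercase lexical group names, hypothetical function name), and it deliberately leaves the "n" versus "noun" inconsistency from point 2 untouched.

```python
import re

PREFIX = "http://wordnet.princeton.edu/wn20/"

# (pattern for the current local part, replacement in the suggested layout)
RULES = [
    (re.compile(r"([0-9]+-[a-zA-Z_]+-[nvasr])/"), r"synset/\1/"),
    (re.compile(r"([a-zA-Z_]+-[a-z]+-[0-9]+)/"), r"wordsense/\1/"),
    (re.compile(r"word-([a-zA-Z_]+)/"), r"word/\1/"),
]


def to_uniform_layout(uri):
    """Rewrite a current-style URI into the suggested layout."""
    if not uri.startswith(PREFIX):
        raise ValueError("unexpected prefix: " + uri)
    local = uri[len(PREFIX):]
    for pattern, template in RULES:
        match = pattern.fullmatch(local)
        if match:
            # Substitute the captured group into the replacement template.
            return PREFIX + match.expand(template)
    raise ValueError("unrecognized URI form: " + uri)


print(to_uniform_layout("http://wordnet.princeton.edu/wn20/107909067-bank-n/"))
print(to_uniform_layout("http://wordnet.princeton.edu/wn20/bank-noun-1/"))
print(to_uniform_layout("http://wordnet.princeton.edu/wn20/word-bank/"))
```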

Of course, you may have compelling reasons that I do not know about for
constructing the URIs the way you have.

David Booth

Reference
[1] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html
Received on Tuesday, 28 March 2006 23:29:36 UTC