
[WNET] new proposal WN URIs and related issues

From: Mark van Assem <mark@cs.vu.nl>
Date: Tue, 18 Apr 2006 19:05:43 +0300
Message-ID: <44450E57.20801@cs.vu.nl>
To: SWBPD list <public-swbp-wg@w3.org>
CC: Jan Wielemaker <wielemak@science.uva.nl>, "Ralph R. Swick" <swick@w3.org>, Guus Schreiber <guus@few.vu.nl>


Dear all,

I have been having a private discussion with Ralph Swick and Jan 
Wielemaker concerning the WN URIs. On Guus' advice I'd like to 
summarize/replay part of this discussion to obtain comments on our 
conclusions. Ralph and Jan, please correct and/or add to this mail.

If you have any comments on these issues, we invite you to respond. Our 
target is to finish a new Draft for the next SWBP WG Telecon on the 24th of 
April and to propose it as a First Working Draft. Only minor changes are 
expected apart from the issues below.

Until now the URIs for WN in the draft [1] look like this:

- http://wordnet.princeton.edu/wn20/synset/bank/noun/1/
- http://wordnet.princeton.edu/wn20/wordsense/bank/noun/1/
- http://wordnet.princeton.edu/wn20/word/bank/
- http://wordnet.princeton.edu/wn20/schema/participleOf/

We would like to change this to

- http://wordnet.princeton.edu/wn20/synset/bank-noun-1
- http://wordnet.princeton.edu/wn20/wordsense/bank-noun-1
- http://wordnet.princeton.edu/wn20/word/bank
- http://wordnet.princeton.edu/wn20/schema/participleOf

With the namespaces:
- wn20synset = http://wordnet.princeton.edu/wn20/synset/
- wn20wordsense = http://wordnet.princeton.edu/wn20/wordsense/
- wn20word = http://wordnet.princeton.edu/wn20/word/
- wn20schema = http://wordnet.princeton.edu/wn20/schema/


This actually means reverting to an earlier proposal [3], for which it 
was later proposed to change the '-' separators into slashes [4].

The reasons for the changes to the current proposal are:

1) the trailing slash causes problems in using properties, e.g. 
<wn:synsetId/>value</wn:synsetId/> results in a parsing error.

2) because of the use of slashes in the 'local part' of the URIs (e.g. 
bank/noun/1), it becomes impossible to use the ns:localId notation 
(QNames), since slashes are not allowed within localId. Instead, only 
entities could then be used to define instances, e.g.

	<wn20schema:NounSynset rdf:about="&wn20synset;bank/noun/1">
	     <wn20schema:synsetId>12345</wn20schema:synsetId>
	     ...
	</wn20schema:NounSynset>

This is not really a blocker (only a bit awkward, perhaps), but it does 
become one in the next point.

3) it is impossible to recast WN Synsets as properties, e.g. to use WN 
VerbSynsets as properties:

	<rdf:Description rdf:about="&wn20synset;vase/noun/1">
	 	<&wn20synset;above/verb/1 rdf:resource="&wn20synset;table/noun/1" />
	</rdf:Description>

is impossible. For property names, if I understand correctly, only the 
ns:localId notation is allowed in RDF/XML (so writing out the complete 
URI would not solve this). But because of (2) it is not possible to 
write down <wn20synset:above/verb/1 
rdf:resource="&wn20synset;table/noun/1" />.

With the new proposal these issues do not occur, because the localIds 
do not contain slashes.
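
Points (2) and (3) can be checked mechanically. The following is only a 
minimal sketch: it assumes Python's standard xml.etree.ElementTree 
parser as a stand-in for an RDF/XML parser (it tests XML well-formedness 
and namespace expansion only, not RDF semantics), and the names TEMPLATE 
and property_uri are made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SYNSET_NS = "http://wordnet.princeton.edu/wn20/synset/"

# Hypothetical template: one statement whose property is itself a synset.
TEMPLATE = (
    '<rdf:RDF xmlns:rdf="{rdf}" xmlns:wn20synset="{syn}">'
    '<rdf:Description rdf:about="{syn}{subject}">'
    '<wn20synset:{prop} rdf:resource="{syn}{obj}"/>'
    '</rdf:Description>'
    '</rdf:RDF>'
)


def property_uri(subject, prop, obj):
    """Return the expanded property URI, or None if the XML is not well-formed."""
    doc = TEMPLATE.format(rdf=RDF_NS, syn=SYNSET_NS,
                          subject=subject, prop=prop, obj=obj)
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        # A slash inside an element name makes the document unparsable.
        return None
    prop_elem = root[0][0]
    # ElementTree stores qualified names as "{namespace}local".
    namespace, local = prop_elem.tag[1:].split("}", 1)
    return namespace + local


# Hyphenated local part: parses, and the QName expands to the full URI.
print(property_uri("vase-noun-1", "above-verb-1", "table-noun-1"))
# Slashed local part: not well-formed XML, so None is returned.
print(property_uri("vase/noun/1", "above/verb/1", "table/noun/1"))
```

Note that the slashes in the rdf:about and rdf:resource attribute values 
are harmless; it is only the element name that rejects them.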

Note that the URIs for instances of Synsets, WordSenses and Words, as 
well as the URIs of classes and properties are in both proposals 
effectively in different namespaces (although there is a relationship 
between them). I am not sure this is a good idea after all, but it at 
least is a simple way of preventing URI clashes, e.g. between the word 
antonym and the property antonym. Another option is to create property 
names that definitely do not conflict with words, e.g. by introducing a 
prefix. Then we can put everything in one namespace. E.g. with URIs

- http://wordnet.princeton.edu/wn20/synset-bank-noun-1
- http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
- http://wordnet.princeton.edu/wn20/word-bank
- http://wordnet.princeton.edu/wn20/schema-participleOf


I am seeking input to decide between these two options.
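
To make the two options concrete, here is a small sketch of how the 
URIs would be built in each case. The function names are hypothetical 
and only the base URI is taken from the draft; the third function shows 
the unprefixed single-namespace case that both options are meant to 
avoid.

```python
BASE = "http://wordnet.princeton.edu/wn20/"


def separate_namespaces(kind, local_id):
    # Option 1: one namespace per kind, e.g. .../wn20/word/bank
    return f"{BASE}{kind}/{local_id}"


def single_namespace(kind, local_id):
    # Option 2: one shared namespace, with the kind as a prefix of the local name
    return f"{BASE}{kind}-{local_id}"


def unprefixed(local_id):
    # Shared namespace without any prefix: shown only to illustrate the clash.
    return f"{BASE}{local_id}"


word = ("word", "antonym")      # the word "antonym"
prop = ("schema", "antonym")    # the property "antonym"

# Both options keep the word and the property apart ...
print(separate_namespaces(*word), separate_namespaces(*prop))
print(single_namespace(*word), single_namespace(*prop))
# ... whereas dropping the prefix maps both onto the same URI.
print(unprefixed(word[1]) == unprefixed(prop[1]))
```

In both options the local part contains no slash, so either one remains 
usable as a QName.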

Ralph is working on setting up a server @ W3C to return CBDs on HTTP 
GETs for the WN URIs, so that the Princeton based URIs in [1] needn't 
404. The proposal is to remove the references to Princeton in [1] for 
the time being, with a notice that the aim is to move from W3C-based URIs 
to Princeton-based ones in the future. In that way the document is more usable 
for current purposes (namely providing a working online WN version and a 
readable draft that describes it and allows direct examination of the 
sources).

As an aside, it turned out that the Recipes in [2] do not cover exactly 
the WN case, namely serving a large set of (small) files (which is a 
straightforward way to implement CBDs). We actually need a variant of 
Recipe 2 or 5 where the whole vocabulary is not in one RDF file.

Thanks to Jan and Ralph for extensive discussions on these topics.

Kind regards,
Mark.


[1] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion
[2] http://www.w3.org/TR/swbp-vocab-pub/
[3] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060202
[4] http://lists.w3.org/Archives/Public/public-swbp-wg/2006Feb/0087
Received on Tuesday, 18 April 2006 16:05:42 GMT
