Re: [WNET] new proposal WN URIs and related issues

At 07:05 PM 4/18/2006 +0300, Mark van Assem wrote:
...
>The reasons for the changes to the current proposal are:
>
>1) the trailing slash causes problems in using properties, e.g. <wn:synsetId/>value</wn:synsetId/> results in a parsing error.

The only properties being defined are in the schema, which I think
deserves its own namespace separate from the WordNet instance data.

I do think most of the trailing '/'s were a confusing choice and I was
about to write mail to the list proposing which ones should be dropped,
so thank you for getting there before me :)

>2) because of the use of slashes in the 'local part' of the URIs (e.g. bank/noun/1), it becomes impossible to use the ns:localId notation (QNames). Slashes are not allowed within localId. Instead then only entities could be used to define instances, e.g.
...
>This is not really inhibiting (only a bit awkward maybe) but it does inhibit in the next point.

It turns out that custom entities cannot be declared within an HTML
document, so when choosing between otherwise technically similar options
I would treat not fitting as QNames [or CURIEs :) ] as a show-stopper.

I'd like authors of HTML documents to be able to include RDF that uses
our WordNet resources without unnecessary additional aggravation.
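
To make the QName constraint concrete, here is a minimal sketch that tests
whether a URI can be abbreviated as prefix:local. The namespace URI, the
helper name to_qname, and the ASCII-only NCName pattern are my assumptions
for illustration; the real grammar in the XML Namespaces spec admits more
characters.

```python
import re

# Simplified, ASCII-only NCName pattern (an assumption of this sketch).
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")

def to_qname(uri, prefixes):
    """Return 'prefix:local' if uri can be abbreviated, else None."""
    # Try the longest namespace first so the most specific prefix wins.
    for prefix, ns in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(ns):
            local = uri[len(ns):]
            if NCNAME.match(local):
                return f"{prefix}:{local}"
    return None

# Hypothetical namespace URI, for illustration only.
PREFIXES = {"wn20synset": "http://example.org/wn20/synset/"}

slashed = to_qname("http://example.org/wn20/synset/bank/noun/1", PREFIXES)
hyphenated = to_qname("http://example.org/wn20/synset/bank-noun-1", PREFIXES)
print(slashed)
print(hyphenated)
```

Under these assumptions the slashed local part is rejected, while the
hyphenated one abbreviates cleanly.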

>3) it is impossible to recast WN Synsets as properties, e.g. to use WN VerbSynsets as properties:
>
>        <rdf:Description rdf:about="&wn20synset;vase/noun/1">
>                <&wn20synset;above/verb/1 rdf:resource="&wn20synset;table/noun/1" />
>        </rdf:Description>

This kind of usage is somewhat beyond what you have proposed
in the current editors' draft.  It implies that there is some sort of
linguistic behavior of the WordNet data such that, e.g., one might
be able to say that VerbSynset is a subClass of rdf:Property.

Do you believe that WordNet has this characteristic?  Do you believe
that Princeton would agree that this characteristic holds?

I suspect this linguistic characteristic would only apply to
VerbSynset, not to any of the other classes.

Either way, I agree that we should avoid choosing names whose
syntactic restrictions make this possibility hard to implement,
provided that doing so is not otherwise inconvenient.

>is impossible. For attributes, if I understand correctly, only the ns:localId notation is allowed in RDF/XML  (so writing out the complete URI would not solve this).

Correct: XML attribute _names_ may not include '/', and neither may
element names.
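
A quick way to check this is to hand a few candidate names to an XML
parser. The sketch below uses Python's standard ElementTree parser; the
namespace URI and the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical namespace declaration, for illustration only.
NS = 'xmlns:s="http://example.org/wn20/synset/"'

ok_element = parses(f'<s:above-verb-1 {NS}/>')          # hyphens are fine
bad_element = parses(f'<s:above/verb/1 {NS}/>')         # '/' in element name
bad_attribute = parses(f'<d {NS} s:above/verb/1="x"/>') # '/' in attribute name

print(ok_element, bad_element, bad_attribute)
```

The hyphenated name is accepted; both names containing '/' are rejected
as not well-formed.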

...
>Note that the URIs for instances of Synsets, WordSenses and Words, as well as the URIs of classes and properties are in both proposals effectively in different namespaces (although there is a relationship between them). I am not sure this is a good idea after all,

I think it is a good idea to give separate namespaces to the
terms used to model the data and the data itself.  When we
get to writing down more best practices for vocabulary
management, I anticipate that we would find it advantageous
to be able to separately version the modelling terminology
and the instance data.

>... Another option is to create property names that definitely do not conflict with words, e.g. by introducing a prefix. Then we can put everything in one namespace. E.g. with URIs
>
>- http://wordnet.princeton.edu/wn20/synset-bank-noun-1
>- http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
>- http://wordnet.princeton.edu/wn20/word-bank
>- http://wordnet.princeton.edu/wn20/schema-participleOf

If we collapse everything down to one namespace then all of this
prefix information has to be repeated in each use of every term in
WordNet.  This doesn't feel convenient to me -- and will upset
those who care about the size of XML documents they have to
generate (I'm not particularly one of the latter, but they do tend
to be vocal.)

I lean toward four namespaces:

...wn20/schema/
...wn20/synset/
...wn20/wordsense/
...wn20/word/

Making the synset namespace be separate gives us a nice
easy way to refer to "WordNet Basic" -- it's just the namespace
name of the synset portion.  There may be more triples in
this synset part of the data than the current draft defines for Basic
but I expect not enough more to really upset users.  Avoiding
a requirement to have separate names for "basic" and "full"
feels good; the application simply chooses to fetch only the parts
of the vocabulary that it needs.

wordsense/ and word/ could be collapsed to just one namespace,
particularly if those sets of data are always expected to be used
together:

...wn20/word/

with word instances like

wnword:bank
wnword:read_write_memory

and word senses like

wnword:sense-bank-noun-1
or
wnword:sense-bank-n-1
or perhaps
wnword:bank-sense-n-1

(I'm somewhat taken by the %lexform% "-sense-" %type% "-" %num%
form and note that "-sense" can be dropped, as the probability of a
naming collision with the lexical forms plus a "-{n,v,adj,adv}-<n>"
suffix seems extremely low; e.g. wnword:bank is an instance of a
wn20:Word and wnword:bank-n-1 is an instance of a wn20:WordSense:

  wnword:bank      rdf:type                      wn20:Word .
  wnword:bank-n-1  rdf:type                      wn20:WordSense .
  wnword:bank-n-1  wn20:word                     wnword:bank .
  wnsyn:bank-n-1   rdf:type                      wn20:Synset .
  wnsyn:bank-n-1   wn20:synsetContainsWordSense  wnword:bank-n-1 .

where I've deliberately chosen to use 'wn20' as my ns-prefix
for the schema and 'wnword' and 'wnsyn' as my ns-prefixes for
the (2.0 version of the) data.
)
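
The naming form above can be sketched in a few lines. This only builds
strings using the prefixes from my example; the helper names and the
spelled-out part-of-speech keys are assumptions, and reusing the same
local name for the synset simply mirrors the example rather than
proposing a rule.

```python
# Type codes follow the "-{n,v,adj,adv}-<n>" suffix described above;
# the long-form keys are an assumption of this sketch.
POS_CODES = {"noun": "n", "verb": "v", "adjective": "adj", "adverb": "adv"}

def sense_local_name(lexform, pos, num):
    """Build a local name of the form %lexform% "-" %type% "-" %num%."""
    return f"{lexform}-{POS_CODES[pos]}-{num}"

def triples_for(lexform, pos, num):
    """Return the five example statements as (subject, predicate, object)."""
    local = sense_local_name(lexform, pos, num)
    word = f"wnword:{lexform}"
    sense = f"wnword:{local}"
    synset = f"wnsyn:{local}"
    return [
        (word, "rdf:type", "wn20:Word"),
        (sense, "rdf:type", "wn20:WordSense"),
        (sense, "wn20:word", word),
        (synset, "rdf:type", "wn20:Synset"),
        (synset, "wn20:synsetContainsWordSense", sense),
    ]

bank = triples_for("bank", "noun", 1)
for s, p, o in bank:
    print(f"{s}\t{p}\t{o}\t.")
```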

>Ralph is working on setting up a server @ W3C to return CBDs on HTTP GETs for the WN URIs, so that the Princeton based URIs in [1] needn't 404. The proposal is to remove the references to Princeton in [1] for the time being, with notice that the aim is to go from W3C based URIs to Princeton based in the future. In that way the document is more usable for current purposes (namely providing a working online WN version and a readable draft that describes it and allows direct examination of the sources).

I have more to say on the topic of distinguishing between
documents that describe WordNet resources and the WordNet
resources themselves.  I will put those comments in a separate
message.  In any case, it is important that we work through the
details sufficiently to persuade ourselves that we have names
that work in practice and that have semantics that we will be
able to explain.

>As an aside, it turned out that the Recipes in [2] do not cover exactly the WN case, namely serving a large set of (small) files (which is a straightforward way to implement CBDs). We actually need a variant of Recipe 2 or 5 where the whole vocabulary is not in one RDF file.

More precisely, [2] does not explicitly cover the WordNet case as
WordNet has historically been an example of a namespace that
clearly did not want to be served as a single resource defining
all the terms present in the namespace.  WordNet is the
canonical huge "slash namespace" and Recipe 5 shows the
basic pattern for serving both human-readable and machine-
interpretable information resources at the URIs of the WordNet
(non-information) resources.
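
As a rough illustration of that pattern, the sketch below picks a
redirect target for a term URI from the request's Accept header. The use
of a 303 status, the 'doc/' and 'data/' locations, the file suffixes, and
the default to HTML are all my assumptions for this sketch, not details
taken from [2]; a real deployment would do this in the server
configuration.

```python
BASE = "/wn20/"  # hypothetical path prefix for the slash namespace

def choose_redirect(path, accept):
    """Pick a redirect (status, location) for a term URI.

    The target layout is hypothetical; only the idea of redirecting to
    either a human-readable or an RDF document is being illustrated.
    """
    rel = path[len(BASE):].rstrip("/")
    # Naive media type matching: ignores q-values and wildcards.
    wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    if "application/rdf+xml" in wanted:
        return 303, f"{BASE}data/{rel}.rdf"
    return 303, f"{BASE}doc/{rel}.html"

rdf_reply = choose_redirect("/wn20/synset/bank-noun-1", "application/rdf+xml")
html_reply = choose_redirect("/wn20/synset/bank-noun-1",
                             "text/html,application/xhtml+xml;q=0.9")
print(rdf_reply)
print(html_reply)
```

Because each term is redirected individually, the namespace never has to
be served as a single document.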

>[1]http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion
>[2]http://www.w3.org/TR/swbp-vocab-pub/
>[3]http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060202
>[4]http://lists.w3.org/Archives/Public/public-swbp-wg/2006Feb/0087

Received on Wednesday, 19 April 2006 16:11:01 UTC