- From: Ralph R. Swick <swick@w3.org>
- Date: Wed, 19 Apr 2006 12:10:46 -0400
- To: Mark van Assem <mark@cs.vu.nl>
- Cc: SWBPD list <public-swbp-wg@w3.org>
At 07:05 PM 4/18/2006 +0300, Mark van Assem wrote:
...
>The reasons for the changes to the current proposal are:
>
>1) the trailing slash causes problems in using properties, e.g. <wn:synsetId/>value</wn:synsetId/> results in a parsing error.

The only properties being defined are in the schema, which I think deserves its own namespace separate from the WordNet instance data. I do think most of the trailing '/'s were a confusing choice, and I was about to write mail to the list proposing which ones should be dropped, so thank you for getting there before me :)

>2) because of the use of slashes in the 'local part' of the URIs (e.g. bank/noun/1), it becomes impossible to use the ns:localId notation (QNames). Slashes are not allowed within localId. Instead, only entities could then be used to define instances, e.g.
...
>This is not really inhibiting (only a bit awkward maybe) but it does inhibit in the next point.

It turns out that custom entities can't be used within an HTML document, so when choosing between otherwise technically similar options I would consider it a show-stopper to pick ones whose names don't fit as QNames [or CURIEs :) ]. I'd like authors of HTML documents to be able to include RDF that uses our WordNet resources without unnecessary additional aggravation.

>3) it is impossible to recast WN Synsets as properties, e.g. to use WN VerbSynsets as properties:
>
>  <rdf:Description rdf:about="&wn20synset;vase/noun/1">
>    <&wn20synset;above/verb/1 rdf:resource="&wn20synset;table/noun/1" />
>  </rdf:Description>

This kind of usage is somewhat beyond what you have proposed in the current editors' draft. It implies that there is some sort of linguistic behavior of the WordNet data such that, e.g., one might be able to say that VerbSynset is a subClass of rdf:Property. Do you believe that WordNet has this characteristic? Do you believe that Princeton would agree that this characteristic holds?
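A minimal Python sketch of the QName restriction behind points 2 and 3: to write a URI as prefix:localId (for example as an RDF/XML property element), the URI must end in a local part that is an XML NCName, which may not contain '/' or begin with a digit. The function split_for_qname and the example.org URIs are hypothetical, and the NCName check is an ASCII-only simplification of the real production.

```python
import re

# ASCII-only approximation of the XML Namespaces NCName production (the
# local part of a QName): a letter or '_', then letters, digits, '.', '-', '_'.
# The real production also admits many non-ASCII characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_for_qname(uri):
    """Split `uri` into (namespace, local_name), choosing the longest suffix
    that is an NCName; return None if no suffix of the URI qualifies."""
    for i in range(len(uri)):
        if NCNAME.fullmatch(uri[i:]):
            return uri[:i], uri[i:]
    return None


# Hypothetical URIs in the two naming styles under discussion.
for candidate in (
    "http://example.org/wn20/synset/bank/noun/1",  # slashes in the local part
    "http://example.org/wn20/synset/bank-n-1",     # hyphenated local part
    "http://example.org/wn20/schema/synsetId/",    # trailing slash
):
    print(candidate, "->", split_for_qname(candidate))
```

Under these assumptions the slash-style URI has no usable local part (its only slash-free suffix, "1", starts with a digit), so it cannot be written as a QName at all, while the hyphenated form splits cleanly into a namespace and "bank-n-1".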
I suspect this linguistic characteristic would only apply to VerbSynset, not to any of the other classes. Either way, I agree that, where it is not otherwise inconvenient, we should avoid choosing names whose syntactic restrictions make this possibility hard to implement.

>is impossible. For attributes, if I understand correctly, only the ns:localId notation is allowed in RDF/XML (so writing out the complete URI would not solve this).

XML attribute _names_ may not include '/', correct.

...

>Note that the URIs for instances of Synsets, WordSenses and Words, as well as the URIs of classes and properties, are in both proposals effectively in different namespaces (although there is a relationship between them). I am not sure this is a good idea after all,

I think it is a good idea to give separate namespaces to the terms used to model the data and to the data itself. When we get to writing down more best practices for vocabulary management, I anticipate that we would find it advantageous to be able to version the modelling terminology and the instance data separately.

>... Another option is to create property names that definitely do not conflict with words, e.g. by introducing a prefix. Then we can put everything in one namespace. E.g. with URIs
>
>- http://wordnet.princeton.edu/wn20/synset-bank-noun-1
>- http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
>- http://wordnet.princeton.edu/wn20/word-bank
>- http://wordnet.princeton.edu/wn20/schema-participleOf

If we collapse everything down to one namespace, then all of this prefix information has to be repeated in each use of every term in WordNet. This doesn't feel convenient to me, and it will upset those who care about the size of the XML documents they have to generate (I'm not particularly one of the latter, but they do tend to be vocal.)
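The well-formedness failures discussed above (Mark's trailing-slash element in point 1, and a slash inside an XML attribute name) can be checked by handing the markup to an XML parser. This sketch assumes Python's standard xml.etree.ElementTree; the helper names and the example.org namespace URIs are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace URIs, for illustration only.
WNSYN_NS = "http://example.org/wn20/synset/"
WN20_NS = "http://example.org/wn20/schema/"


def is_well_formed(doc):
    """Return True if the stdlib XML parser accepts `doc`."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def with_property_attribute(local):
    """RDF/XML that uses a synset as a property *attribute* wnsyn:<local>."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:wnsyn="{WNSYN_NS}">'
        f'<rdf:Description rdf:about="{WNSYN_NS}vase-n-1" wnsyn:{local}="x"/>'
        "</rdf:RDF>"
    )


# Mark's point 1: an element name written with a trailing '/'.
TRAILING_SLASH = f'<r xmlns:wn="{WN20_NS}"><wn:synsetId/>value</wn:synsetId/></r>'

print(is_well_formed(with_property_attribute("above-v-1")))     # hyphenated name
print(is_well_formed(with_property_attribute("above/verb/1")))  # slashes in name
print(is_well_formed(TRAILING_SLASH))
```

The first document parses, while the other two are rejected by the parser before any RDF processing could begin, which is why writing out a full URI is not an option for attribute names.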
I lean toward four namespaces:

  ...wn20/schema/
  ...wn20/synset/
  ...wn20/wordsense/
  ...wn20/word/

Making the synset namespace separate gives us a nice easy way to refer to "WordNet Basic": it's just the namespace name of the synset portion. There may be more triples in this synset part of the data than the current draft defines for Basic, but I expect not enough more to really upset users. Avoiding a requirement to have separate names for "basic" and "full" feels good; the application simply chooses to fetch only the parts of the vocabulary that it needs.

wordsense/ and word/ could be collapsed into just one namespace, particularly if those sets of data are always expected to be used together:

  ...wn20/word/

with word instances like wnword:bank and wnword:read_write_memory, and word senses like wnword:sense-bank-noun-1, wnword:sense-bank-n-1, or perhaps wnword:bank-sense-n-1.

(I'm somewhat taken by the %lexform% "-sense-" %type% "-" %num% form, and note that "-sense" can be dropped, as the probability of a naming collision with the lexical forms plus a "-{n,v,adj,adv}-<n>" suffix seems extremely low; e.g. wnword:bank is an instance of a wn20:Word and wnword:bank-n-1 is an instance of a wn20:WordSense:

  wnword:bank     rdf:type wn20:Word .
  wnword:bank-n-1 rdf:type wn20:WordSense .
  wnword:bank-n-1 wn20:word wnword:bank .
  wnsyn:bank-n-1  rdf:type wn20:Synset .
  wnsyn:bank-n-1  wn20:synsetContainsWordSense wnword:bank-n-1 .

where I've deliberately chosen to use 'wn20' as my ns-prefix for the schema and 'wnword' and 'wnsyn' as my ns-prefixes for (the 2.0 version of) the data.)

>Ralph is working on setting up a server @ W3C to return CBDs on HTTP GETs for the WN URIs, so that the Princeton-based URIs in [1] needn't 404. The proposal is to remove the references to Princeton in [1] for the time being, with notice that the aim is to go from W3C-based URIs to Princeton-based ones in the future.
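A minimal Python sketch tying together the naming pattern above and the CBD-returning server: it mints the five example statements for one word sense, then computes the Concise Bounded Description (CBD) that such a server might return for one resource. The namespace bases are hypothetical (the message elides the host part as "...wn20/"), and the CBD here is simplified: statements with the resource as subject, recursing into blank-node objects, with reification handling omitted.

```python
# Hypothetical bases: the message elides the host ("...wn20/"), so
# example.org stands in purely for illustration.
BASE = "http://example.org/wn20/"
WN20 = BASE + "schema/"    # modelling terms   (prefix 'wn20')
WNSYN = BASE + "synset/"   # synset instances  (prefix 'wnsyn')
WNWORD = BASE + "word/"    # words and senses  (prefix 'wnword')
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
POS_CODES = {"noun": "n", "verb": "v", "adjective": "adj", "adverb": "adv"}


def sense_local(lexform, pos, num):
    """%lexform% "-" %type% "-" %num%, i.e. the form with "-sense" dropped."""
    return f"{lexform}-{POS_CODES[pos]}-{num}"


def example_triples(lexform, pos, num):
    """The five example statements from the message, for one word sense."""
    word = WNWORD + lexform
    sense = WNWORD + sense_local(lexform, pos, num)
    synset = WNSYN + sense_local(lexform, pos, num)
    return [
        (word, RDF_TYPE, WN20 + "Word"),
        (sense, RDF_TYPE, WN20 + "WordSense"),
        (sense, WN20 + "word", word),
        (synset, RDF_TYPE, WN20 + "Synset"),
        (synset, WN20 + "synsetContainsWordSense", sense),
    ]


def cbd(resource, triples):
    """Simplified CBD: every statement whose subject is `resource`, recursing
    into blank-node objects (written "_:label"). Reification is omitted."""
    result, todo, seen = [], [resource], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                if o.startswith("_:"):
                    todo.append(o)
    return result


def term(t):
    # Blank nodes stay as "_:label"; every other term here is a URI.
    # (Literals would need quoting and are not handled in this sketch.)
    return t if t.startswith("_:") else f"<{t}>"


def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


data = example_triples("bank", "noun", 1)
print(to_ntriples(cbd(WNSYN + "bank-n-1", data)))
```

With one small file per resource, each file would simply hold to_ntriples(cbd(uri, data)) (or its RDF/XML equivalent) for that resource's URI.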
>In that way the document is more usable for current purposes (namely providing a working online WN version and a readable draft that describes it and allows direct examination of the sources).

I have more to say on the topic of distinguishing between documents that describe WordNet resources and the WordNet resources themselves. I will put those comments in a separate message. In any case, it is important that we work through the details sufficiently to persuade ourselves that we have names that work in practice and that have semantics we will be able to explain.

>As an aside, it turned out that the Recipes in [2] do not cover exactly the WN case, namely serving a large set of (small) files (which is a straightforward way to implement CBDs). We actually need a variant of Recipe 2 or 5 where the whole vocabulary is not in one RDF file.

More precisely, [2] does not explicitly cover the WordNet case, as WordNet has historically been an example of a namespace that clearly did not want to be served as a single resource defining all the terms in the namespace. WordNet is the canonical huge "slash namespace", and Recipe 5 shows the basic pattern for serving both human-readable and machine-interpretable information resources at the URIs of the WordNet (non-information) resources.

>[1] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion
>[2] http://www.w3.org/TR/swbp-vocab-pub/
>[3] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20060202
>[4] http://lists.w3.org/Archives/Public/public-swbp-wg/2006Feb/0087
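One way to serve a huge slash namespace as many small files can be sketched in Python as a pure request-routing function rather than a full server. It assumes the 303 See Other convention for resources that are not themselves documents, a naive reading of the Accept header (no q-values), and a hypothetical /wn20/... path layout in which each resource's CBD is a static .rdf file next to an .html rendering; these are assumptions for illustration, not the configuration given in [2].

```python
# Hypothetical URL layout; the real paths are not given in the message.
# Each resource's description lives in its own small static file.
DATA_PREFIXES = ("/wn20/schema/", "/wn20/synset/", "/wn20/word/")
DOC_SUFFIXES = (".rdf", ".html")
RDF_XML = "application/rdf+xml"


def negotiate(path, accept):
    """Return (status, location) for a GET on `path`.

    A WordNet resource URI (a synset, word, ...) names something that is not
    itself a document, so the server answers 303 See Other, pointing at the
    RDF/XML or HTML document describing it. Requests for those documents
    get 200. Accept handling is naive: any mention of RDF/XML wins.
    """
    for prefix in DATA_PREFIXES:
        if path.startswith(prefix):
            local = path[len(prefix):]
            if not local or "/" in local:
                return 404, None
            if local.endswith(DOC_SUFFIXES):
                return 200, path  # the describing document itself
            suffix = ".rdf" if RDF_XML in accept else ".html"
            return 303, path + suffix
    return 404, None


print(negotiate("/wn20/synset/bank-n-1", "application/rdf+xml"))
print(negotiate("/wn20/synset/bank-n-1", "text/html"))
```

Keeping the decision in a small function like this makes the per-resource file layout independent of the web server in front of it; a real deployment would also parse q-values properly.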
Received on Wednesday, 19 April 2006 16:11:01 UTC