Re: [WNET] new proposal WN URIs and related issues

Hi all,

I propose to use the URI scheme below for WordNet. The rationale follows.

I will finish a version of the Draft with this proposal and send it 
as soon as possible, at the latest on Monday morning before the telecon. 
This is one of the last chances to get this work to First Public Working 
Draft status, and I personally feel that this issue is not important 
enough to be the reason *not* to get this Draft out. If this is too 
narrow a view on my part, please object.

- http://wordnet.princeton.edu/wn20/instances/synset-bank-noun-1
- http://wordnet.princeton.edu/wn20/instances/wordsense-bank-noun-1
- http://wordnet.princeton.edu/wn20/instances/word-bank
- http://wordnet.princeton.edu/wn20/schema/participleOf

The main reason is that it enables separate management/versioning of 
schema and instances. I do not think that a separate namespace for 
synsets or for words/wordsenses actually helps; see the arguments later 
in this email (detailed answers to Ralph).
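
To make the proposed split concrete, here is a minimal sketch of how the 
four example URIs above could be built from two namespaces. The base 
URIs are the ones from the examples; the helper function names are 
hypothetical and only serve this illustration.

```python
# Namespaces taken from the example URIs in this email.
BASE = "http://wordnet.princeton.edu/wn20/"
INSTANCES_NS = BASE + "instances/"
SCHEMA_NS = BASE + "schema/"


# The helper names below are hypothetical, not part of the draft.
def synset_uri(lemma, pos, sense_number):
    """Build a synset URI such as .../instances/synset-bank-noun-1."""
    return f"{INSTANCES_NS}synset-{lemma}-{pos}-{sense_number}"


def wordsense_uri(lemma, pos, sense_number):
    """Build a word sense URI such as .../instances/wordsense-bank-noun-1."""
    return f"{INSTANCES_NS}wordsense-{lemma}-{pos}-{sense_number}"


def word_uri(lemma):
    """Build a word URI such as .../instances/word-bank."""
    return f"{INSTANCES_NS}word-{lemma}"


def schema_uri(term):
    """Build a schema URI such as .../schema/participleOf."""
    return f"{SCHEMA_NS}{term}"


if __name__ == "__main__":
    print(synset_uri("bank", "noun", 1))
    print(wordsense_uri("bank", "noun", 1))
    print(word_uri("bank"))
    print(schema_uri("participleOf"))
```

Because only the two namespace constants carry the version and the 
schema/instances distinction, either one could be changed without 
touching the other.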


> It turns out that custom entities can't be done within an HTML document
> so I would consider it a show-stopper to choose between otherwise
> technically similar options the ones that don't fit as QNames [or CURIEs :) ].

Ok, yet another reason to remove the slashes from the XML names.
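
As a rough illustration of why slashes are a problem: the local part of 
a QName must be an XML NCName, and "/" is not allowed in one. The sketch 
below uses a simplified, ASCII-only approximation of the NCName rule 
(the real production also admits many non-ASCII characters), and the 
slash-style URI in the usage example is a hypothetical one made up for 
this illustration, not a quote from the earlier draft.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production.
# Assumption: good enough for this illustration; not the full XML rule.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(local_name):
    """Return True if local_name matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(local_name) is not None


def can_abbreviate(uri, namespace):
    """Return True if uri can be written as prefix:local for namespace."""
    if not uri.startswith(namespace):
        return False
    return is_ncname(uri[len(namespace):])


if __name__ == "__main__":
    instances = "http://wordnet.princeton.edu/wn20/instances/"
    print(can_abbreviate(instances + "synset-bank-noun-1", instances))
    # Hypothetical slash-style name, for contrast only.
    print(can_abbreviate(instances + "synset/bank-noun-1", instances))
```

With the hyphenated local name the check succeeds; with a slash inside 
the local name it fails, so such a URI could not be abbreviated against 
that namespace.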

> This kind of usage is somewhat beyond what you have proposed
> in the current editors' draft.  It implies that there is some sort of
> linguistic behavior of the WordNet data such that, e.g., one might
> be able to say that VerbSynset is a subClass of rdf:Property.

I do not want to imply any interpretation. I am just saying that someone 
wishing to use this interpretation is unable to do so with our former 
proposal (with slashes in the XML names). And I think there is some 
research that wants to use this interpretation, or will want to.

> Either way, I agree that making this possibility hard to implement
> by choosing names with syntactic restrictions should be avoided
> if it is not otherwise inconvenient.

Ok :-)

> I think it is a good idea to give separate namespaces to the
> terms used to model the data and the data itself.  When we

Ok, so at least two namespaces then.

> get to writing down more best practices for vocabulary
> management, I anticipate that we would find it advantageous
> to be able to separately version the modelling terminology
> and the instance data.

Although I am not sure that one would want to provide a new version of 
the schema without a revised set of instances, we cannot rule out the 
possibility. And you seem to be alluding to other possible management 
issues for which it would be useful to separate schema and instances.

> 
>>... Another option is to create property names that definitely do not conflict with words, e.g. by introducing a prefix. Then we can put everything in one namespace. E.g. with URIs
>>
>>- http://wordnet.princeton.edu/wn20/synset-bank-noun-1
>>- http://wordnet.princeton.edu/wn20/wordsense-bank-noun-1
>>- http://wordnet.princeton.edu/wn20/word-bank
>>- http://wordnet.princeton.edu/wn20/schema-participleOf
> 
> 
> If we collapse everything down to one namespace then all of this
> prefix information has to be repeated in each use of every term in
> WordNet.  This doesn't feel convenient to me -- and will upset
> those who care about the size of XML documents they have to
> generate (I'm not particularly one of the latter, but they do tend
> to be vocal.)

I am not sure that I understand this argument. Would it be crucial to 
have four instead of two namespaces? I am currently going for just two 
because I don't see the benefit of four, except more flexibility at the 
cost of more management. Splitting the schema and the instances is the 
more common choice, and in all those cases the instances of ALL the 
classes are mixed in one namespace.

> Making the synset namespace be separate gives us a nice
> easy way to refer to "WordNet Basic" -- it's just the namespace
> name of the synset portion.  There may be more triples in
> this synset part of the data than the current draft defines for Basic
> but I expect not enough more to really upset users.  Avoiding
> a requirement to have separate names for "basic" and "full"
> feels good; the application simply chooses to fetch only the parts
> of the vocabulary that it needs.

Actually this does not work. The synset part would be the FULL part of 
the data, which is different from the Synset BASIC version. The 
difference is that FULL does not contain the senseLabels (the set of 
labels attached to all the Synset's Words) but only a single rdfs:label.

Of course a simple solution would be to add the senseLabels in the 
online version. Then developers interested in the Synsets and labels 
only get what they want, and the Full users get a little duplicate data.

Another solution is serving BASIC and FULL in two separate namespaces. 
But then the connection between FULL and BASIC is lost, and even more 
management needs to be done.

But I think that any developer who is keen on saving triples can just 
download the BASIC version, even cutting off the pieces not needed as 
s/he sees fit. If a developer wants to use the online version because 
s/he only needs a few Synsets then querying a bit more to get to the 
Word-labels connected to the Synsets does not hurt. So I think we should 
just serve FULL online.
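
To show what "querying a bit more" could amount to, here is a small 
sketch that collects the Word labels for a Synset from FULL-style data 
by following Synset -> WordSense -> Word -> label. The triples are 
made-up illustrative data, and the property names (containsWordSense, 
word, lexicalForm) are assumptions for this sketch rather than names 
confirmed in this email.

```python
# Illustrative triples only. The property names are assumptions, and the
# "example" entries are placeholders, not real WordNet content.
TRIPLES = [
    ("synset-bank-noun-1", "containsWordSense", "wordsense-bank-noun-1"),
    ("synset-bank-noun-1", "containsWordSense", "wordsense-example-noun-1"),
    ("wordsense-bank-noun-1", "word", "word-bank"),
    ("wordsense-example-noun-1", "word", "word-example"),
    ("word-bank", "lexicalForm", "bank"),
    ("word-example", "lexicalForm", "example"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def sense_labels(triples, synset):
    """Collect the labels of all Words reachable from the given Synset."""
    labels = set()
    for sense in objects(triples, synset, "containsWordSense"):
        for word in objects(triples, sense, "word"):
            labels.update(objects(triples, word, "lexicalForm"))
    return labels


if __name__ == "__main__":
    print(sorted(sense_labels(TRIPLES, "synset-bank-noun-1")))
```

This is two extra hops per Synset compared to reading the senseLabels 
directly in BASIC, which seems acceptable for a developer who only needs 
a few Synsets.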


> message.  In any case, it is important that we work through the
> details sufficiently to persuade ourselves that we have names
> that work in practice and that have semantics that we will be
> able to explain.

I hope the above and the earlier discussions on the list are enough. To 
really be able to make the right decision, I think we need a discussion 
with a wider audience and some practical experience with the data to 
tell us whether this is ok. For now I'd like to fix this Draft as a 
First WD before it evaporates because the SWBP WG's time is up.

Cheers,
Mark.


Received on Thursday, 20 April 2006 20:26:24 UTC