Re: Ontolex/Lime: minutes of last meetings and some updates

Dear John,

So, summing up, in all this chaos, at least for historical reasons, I would say the name ontolex is sacred :-) but when it comes to a specific property, I would rather stick with a reasonably approved and shared terminology. I’m not dogmatic about this either… just more convinced.

Just for the elegance of it, I would like to see some kind of symmetry between the properties currently called "lexicon" and "targetDataset"... perhaps something like "lexicalizedDataset"/"ontologicalDataset" or "semanticDataset". I think targetDataset sounds a bit too bland.

 

[Armando Stellato] 

Agreed, that was the original intention: to have lexicalizedDataset (the original name) and lexicon. The pair is not totally symmetric but, on the other hand, it conveys the direction: you use a lexicon to lexicalize something. However, it seemed too heavy to Philipp (if I remember well). targetDataset was bland, though maybe that was what we were looking for.

In principle, I like your suggestions (the same ones occurred to me), but let me proceed by elimination:

-          ontologicalDataset: that would suggest an ontology even more strongly. A dataset can be an ontology vocabulary or data, and writing ontologicalDataset would seem to disambiguate it by saying: “hey guys, this is a vocabulary, not data!”.

-          semanticDataset: I thought about it too; it seemed a good compromise, sufficiently general without being bland. However… if the Lexicon is WordNet, would you say it is a dataset without semantics? Is semanticDataset thus a real discriminator?

-          lexicalizedDataset: our original proposal; I am OK with going back to it if the others agree (again, I don’t recall Philipp’s preference on this).

-          I’m open to other names, and will think about further possibilities.

 

So, for the moment, having no better name, I thought that expressing the target of a lexicalization as targetDataset was… exactly, though implicitly, what we want to describe.

 

 

Also, we should attempt to avoid names that are the same (up to capitalization) between models, and as there is already an ontolex:Lexicon class, we should avoid a lime:lexicon property (to avoid confusion).

 

[Armando Stellato] 

Mmm… now you raise another aspect: I was against (and said so from the very start) this late introduction of Lexicon into the core. To me, it should be something in the metadata only (as we initially sketched it), intended as the same kind of proxy that VoID offers for datasets (a subclass of Dataset, actually). There are no major precedents for this in other vocabularies, except, and only partially, for the core modeling vocabulary OWL. OWL ontologies are declared as owl:Ontology, and yes, in SKOS you have concept schemes (but mostly because users wanted to have multiple schemes and, by the way, schemes are the most flawed and controversial part of SKOS). For the rest, vocabularies usually do not let datasets declare themselves as containers of some specific kind of content (e.g. Lexicon); they just… allow you to model it.

This is maybe because, as you yourself said, things may easily get mixed up in the data. Metadata can thus logically refer to partitions of the available data, but there is little point in declaring those partitions in the data itself.

 

 

 VoID here and call the object a "partition"? 

 

[Armando Stellato] 

I wouldn’t, because in VoID you may treat a partition as a whole new dataset description, merely stating that its content is a partition of another dataset (in general, the main dataset being described in the file). In lime, the focus is not on the partition itself (we don’t add any further description of it), but on how that particular partition has been lexicalized.
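
To make the VoID side of this comparison concrete, here is a minimal sketch of a class partition described as a dataset in its own right. The void: terms are the standard VoID ones; the example.org namespace, the :datClasses name and the entity count are made up for illustration only.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/data#> .   # hypothetical namespace

# The main dataset points to one of its class-based partitions.
:dat a void:Dataset ;
  void:classPartition :datClasses .

# In VoID the partition is itself a dataset and carries its own description.
:datClasses a void:Dataset ;
  void:class owl:Class ;
  void:entities 120 .                        # made-up figure
```

In lime, by contrast, nothing more would be said about :datClasses itself; only its lexical coverage would be reported.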

But it is still a partition of the lexicalization that we are describing, right? It is the part that only refers to classes/properties/etc. 

 

 

[Armando Stellato] 

Yes, it is, but again, the focus is not on generating and describing a partition. Also, your last comment (“It is the part that only refers to classes/properties/etc.”) makes me guess where the issue is: it does not refer *only* to them. In fact, I expect that the most used partition would be the whole set of entities (thus: class = rdfs:Resource), as we did not foresee a different way to represent “the whole”. “The whole” is simply a partition like all the others, except that it includes everything.

Another solution would be to treat it as a dataset (a partition is itself a dataset); in that case one would declare partitions of the lexicalizedDataset and then point to them through this property. The only odd thing here is that, in the whole-dataset case, the property in this coverage construct would point back to the dataset, which has already been mentioned in the main LexicalizationSet construct.
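
A minimal sketch of what I mean by “the whole” being just another partition, using the draft property names from this thread. The lime namespace IRI, the example.org namespace and all figures are placeholders I made up for illustration.

```turtle
@prefix lime: <http://example.org/lime#> .                 # placeholder IRI, an assumption
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/data#> .                 # hypothetical namespace

:myLexicalization a lime:LexicalizationSet ;
  lime:lexicalizedDataset :dat ;
  # "The whole": every resource of :dat, selected through rdfs:Resource.
  lime:resourceCoverage [
    lime:class rdfs:Resource ;
    lime:percentage 0.75 ;        # made-up figure
    lime:avgNumOfEntries 1.8      # made-up figure
  ] ;
  # A narrower partition: only the classes.
  lime:resourceCoverage [
    lime:class owl:Class ;
    lime:percentage 0.90 ;        # made-up figure
    lime:avgNumOfEntries 2.1      # made-up figure
  ] .
```

Both coverage nodes have the same shape; only the value of lime:class changes.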

 

Here’s an example (I use here the updated LexicalizationSet name, after Philipp’s suggestion):

 

myItLex:myItalianLexicalizationOfDat
  a lime:LexicalizationSet ;
  dc/lime:lang "it" ;
  lime:lexicalizedDataset :dat ;
  lime:lexicalModel ontolex: ;
  lime:lexicon :italianWordnet ;
  lime:resourceCoverage [
    lime:class owl:Class ;
    # option 1: lime:partition :dat ;   what should we do here to address the whole? Repeat the :dat dataset again in the case of the whole?
    # option 2: lime:dataset :dat ;     or, at this point, even reuse a dataset property?
    lime:percentage … ;
    lime:avgNumOfEntries …
  ] .

Also, in the case of a real partition (I mean, a strictly contained subset of the dataset), if you want to be really aligned with VoID, you would have to create a partition entity, which would complicate things. I think this is not necessary, since here it is implicit that the dataset is the one addressed with lime:lexicalizedDataset, outside of the resourceCoverage structure.
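
For comparison, a rough sketch of what the VoID-aligned variant with an explicit partition entity could look like. The lime:partition property is the tentative name from option 1 in the example above; the lime namespace IRI, the example.org namespace, the :datClasses name and the figure are my own placeholders.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix lime: <http://example.org/lime#> .     # placeholder IRI, an assumption
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/data#> .     # hypothetical namespace

# The strictly contained subset has to be declared as its own entity.
:dat void:classPartition :datClasses .
:datClasses a void:Dataset ;
  void:class owl:Class .

:myLexicalization a lime:LexicalizationSet ;
  lime:lexicalizedDataset :dat ;
  lime:resourceCoverage [
    lime:partition :datClasses ;               # tentative property name (option 1)
    lime:percentage 0.90                       # made-up figure
  ] .
```

This costs one extra entity and two extra statements per partition, which is the added complexity I would rather avoid.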

 

Cheers,

 

Armando

Received on Tuesday, 15 July 2014 10:53:26 UTC