Re: metadata (and not only): a few discussion points

Hi,

Happy New Year to all.

On Wed, Dec 17, 2014 at 6:42 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Dear all,
>
>
>
> we are expanding a bit here (wrt the meeting minutes) on what has been
> discussed in the last calls for what concerns the final touches still open
> on the metadata module.
>
> We share the feeling of being very close to the finalization of the
> vocabulary; however, we would like to deepen the discussion on a few
> aspects which we consider fundamental for the consistency of the whole
> model (and thus *not limited* to the metadata) by weighing pros and
> cons for each of them.
>
> In some cases we may suggest our position, in some others we take no
> stance at all, so just take this as a neutral checklist to be verified all
> together.
>
>
>
> *1.       **Model and Terminology Consistency*
>
> We report here a few statements/idiosyncrasies that have been made/noticed
> in the context of our calls. We would suggest to verify them all together
> and then report them explicitly somewhere, to make things clear from the
> start.
>
>
>
> *a.      **a Lexicon contains only lexical information (no conceptual
> information, such as synsets)*
>
> are we fine with this? In some cases, WordNet (as a whole), which is
> mostly known as a “lexical database” (correct though maybe too general),
> has been called also a Lexicon (computational Lexicon). We know the
> literature can often explode even with terminology misuses, so it’s ok if
> we decide to keep the above statement in a strict way. Just checking for
> confirmation (this influences other choices). Also it’s important to take
> into consideration all the modules and where their information belongs
> (semantic / lexicon part).
>
This follows from the separation of the semantic and lexical layers that we
take as the basis of the group... that is, we have the ontology describing
the semantics and the lexicon describing the expression of the idea in the
words of some language. Of course, the *Lexicon* is not actually without
semantics due to the *LexicalSense* object, although there is no definition
that says that the LexicalSense belongs to the ontology and lexicon. Instead
the lexicon is an *organization of the ontology-lexicon by entries*.

>
>
> *b.      **Lexical/Lexicalized (and then Conceptual/Conceptualized), not only
> terminology…*
>
> During the first year, I (Armando) suggested to introduce a
> superclass for synset-like things, and suggested to use the name
> LexicalConcept (used by Miller himself in describing synsets) to represent
> a common semantic entity for synonymic lexical entries. It is important
> that we recall a cause/effect distinction. A LexicalConcept is not a domain
> concept which is being lexicalized (it would be a “lexicalizED concept”),
> but an entity which exists as a semantic complementary element in the
> description of a lexicon (whether it is technically part of it or not, see
> point (a) above). So it is lexical in that it “has to do” with lexical
> descriptions. A few consequences:
>
>
>
> *i.      **ConceptualLexicon*
>
> This was the name reported in the minutes to represent Lexicons which have
> a conceptual backbone (like synsets in WordNet); actually we suggested
> *ConceptualizedLexicon*. This does not sound like an oxymoron (agree with
> John that ConceptualLexicon does), and actually tells more about something
> which is still (purely) a Lexicon. To be confirmed after the vote on (a):
> whether the conceptual backbone is part of the Lexicon or not (and so
> technically to which dataset the “evokes” triples belong).
>
This could actually be worth including, but I believe when this was most
recently discussed it was noted that the *skos:ConceptScheme* is
functionally the same as a *ConceptLexicon* and I would rather not
duplicate this mechanism, but we should include an example in the spec
using *ConceptScheme*.

>
>
> *ii.      **Use of properties evokes/denotes*
>
> We have got the impression during last
> calls, that ontolex:evokes has been intended to be used whenever a
> skos:Concept is being described.
>
Yes, evokes is used for conceptual interpretations of words rather than
formal interpretations.

> Actually it is important that domain skos:Concepts in KOSs which are
> lexicalized through an ontolex:Lexicon fall in the same category as
> owl:Classes or properties...so to be linked through the ontolex:denotes
> property.
>
The compromise that was reached (I don't like this BTW) is that *denotes*
can also refer to *LexicalConcepts*

> The ontolex:LexicalConcept should be meant to represent the conceptual
> backbone of a lexical database such as WordNet (proposed name:
> ConceptualizedLexicon) which is something totally different (and opposite)
> from lexicalizing a domain concept scheme.
> Note: ontolex:evokes triples should not be part of a LexicalizationSet.
> Synsets are not concepts being lexicalized, but they exist to give meaning
> to lexical entries.
> What to do then? Introduce another class for LexicalConcepts and for
> binding between them and LexicalEntries, such as lime:Conceptualization?
>
This seems like a reasonable idea; I have wondered about
introducing a class *Conceptualization* with the same signature as
*Lexicalization*... I don't have a good reason not to.
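
To make the split concrete, here is a minimal sketch (plain Python, no RDF
library) of how denotes and evokes links could be kept apart. All ex: names
are hypothetical, and "conceptualization" is only the class floated above,
not something agreed in the model.

```python
# Triples are plain (subject, predicate, object) tuples.
# All ex: identifiers are hypothetical illustrations.
TRIPLES = [
    ("ex:cat_entry", "ontolex:denotes", "ex:CatClass"),    # domain entity being lexicalized
    ("ex:cat_entry", "ontolex:evokes", "ex:synset_cat"),   # synset-like LexicalConcept
    ("ex:feline_entry", "ontolex:evokes", "ex:synset_cat"),
]


def partition(triples):
    """Split entry links into a lexicalization part (denotes)
    and a conceptualization part (evokes)."""
    parts = {"lexicalization": [], "conceptualization": []}
    for triple in triples:
        predicate = triple[1]
        if predicate == "ontolex:denotes":
            parts["lexicalization"].append(triple)
        elif predicate == "ontolex:evokes":
            parts["conceptualization"].append(triple)
    return parts
```

Under this reading, the evokes triples never end up in the lexicalization
part, which matches the note quoted above.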

>
>
>
>
> *2.       **Requirement: we should be able to model WordNet (and resources
> alike)*
>
> We felt it important that ontolex should be able to represent WordNet-like
> resources (or wordnets) giving an umbrella over everything that is inside
> them. We should know explicitly (for already cited reasons) from the
> vocabulary whether a Lexicon has a conceptual backbone or not, and many
> other things. The general idea is that an agent using a lexical resource
> should know, by available metadata/data classification, which features it
> can rely on.
>
> Proposal: Introduce property: ontolex:scheme : Lexicon =>
> skos:ConceptScheme
>
OK, if we add a *Conceptualization* we will need to add this property
anyway.
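
As a rough sketch of what an agent could do with such a property (plain
Python, no RDF library; ontolex:scheme is only the property proposed above,
and every ex: name is a hypothetical illustration):

```python
# Triples are plain (subject, predicate, object) tuples.
TRIPLES = {
    ("ex:wn_lexicon", "rdf:type", "ontolex:Lexicon"),
    ("ex:wn_synsets", "rdf:type", "skos:ConceptScheme"),
    ("ex:wn_lexicon", "ontolex:scheme", "ex:wn_synsets"),  # proposed property, not agreed
    ("ex:plain_lexicon", "rdf:type", "ontolex:Lexicon"),
}


def conceptual_backbones(triples, lexicon):
    """Return the concept schemes a lexicon points to via the proposed ontolex:scheme."""
    return sorted(o for s, p, o in triples if s == lexicon and p == "ontolex:scheme")


def is_conceptualized(triples, lexicon):
    """True if the lexicon declares at least one conceptual backbone."""
    return bool(conceptual_backbones(triples, lexicon))
```

Here `is_conceptualized(TRIPLES, "ex:wn_lexicon")` holds while the same check
on `ex:plain_lexicon` does not, so the agent can tell from the metadata alone
which feature it can rely on.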

> this originated from requirement (1.a). If accepted, a WordNet would be
> identifiable as a dataset containing a Lexicon AND a (separated)
> Conceptualization.
>
>
> One note: the name ontolex:scheme is quite misleading as it is taken from
> skos:ConceptScheme. Actually, the focus should  be more on the fact that
> the pointed element is a conceptualization of the Lexicon.
> ontolex:conceptualization would be more appropriate.
>
I prefer scheme as the name is more unique in the model

> we would also propose to introduce ontolex:LexicalConceptScheme. John
> suggested that it could be more complicated. Our point is this: if we give
> credit to the existence of LexicalConcepts, then it is much better to
> represent a proper collector for them: LexicalConceptScheme.
>
> Introduce Class ConceptualizedLexicon ⊑ some.conceptualization
>
> Introduce ConceptualizedLexicon ⊑ Lexicon ⊓ some.conceptualization
>
I don't think this needs to be a core class. Note that sometime early this
year I would like to move into defining non-core terms (such as PoS) in a
vocabulary (Lexinfo3), and this would fit well there. Note that we should
also include a counterpart to this term called OntologizedLexicon (or similar).

>
>
> ..and yes…maybe we should say something about (the presence/quantity of)
> glosses…
>
Again a possible non-core term.

>
>
>
>
> *3.       **Keeping separate ontolex:Lexicon from lime:Lexicon*
>
> In principle no
> issue, we can collapse them. However, a few things that will happen as a
> consequence of the merge:
>
> *Pro*:
> - merge allows for one single entry in the vocabulary (only
> ontolex:Lexicon, no lime:Lexicon)
> - more agile to embed metadata in the data (but only in those cases when
> the lexicon is very small so to be a single file, and not a
> SPARQL-accessible dataset, which would require a separate void file, see
> second point of contra below)
>
> *Contra*:
> - if you have a separate data file and associated metadata file, you have
> in any case (LOD principle for proper HTTP dereferencing) to use different
> URIs to refer to the same object (one in the data and one in the metadata
> file), and then state an owl:sameAs between the two. Quite confusing wrt
> the usual void pattern...
> - common use: the separate void file is necessary in all cases where there
> is a separate lexicalization from the lexicon UNION any scenario where the
> resource is/are of non-trivial size: these are the very common cases, and
> would require the separate URIs and the owl:sameAs mentioned above
>
> One point also discussed is whether there should be any other element which
> is not properly a Lexicon (in the ontolex sense at least) but still
> represents a purely lexical resource addressable from the metadata point of view.
> SKOS-XL label as separate lexicon is very rare, though not unheard-of as
> reported in the minutes of 5-Dec. Actually we know of cases of thesauri
> with file dumps of skosxl:Labels. These would be “sort-of-Lexicon +
> lexicalization” for a given dataset.
>
Still I don't think we should bend our model around corner cases of SKOS...
the decision to keep these classes the same should stay (IMHO).

>
> *3.1.* *General Discussion on the usefulness of ontolex:Lexicon in the
> data*
>
> In general, in any ontology vocabulary, there is no tradition in
> declaring, in the data, collectors for "typical" objects that are defined
> in the vocabulary...they are just there, put in the data. E.g., there is no
> ClassSet, no PropertySet etc…
>
> A notable exception is skos:ConceptScheme, which is not intended merely as
> a collector. It has been invented to provide different views over the same
> content (like a specific scheme rooted over a non-root concept of the main
> scheme, or even more fine-grained filters applied on the whole content).
> Taken as-is, an ontolex:Lexicon is not providing any of this useful
> information, unless that was the intention. A case coming to our mind is
> providing a general Lexicon which may have topic separations, in which one
> LexicalEntry belongs to the general Lexicon and to one or more
> (sub)Lexicons… is this the case?
>
True, but there are a number of uses in having a Lexicon. Firstly, it
guarantees that the whole lexicon is traversable, which is important for
linked data. Secondly, it allows for general rules to be attached to the
lexicon: for example, the non-core morphology module allows morphological
patterns (declensions, etc.) to be attached to the lexicon to say what
grammar rules are used. Finally, it is also a very useful place to add
metadata.
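
The traversability point can be illustrated with a small sketch (plain
Python, no RDF library). The property names ontolex:entry and ontolex:sense
and all ex: identifiers are assumptions made for the example only.

```python
# Triples are plain (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:lexicon", "rdf:type", "ontolex:Lexicon"),
    ("ex:lexicon", "ontolex:entry", "ex:cat"),       # assumed link from lexicon to entry
    ("ex:lexicon", "ontolex:entry", "ex:dog"),
    ("ex:cat", "ontolex:sense", "ex:cat_sense"),
    ("ex:orphan", "rdf:type", "ontolex:LexicalEntry"),  # not linked from any lexicon
]


def reachable(triples, start):
    """Collect every node reachable from `start` by following outgoing links
    (rdf:type links are skipped so that classes are not treated as data nodes)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for s, p, o in triples if s == node and p != "rdf:type")
    return seen
```

Starting from `ex:lexicon`, the walk reaches both entries and the sense,
while `ex:orphan` is never found because nothing links to it. Without a
Lexicon node there would be no single starting point for such a walk.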

Regards,
John

>
>
>
> Cheers,
>
>
>
> Manuel and Armando
>

Received on Monday, 5 January 2015 10:38:29 UTC