RE: SKOS profiles: Simple vs Structured from Armando Stellato on 2020-04-01 (public-esw-thes@w3.org from April 2020)

From: Armando Stellato <stellato@uniroma2.it>
Date: Wed, 1 Apr 2020 10:01:12 +0000
To: Vladimir Alexiev <vladimir.alexiev@ontotext.com>, "public-esw-thes@w3.org" <public-esw-thes@w3.org>, Rob Sanderson <rsanderson@getty.edu>
Message-ID: <DB6PR1001MB10137107E72B3538008F99FBC7C90@DB6PR1001MB1013.EURPRD10.PROD.OUTLOOK.>
Dear Vladimir,

I fully share your concern. I wonder, however, whether this should be approached from the more general perspective of dataset metadata, into which these SKOS-specific cases would fit.

For instance, in VocBench, we let the users specify the semantic and lexicalization model when starting a project. This tells our system which labels it should produce (and consume for rendering).

VocBench is a tool, but you rightly talk about formalizing this. This has already been partially done (for SKOS vs SKOS-XL, though not for SKOS notes, which were out of scope) in LIME [1, 2], the metadata module of the Ontolex [3] suite of vocabularies for describing lexicons and onto-lexicon interfaces.
We have also published an API (based on the RDF4J framework) for querying LIME data [4].

LIME is the metadata counterpart of Ontolex and as such it should be scoped to Ontolex only; however, at the time of developing the Ontolex model and the specific metadata module, after some discussion, we opted for extending it (making it de facto a general lexical metadata vocabulary, extending VoID) in order to cover all possible cases.
Thus, in LIME you can specify various “lexicalizations” for a single dataset; these vary in the natural language covered and in the model used to represent it.
Current possible values cover the URIs for the following vocabularies:
* rdfs (for rdfs:labels)
* skos (for skos core terminological labels)
* skos-xl (for skos-xl reified labels)
* ontolex (for the ontolex onto-lexicon interface: lexical entries, senses, references, etc.)
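
As an illustration of how a consumer might act on such a declaration, here is a minimal Python sketch that maps a lexicalization model URI to the SPARQL property paths it could follow to reach label strings. The namespace URIs are written without a trailing `#` (as in the Agrovoc excerpt); the `LABEL_PATHS` table and the `label_paths` function are hypothetical names for this example, and the Ontolex path shown is only one of several possible routes from a reference to a written form.

```python
# Vocabulary URIs, assumed to appear as lime:lexicalizationModel values.
RDFS = "http://www.w3.org/2000/01/rdf-schema"
SKOS = "http://www.w3.org/2004/02/skos/core"
SKOSXL = "http://www.w3.org/2008/05/skos-xl"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex"

# Hypothetical lookup table: model URI -> property paths leading to label text.
LABEL_PATHS = {
    RDFS: ["rdfs:label"],
    SKOS: ["skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"],
    SKOSXL: [
        "skosxl:prefLabel/skosxl:literalForm",
        "skosxl:altLabel/skosxl:literalForm",
        "skosxl:hiddenLabel/skosxl:literalForm",
    ],
    ONTOLEX: ["ontolex:isDenotedBy/ontolex:canonicalForm/ontolex:writtenRep"],
}


def label_paths(model_uri):
    """Return the property paths to query for the declared lexicalization model."""
    key = model_uri.rstrip("#/")  # tolerate a trailing '#' or '/'
    if key not in LABEL_PATHS:
        raise ValueError("unknown lexicalization model: " + model_uri)
    return LABEL_PATHS[key]


print(label_paths("http://www.w3.org/2008/05/skos-xl")[0])
```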

Here’s an excerpt of a LIME description, describing a lexicalization set for the Agrovoc thesaurus. You can find the full VoID/LIME file here: [5]

:c96197d0-2de1-470d-bdde-1bc0b001fbd5 a <http://www.w3.org/ns/lemon/lime#LexicalizationSet>;
  <http://www.w3.org/ns/lemon/lime#avgNumOfLexicalizations> 1.1004112;
  <http://www.w3.org/ns/lemon/lime#language> "en"^^xsd:language;
  <http://www.w3.org/ns/lemon/lime#lexicalizationModel> <http://www.w3.org/2008/05/skos-xl>;
  <http://www.w3.org/ns/lemon/lime#lexicalizations> 46302;
  <http://www.w3.org/ns/lemon/lime#percentage> 0.8676712;
  <http://www.w3.org/ns/lemon/lime#referenceDataset> <http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc>;
  <http://www.w3.org/ns/lemon/lime#references> 36509 .

As you can see, besides the modeling information, there is quantitative information: the coverage of the dataset's semantic resources (the percentage property), the average number of lexicalizations per resource, and absolute counts such as lexicalizations and references.
Note that the use of different, independent, lexicalizations allows for describing the presence of both SKOS and SKOS-XL labels as well.
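
The figures in the excerpt can be cross-checked against each other. The sketch below assumes, following my reading of the LIME specification, that lime:percentage is lime:references divided by the number of entities in the reference dataset, and that lime:avgNumOfLexicalizations is lime:lexicalizations divided by that same number. The entity count is not part of the excerpt, so it is derived here rather than quoted.

```python
# Values copied from the LexicalizationSet excerpt.
lexicalizations = 46302                  # lime:lexicalizations
references = 36509                       # lime:references
percentage = 0.8676712                   # lime:percentage
avg_num_of_lexicalizations = 1.1004112   # lime:avgNumOfLexicalizations

# Assumption: percentage = references / entities, so entities can be recovered.
entities = round(references / percentage)

# Assumption: avgNumOfLexicalizations = lexicalizations / entities.
derived_avg = lexicalizations / entities

print(entities)               # derived number of entities in the reference dataset
print(round(derived_avg, 4))  # compare with the published 1.1004112
```

Under these assumptions the two ratios agree with the published values, which suggests roughly 42,000 lexicalizable entities in the reference dataset at the time the file was generated.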

We are currently using it in various scenarios, among which:

  *   Supporting automatic optimization of alignment processes
  *   Caching information about datasets, so that they can be remotely accessed using the best configuration

The second one is backed by further properties and a general profile that we described in [6].

I believe LIME largely answers your point about profiling SKOS vs SKOS-XL with something quite standard (it’s the result of a W3C community group effort).

SKOS notes are also an issue. In VB3 we address them in a specific way (we have custom forms that “override” both the creation and the rendering of their values), but this is not specified in RDF.
One thing that comes to my mind is this:

  *   In ontolex there’s a new module recently developed: https://www.w3.org/2019/09/lexicog/
  *   It would be appropriate to extend LIME metadata descriptors in order to cover this module as well
  *   I would ask the group if we could “stretch” the coverage of metadata for that module to non-ontolex entries (and I guess skos documentation properties and their possible patterns would fall in that category wrt ontolex:lexicog), as we did for lexicalization models

Cheers,

Armando

[1] https://link.springer.com/chapter/10.1007/978-3-319-18818-8_20
[2] https://www.w3.org/2016/05/ontolex/#metadata-lime
[3] https://www.w3.org/2016/05/ontolex/
[4] http://ceur-ws.org/Vol-1899/OntoLex_2017_paper_8.pdf
[5] http://aims.fao.org/aos/agrovoc/void.ttl
[6] https://link.springer.com/chapter/10.1007/978-3-030-36599-8_2

From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
Sent: Wednesday, April 1, 2020 10:46 AM
To: public-esw-thes@w3.org; Rob Sanderson <rsanderson@getty.edu>
Subject: SKOS profiles: Simple vs Structured

Hi!

SKOS is provided in one of two formats (profiles):

  *   Simple (SKOS)
  *   Structured (SKOSXL<https://www.w3.org/TR/skos-reference/#xl>+Advanced Documentation Features<https://www.w3.org/TR/skos-primer/#secadvanceddocumentation> with metadata/provenance props). "Documentation" means notes, definitions, etc
It's a common practice to publish "structured" with redundancy, to cater to both "simple" consumers and "structured" consumers:

  *   SKOSXL recommends structured labels to be published redundantly: as plain SKOS labels and as skosxl:Label. Dumbing-Down to SKOS Lexical Labels<https://www.w3.org/TR/skos-reference/#L780> defines how to provide structured and plain labels together.
  *   Not sure about notes, as neither SKOS nor SKOSXL defines a class Note (Getty defines gvp:Note), nor separate properties. So skos:definition and friends would carry both a string and a resource, which will complicate consumption.

Currently a SKOS dataset or API does not have a way to declare its profile.
https://github.com/NatLibFi/Skosmos/issues/477 describes some troubles related to this:

  *   Skosmos uses "duplicate label matching logic" to display the label below just once, and assumes label redundancy.
<concept> skos:prefLabel "foo"@en; skosxl:prefLabel [skosxl:literalForm "foo"@en]

  *   However, there is no similar logic for notes, so it would display notes in duplicate.

To avoid complicated duplicate matching logic at the consumer, I think we should define two SKOS profiles: simple vs structured.

  *   Should "structured" subsume "simple", i.e. redundantly provide the same strings as simple labels/notes? That will simplify life for data providers
  *   Do we need the two aspects separately: structured labels vs structured notes?
  *   The profile should be communicated:

     *   In HTTP request: client should be able to request "simple" or "structured"
     *   In HTTP response
     *   In the description of ConceptScheme and VOID/DCAT Dataset (property dct:conformsTo)

  *   ConceptSchemes should provide completeness guarantees: if one label or note is structured, then all labels (respectively, all notes) are available as structured. I think these SPARQL tests should be used:

     *   Some SKOSXL label exists:

<scheme> ^skos:inScheme/(skosxl:prefLabel|skosxl:altLabel|skosxl:hiddenLabel) ?label

     *   Some skos:definition or skos:scopeNote is non-literal. I exclude: skos:changeNote, skos:historyNote, skos:editorialNote because these may be structured without the "business payload" notes being structured; skos:example because conceivably it can point to a resource; skos:note because that's a super-prop of excluded props (but many people use it directly, so I'm not sure):

<scheme> ^skos:inScheme/(skos:definition|skos:scopeNote) ?definition filter (!isLiteral(?definition))
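
The two tests can also be sketched outside SPARQL. The following Python example runs them over a small in-memory list of triples; the `Literal` class, the `detect_profile` function, and the `ex:` identifiers are hypothetical and exist only for this illustration. Like the tests, it only detects that some structured label or note exists; it does not verify the completeness guarantee itself.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain literal; any object that is a bare str is treated as a resource."""
    text: str
    lang: str = ""


XL_LABEL_PROPS = {"skosxl:prefLabel", "skosxl:altLabel", "skosxl:hiddenLabel"}
NOTE_PROPS = {"skos:definition", "skos:scopeNote"}


def detect_profile(triples, scheme):
    """Report whether labels and notes of a scheme look simple or structured."""
    concepts = {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}
    # Test 1: some SKOSXL label exists on a concept of the scheme.
    xl_labels = any(s in concepts and p in XL_LABEL_PROPS for s, p, _ in triples)
    # Test 2: some definition or scope note has a non-literal value.
    xl_notes = any(
        s in concepts and p in NOTE_PROPS and not isinstance(o, Literal)
        for s, p, o in triples
    )
    return {
        "labels": "structured" if xl_labels else "simple",
        "notes": "structured" if xl_notes else "simple",
    }


triples = [
    ("ex:c1", "skos:inScheme", "ex:scheme"),
    ("ex:c1", "skos:prefLabel", Literal("foo", "en")),
    ("ex:c1", "skosxl:prefLabel", "ex:label1"),
    ("ex:label1", "skosxl:literalForm", Literal("foo", "en")),
    ("ex:c1", "skos:definition", Literal("A foo.", "en")),
]
print(detect_profile(triples, "ex:scheme"))
# prints {'labels': 'structured', 'notes': 'simple'}
```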

Assuming subsumption/redundancy (that "structured" includes "simple") and that the client can use SPARQL, "duplicate matching" can be done easily in SPARQL, e.g. with something like this:

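# Each plain label once, plus any metadata from a structured label with the same literal form
select ?lab ?prop ?propLabel ?metadata {
  <concept> skos:prefLabel ?lab.
  optional {
    <concept> skosxl:prefLabel ?label.
    # join on the plain label: the literal form must equal ?lab
    ?label skosxl:literalForm ?lab; ?prop ?metadata
    filter (?prop != skosxl:literalForm)
    optional {?prop (rdfs:label|skos:prefLabel) ?propLabel} # need lang preferencing here!
  }
}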



select ?def ?prop ?propLabel ?metadata {
  <concept> skos:definition ?def.
  optional {
    <concept> skos:definition ?definition.
    ?definition rdf:value ?def; ?prop ?metadata
    filter (?prop != rdf:value)
    optional {?prop (rdfs:label|skos:prefLabel) ?propLabel} # need lang preferencing here!
  }
}
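
For a client that cannot use SPARQL, the same matching can be approximated in application code. This is a minimal Python sketch under the same redundancy assumption; the `merge_labels` function, the dictionary layout for structured labels, and the `dct:source` metadata value are hypothetical choices made for the example.

```python
def merge_labels(plain, structured):
    """Return each plain label once, paired with metadata from matching structured labels.

    plain:      list of (text, lang) tuples, e.g. skos:prefLabel values
    structured: list of dicts with a "literalForm" (text, lang) tuple plus metadata keys
    """
    metadata_by_form = {}
    for label in structured:
        form = label["literalForm"]
        extra = {k: v for k, v in label.items() if k != "literalForm"}
        metadata_by_form.setdefault(form, []).append(extra)
    # Labels without a structured counterpart get an empty metadata list,
    # mirroring the OPTIONAL block in the queries.
    return [(form, metadata_by_form.get(form, [])) for form in plain]


plain = [("foo", "en"), ("bar", "en")]
structured = [{"literalForm": ("foo", "en"), "dct:source": "ex:source1"}]
for form, metadata in merge_labels(plain, structured):
    print(form, metadata)
```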

Are there any takers to formalize SKOS profiles?
--
Vladimir Alexiev, PhD, PMP
Chief Data Architect
Sirma AI, trading as Ontotext: https://www.ontotext.com<https://www.ontotext.com/>, LinkedIn<https://www.linkedin.com/company-beta/208070>, Twitter<https://twitter.com/ontotext>, Rate GraphDB<http://www.capterra.com/database-management-software/reviews/157533/Graph%20DB/Ontotext/new>
Email: vladimir.alexiev@ontotext.com<mailto:vladimir.alexiev@ontotext.com>, skype:valexiev1
Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net<mailto:359888568132@sms.mtel.net>
Calendar: https://www.google.com/calendar/embed?src=vladimir.alexiev@ontotext.com

Publications and CV: https://github.com/VladimirAlexiev/my
Received on Wednesday, 1 April 2020 10:01:35 UTC