Re: Consolidating LOD vocabularies for linguistic annotations

Hi Christian, all,

Denis and I are working intensively on building the user interface 
around DBpedia Archivo: http://archivo.dbpedia.org/

Note that it is still slow and has some bugs, but it will get better 
from week to week.  The best source to understand the engine is the 
paper: https://svn.aksw.org/papers/2020/semantics_archivo/public.pdf

It consolidates vocabularies in this manner:

- there is a star rating for the basics, i.e. parsing, license, etc.
For example, lemon only has 1 star because of its missing license:
http://archivo.dbpedia.org/info?o=http://lemon-model.net/lemon

- persistence: every 8 hours, any new versions are backed up. We will
move this backup to the DBpedia download server, which has been around
for 13 years now and is backed up by Uni Mannheim. So if ontologies
disappear, they will still be reachable there.

- you can download all ontologies at once

- you can add custom SHACL tests (https://www.w3.org/TR/shacl/), as we
did for LODE:
https://github.com/dbpedia/Archivo/tree/master/shacl-library
This lets you define custom checks, e.g. for a subcommunity such as the
lexvo modules (send a pull request to the git repository). A small
sketch of how such a check can be run locally follows below.
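
Just to illustrate (this is not part of Archivo itself): the shapes in
the shacl-library are plain SHACL files, so they can also be run
locally with any standard validator. Below is a minimal Python sketch,
assuming rdflib and pySHACL are installed; the shapes file name
"LODE.ttl" is only a placeholder, the actual file names in the
repository may differ.

    # Minimal sketch: validate the lemon vocabulary against a local copy
    # of a shapes file taken from the Archivo shacl-library.
    from rdflib import Graph
    from pyshacl import validate

    # the ontology under test (fetched directly from its namespace URL)
    data_graph = Graph().parse("http://lemon-model.net/lemon")

    # a local shapes file; "LODE.ttl" is a placeholder name
    shapes_graph = Graph().parse("LODE.ttl", format="turtle")

    conforms, report_graph, report_text = validate(
        data_graph,
        shacl_graph=shapes_graph,
    )
    print(conforms)
    print(report_text)

A shape contributed to the shacl-library via pull request would then be
evaluated by Archivo itself on the crawled ontology versions.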

All the best,

Sebastian



On 02.11.20 14:44, Christian Chiarcos wrote:
> Dear all,
>
> as there have been requests for additional time slots, I added five 
> possible slots to the Doodle 
> (https://doodle.com/poll/2bvb78z42tpsa5fm). Please update your 
> preferences.
>
> Thanks a lot, stay healthy and speak to you soon,
> Christian
>
> On .10.2020 at 10:51, Christian Chiarcos
> <christian.chiarcos@gmail.com> wrote:
>
>     Dear all,
>
>     after an extended summer break, it is time to take up the LD4LT
>     annotation telcos again. I created a Doodle
>     under https://doodle.com/poll/2bvb78z42tpsa5fm. The new Doodle is
>     necessary because the original time slot, Thu 10-11 CE(S)T, risks
>     clashing with Nexus Linguarum telcos (see the last point below).
>
>     Major developments in and after the July telco:
>
>     - After we spent much of the last two telcos discussing the
>     relation between W3C and its specifications on the one hand, and
>     ISO and its drafts on the other, it became clear that any public
>     discussion of drafts or other internal documentation of ISO
>     specifications is discouraged by ISO and its national partner
>     organizations. Moreover, it does not seem possible for W3C CGs and
>     ISO to enter a formal relationship (for legal, not scientific,
>     reasons) that would arrange an official exchange of ideas. In
>     other words, the extent to which any public discussion on the
>     development of community conventions for linguistic annotations on
>     the web can include information from or about ISO standards is
>     limited to publicly available information (basically, scientific
>     publications) that describes the respective standards or their
>     underlying concepts. Regardless of whether such publications are
>     fully identical to the eventual ISO standard, drawing on them is
>     necessary to benefit from the discussions and expertise that have
>     gone into these specifications, as we clearly do not want to
>     re-invent the wheel, but to contribute to a broadly applicable and
>     inclusive Linked-Data-based ecosystem for language technology and
>     the language sciences on the web. One current problem of the ISO
>     standards is that they do not organically translate into
>     Linked-Data-compliant specifications, and this seems unlikely to
>     improve. An alternative would be to move the entire discussion to
>     ISO, but I would strongly prefer an open and transparent discussion
>     process without formal entry barriers for interested contributors.
>     A W3C CG provides that; ISO doesn't.
>
>     - As for ISO-related papers, these may or may not reflect the
>     current state of the standard or its published form. It is still
>     safe to collect open access (!) versions of relevant scientific
>     papers published on the topics under
>     https://github.com/ld4lt/linguistic-annotation/tree/master/doc/iso.
>     I had previously created a private repository with the intent of
>     collecting proprietary publications and sharing them under the
>     exceptions that (German) copyright law grants for scientific
>     research and education, but it seems that sharing full
>     publications is no longer compliant with the latest revision of
>     German copyright law. If we want to have such a repository,
>     somebody from a country with a more liberal copyright policy
>     should create and maintain that repository. A candidate would be
>     the US, where this would basically be fair use.
>
>     - As with any W3C CG, the mid-term goal of our discussions is to
>     provide a community report, which could be, for example, (1) a
>     survey or (2) a specification that brings together NIF, Web
>     Annotation, *published* ISO standards, etc. In my personal
>     opinion, we should do *both*: a survey on their respective
>     features (and we -- mostly Milan Dojchinovski and myself -- have
>     begun with that, see
>     https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md),
>     and then work towards a vocabulary. This vocabulary could then be
>     input for subsequent formal standardization, either through W3C,
>     ISO, or both. So, there is a possible relation to ISO, and
>     maintaining some ties with ISO remains relevant, but unless there
>     is a way to share ISO-internal information in public (and as far
>     as I can see, there isn't, at least not at the community level
>     [at the level of individual cooperation, that's different]), this
>     will have to be largely unidirectional, with ISO taking potential
>     input from us. The only way I can see us getting direct input from
>     ISO is if people involved in ISO standardization point us to their
>     most relevant publications on these topics.
>
>     - (As many of you know) The COST Action "Nexus Linguarum. European
>     network for Web-centred linguistic data science" (CA
>     18209, https://nexuslinguarum.eu/) is a European network of
>     experts on linguistic linked data and related topics. Since its
>     establishment in October 2019, it has largely focused on internal
>     consolidation and the formulation of specific tasks and use cases.
>     While that process is still ongoing, much progress was
>     demonstrated at the plenary meeting held over the last two days.
>     One of the tasks centers on modelling linguistic data, with a
>     sub-topic on linguistic annotations that formally took up work in
>     September 2020. As many LD4LT members are also active in Nexus, I
>     would suggest collaborating with this Nexus task on the creation
>     of the survey of features of existing (community) standards for
>     linguistic annotation.
>
>     Best regards,
>     Christian
>

Received on Friday, 6 November 2020 10:59:38 UTC