Re: Consolidating LOD vocabularies for linguistic annotations from Christian Chiarcos on 2020-11-09 (public-ld4lt@w3.org from November 2020)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Mon, 9 Nov 2020 11:18:40 +0100
To: Linked Data for Language Technology Community Group <public-ld4lt@w3.org>, Max Ionov <max.ionov@gmail.com>
Message-ID: <CAC1YGdhOqA7Efnu12UTuiyxvn8BHEoJ0YBcARodnp_eo4e_Ybw@mail.gmail.com>
Dear all,

we have a majority for Thu, Nov 26, 2020, 11:00 CET. I am still in the
process of updating the agenda document (as usual under
https://docs.google.com/document/d/1OGeE96V79iAMavOR6jM-zIA9kKfrC2Pnp5WDu2ZPV-0/edit?usp=sharing,
old minutes under
https://github.com/ld4lt/linguistic-annotation/tree/master/doc/minutes,
including those from the last telco). Also note that we will have a
different telco link, to be announced around Nov 20.

Important points to be discussed:
- Introduce and discuss relation with Nexus Linguarum
- Approve tentative consensus on relation to ISO (i.e., independent, but we
take publicly available information of/by/about ISO as a source of
inspiration and acknowledge it as such)
- Verify whether NIF 2.0 or Web Annotation are sufficient to address the
requirements of linguistic annotations on the web (check, contribute to and
discuss under
https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md,
so far mostly compiled by Milan and me), and, if not, decide how and where
we should best suggest extensions.

Thanks for your participation, and stay safe,
Christian

PS: Apologies to those whose time preferences the majority vote doesn't
meet. Suggestion: To narrow down a regular six-weekly slot, please send
me your time and day preferences (in CET ;) so that we can include these in
the discussion of future meeting slots. If there is a significant number of
participants from time zones outside Europe (or Africa), we can consider
having telcos at regularly alternating times.

On Wed, 28 Oct 2020 at 10:51, Christian Chiarcos <
christian.chiarcos@web.de> wrote:

> Dear all,
>
> after an extended summer break, it is time to resume the LD4LT annotation
> telcos. I created a Doodle under
> https://doodle.com/poll/2bvb78z42tpsa5fm. The new Doodle is necessary
> because the original time slot, Thu 10-11 CE(S)T, has a risk of clashing
> with Nexus Linguarum telcos (see last point below).
>
> Major developments in and after the July telco:
>
> - After we spent much of the last two telcos discussing the relation
> between W3C and its specifications on the one hand and ISO and its drafts
> on the other, it became clear that any public discussion of drafts or other
> internal documentation of ISO specifications is discouraged by ISO and its
> national partner organizations. Moreover, it does not seem to be possible to
> enter a formal relationship between W3C CGs and ISO (for legal reasons, not
> for scientific ones) to arrange an official exchange of ideas. In other words,
> the extent to which any public discussion on the development of community
> conventions for linguistic annotations on the web can include information
> from/about ISO standards is limited to publicly available information
> (basically, scientific publications) that describe the respective standards
> or their underlying concepts. Regardless of whether these publications are
> fully identical to the eventual ISO standard, drawing on them is necessary to
> benefit from the discussions and expertise that have gone into these
> specifications, as we clearly do not want to re-invent the wheel, but to
> contribute to a broadly applicable and inclusive Linked-Data-based
> ecosystem for language technology and language sciences on the web. One
> current problem of the ISO standards is that they do not organically
> translate into Linked-Data-compliant specifications, and this seems
> unlikely to improve. An alternative would be to move the entire
> discussion to ISO, but I would strongly prefer an open and transparent
> discussion process without any formal entry barriers for interested
> contributors. A W3C CG provides that; ISO doesn't.
>
> - As for ISO-related papers, these may or may not reflect the current
> state of the standard or its published form. It is still safe to collect
> open access (!) versions of relevant scientific papers published on the
> topics under
> https://github.com/ld4lt/linguistic-annotation/tree/master/doc/iso.
> Previously, I had created a private repository with the intent to collect
> proprietary publications and share them in accordance with the exceptions
> to (German) copyright law for the sake of scientific research/education,
> but it seems that sharing full publications is no longer compliant with the
> latest revision of German copyright law. If we want to have such a
> repository, somebody from a country with a more liberal copyright policy
> should create and maintain that repository. A candidate would be the US,
> where this would basically be fair use.
>
> - As for any W3C CG, the mid-term goal of our discussions is to provide a
> community report, which could be, for example, (1) a survey or (2) a
> specification that brings together NIF, Web Annotation, *published* ISO
> standards, etc. In my personal opinion, we should do *both*: a survey on
> their respective features (and we -- mostly Milan Dojchinovski and myself
> -- have begun with that, see
> https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md),
> and then work towards a vocabulary. This vocabulary could then be input for
> subsequent formal standardization, either through W3C, ISO or both. So,
> there is a possible relation to ISO, and having some ties with ISO remains
> relevant, but unless there is a way to share ISO-internal information in
> public (and as far as I can see, there isn't, at least not on a
> community-level [at the level of individual cooperation, that's
> different]), this will have to be largely unidirectional, with ISO taking
> potential input from us. The only way I can see direct input from ISO is if
> people involved in ISO standardization point us to their most relevant
> publications on the topics.
>
> - As many of you know, the COST Action "Nexus Linguarum. European network
> for Web-centred linguistic data science" (CA 18209,
> https://nexuslinguarum.eu/) is a European network of experts on
> linguistic linked data and related topics. Since its establishment in
> October 2019, it has largely focused on internal consolidation and the
> formulation of specific tasks and use cases. While that process is still
> going on, much progress was demonstrated at the plenary meeting
> held over the last two days. One of the tasks centers on modelling
> linguistic data, with a sub-topic on linguistic annotations, which
> formally took up work in September 2020. As many LD4LT members are
> also active in Nexus, I would suggest collaborating with this Nexus task
> on the creation of the survey of features of existing (community) standards
> of linguistic annotation.
>
> Best regards,
> Christian
>
Received on Monday, 9 November 2020 10:19:29 UTC