
Re: Consolidating LOD vocabularies for linguistic annotations

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Mon, 9 Nov 2020 11:18:40 +0100
Message-ID: <CAC1YGdhOqA7Efnu12UTuiyxvn8BHEoJ0YBcARodnp_eo4e_Ybw@mail.gmail.com>
To: Linked Data for Language Technology Community Group <public-ld4lt@w3.org>, Max Ionov <max.ionov@gmail.com>
Dear all,

we have a majority for Thu, Nov 26, 2020, 11:00 CET. I am still in the
process of updating the agenda document (as usual under
old minutes under
including those from the last telco). Also note that we will have a
different telco link, to be announced around Nov 20.

Important points to be discussed:
- Introduce and discuss relation with Nexus Linguarum
- Approve tentative consensus on relation to ISO (i.e., independent, but we
take publicly available information of/by/about ISO as a source of
inspiration and acknowledge it as such)
- Verify whether NIF 2.0 or Web Annotation are sufficient to address the
requirements of linguistic annotations on the web (check, contribute to and
discuss under
so far mostly compiled by Milan and me) and, if not, how and where we
want to suggest extensions.

Thanks for your participation, and stay safe,

PS: Apologies to those whose time preferences the majority vote doesn't
meet. Suggestion: To narrow down a regular six-weekly slot, please send
me your time and day preferences (in CET ;) so that we can include these in
the discussion of future meeting slots. If there is a significant number of
participants from time zones outside Europe (or Africa), we can consider
having telcos at regularly alternating times.

On Wed, 28 Oct 2020 at 10:51, Christian Chiarcos wrote:

> Dear all,
> after an extended summer break, it is time to resume the LD4LT annotation
> telcos. I created a Doodle under
> https://doodle.com/poll/2bvb78z42tpsa5fm. The new Doodle is necessary
> because the original time slot, Thu 10-11 CE(S)T, risks clashing
> with Nexus Linguarum telcos (see last point below).
> Major developments in and after the July telco:
> - After we spent much of the last two telcos discussing the relation
> between W3C (and its specifications) and ISO (and its drafts), it
> became clear that any public discussion of drafts or other internal
> documentation of ISO specifications is discouraged by ISO and its national
> partner organizations. Moreover, it does not seem to be possible to enter a
> formal relationship between W3C CGs and ISO (for legal reasons, not for
> scientific ones) to arrange an official exchange of ideas. In other words,
> the extent to which any public discussion on the development of community
> conventions for linguistic annotations on the web can include information
> from/about ISO standards is limited to publicly available information
> (basically, scientific publications) that describe the respective standards
> or their underlying concepts. Drawing on such publications, regardless of
> whether they are fully identical to the eventual ISO standard, is necessary
> to benefit from the discussions and expertise that have gone into these
> specifications, as we clearly do not want to re-invent the wheel, but to
> contribute to a broadly applicable and inclusive Linked-Data-based
> ecosystem for language technology and language sciences on the web. One
> current problem of the ISO standards is that they do not organically
> translate into Linked-Data-compliant specifications, and this seems
> unlikely to improve. An alternative would be to move the entire
> discussion to ISO, but I would strongly prefer an open and transparent
> discussion process without any formal entry barriers to interested
> contributors. A W3C CG provides that; ISO doesn't.
> - As for ISO-related papers, these may or may not reflect the current
> state of the standard or its published form. It is still safe to collect
> open access (!) versions of relevant scientific papers published on the
> topics under
> https://github.com/ld4lt/linguistic-annotation/tree/master/doc/iso.
> Earlier, I had created a private repository with the intent to collect
> proprietary publications and share them in accordance with the exceptions
> to (German) copyright law for the sake of scientific research/education,
> but it seems that sharing full publications is no longer compliant with the
> latest revision of German copyright law. If we want to have such a
> repository, somebody from a country with a more liberal copyright policy
> should create and maintain that repository. A candidate would be the US,
> where this would basically be fair use.
> - As for any W3C CG, the mid-term goal of our discussions is to provide a
> community report, which could be, for example, (1) a survey or (2) a
> specification that brings together NIF, Web Annotation, *published* ISO
> standards, etc. In my personal opinion, we should do *both*: a survey on
> their respective features (and we -- mostly Milan Dojchinovski and myself
> -- have begun with that, see
> https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md),
> and then work towards a vocabulary. This vocabulary could then be input for
> subsequent formal standardization, either through W3C, ISO or both. So,
> there is a possible relation to ISO, and maintaining some ties with ISO
> remains relevant, but unless there is a way to share ISO-internal information in
> public (and as far as I can see, there isn't, at least not on a
> community-level [at the level of individual cooperation, that's
> different]), this will have to be largely unidirectional, with ISO taking
> potential input from us. The only way I can see for direct input from ISO is
> for people involved in ISO standardization to point us to their most relevant
> publications on these topics.
> - (As many of you know) The COST Action "Nexus Linguarum. European network
> for Web-centred linguistic data science" (CA 18209,
> https://nexuslinguarum.eu/) is a European network of experts on
> linguistic linked data and related topics. Since its establishment in
> October 2019, it has largely focused on internal consolidation and the
> formulation of specific tasks and use cases. While that process is still
> going on, much progress was demonstrated at the plenary meeting that
> was held over the last two days. One of the tasks centers on modelling
> linguistic data, with a sub-topic on linguistic annotations that
> formally began work in September 2020. As many LD4LT members are
> also active in Nexus, I would suggest collaborating with this Nexus task
> on the creation of the survey of features of existing (community) standards
> of linguistic annotation.
> Best regards,
> Christian
Received on Monday, 9 November 2020 10:19:29 UTC
