Re: Consolidating LOD vocabularies for linguistic annotations from Christian Chiarcos on 2020-11-02 (public-ld4lt@w3.org from November 2020)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Mon, 02 Nov 2020 14:44:37 +0100
To: "Linked Data for Language Technology Community Group" <public-ld4lt@w3.org>
Cc: "chiarcos@informatik.uni-frankfurt.de" <chiarcos@informatik.uni-frankfurt.de>
Message-ID: <op.0tgi4nwjbr5td5@kitaba>

Dear all,

as there have been requests for additional time slots, I added five  
possible slots to the Doodle (https://doodle.com/poll/2bvb78z42tpsa5fm).  
Please update your preferences.

Thanks a lot, stay healthy and speak to you soon,
Christian

On .10.2020, 10:51, Christian Chiarcos
<christian.chiarcos@gmail.com> wrote:

> Dear all,
>
> after an extended summer break, it is time to take up the LD4LT annotation
> telcos again. I created a Doodle under
> https://doodle.com/poll/2bvb78z42tpsa5fm. The new Doodle is necessary
> because the original time slot, Thu 10-11 CE(S)T, has a risk of clashing
> with Nexus Linguarum telcos (see last point below).
>
> Major developments in and after the July telco:
>
> - After we spent much of the last two telcos discussing the relation
> between W3C and its specifications on the one hand, and ISO and its
> drafts on the other, it became clear that any public discussion of
> drafts or other internal documentation of ISO specifications is
> discouraged by ISO and its national partner organizations. Moreover, it
> does not seem to be possible to enter a formal relationship between W3C
> CGs and ISO (for legal reasons, not for scientific ones) to arrange an
> official exchange of ideas. In other words, the extent to which any
> public discussion on the development of community conventions for
> linguistic annotations on the web can include information from/about ISO
> standards is limited to publicly available information (basically,
> scientific publications) that describes the respective standards or
> their underlying concepts. Regardless of whether these descriptions are
> fully identical to the eventual ISO standard, drawing on them is
> necessary to benefit from the discussions and expertise that have gone
> into these specifications, as we clearly do not want to re-invent the
> wheel, but to contribute to a broadly applicable and inclusive
> Linked-Data-based ecosystem for language technology and language
> sciences on the web. One current problem of the ISO standards is that
> they do not organically translate into Linked-Data-compliant
> specifications, and this does not seem likely to improve. An alternative
> would be to move the entire discussion to ISO, but I would strongly
> prefer an open and transparent discussion process without any formal
> entry barriers to interested contributors. A W3C CG provides that, ISO
> doesn't.
>
> - As for ISO-related papers, these may or may not reflect the current
> state of the standard or its published form. It is still safe to collect
> open access (!) versions of relevant scientific papers published on the
> topics under
> https://github.com/ld4lt/linguistic-annotation/tree/master/doc/iso.
> Before, I had created a private repository with the intent to collect
> proprietary publications and share them in accordance with the
> exceptions to (German) copyright law for the sake of scientific
> research/education, but it seems that sharing full publications is no
> longer compliant with the latest revision of German copyright law. If
> we want to have such a repository, somebody from a country with a more
> liberal copyright policy should create and maintain that repository. A
> candidate would be the US, where this would basically be fair use.
>
> - As for any W3C CG, the mid-term goal of our discussions is to provide
> a community report, which could be, for example, (1) a survey or (2) a
> specification that brings together NIF, Web Annotation, *published* ISO
> standards, etc. In my personal opinion, we should do *both*: a survey on
> their respective features (and we -- mostly Milan Dojchinovski and
> myself -- have begun with that, see
> https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md),
> and then work towards a vocabulary. This vocabulary could then be input
> for subsequent formal standardization, either through W3C, ISO or both.
> So, there is a possible relation to ISO, and to have some ties with ISO
> remains relevant, but unless there is a way to share ISO-internal
> information in public (and as far as I can see, there isn't, at least
> not on a community level [at the level of individual cooperation, that's
> different]), this will have to be largely unidirectional, with ISO
> taking potential input from us. The only way I can see direct input from
> ISO is if people involved in ISO standardization point us to their most
> relevant publications on the topics.
>
> - (As many of you know) The COST Action "Nexus Linguarum. European
> network for Web-centred linguistic data science" (CA 18209,
> https://nexuslinguarum.eu/) is a European network of experts on
> linguistic linked data and related topics. Since its establishment in
> October 2019, it has largely focused on internal consolidation and the
> formulation of specific tasks and use cases. While that process is still
> going on, much progress has been demonstrated in the plenary meeting
> that was held in the last two days. One of the tasks centers on
> modelling linguistic data, with a sub-topic on linguistic annotations,
> which formally took up work in September 2020, and as many LD4LT
> members are also active in Nexus, I would suggest collaborating with
> this Nexus task on the creation of the survey of features of existing
> (community) standards of linguistic annotation.
>
> Best regards,
> Christian
Received on Monday, 2 November 2020 13:45:11 UTC