- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Wed, 28 Oct 2020 10:51:40 +0100
- To: "Linked Data for Language Technology Community Group" <public-ld4lt@w3.org>
- Message-ID: <op.0s6y0esxbr5td5@kitaba>
Dear all,

after an extended summer break, it is time to resume the LD4LT annotation telcos. I created a Doodle at https://doodle.com/poll/2bvb78z42tpsa5fm. The new Doodle is necessary because the original time slot, Thu 10-11 CE(S)T, risks clashing with Nexus Linguarum telcos (see last point below).

Major developments in and after the July telco:

- After we spent much of the last two telcos discussing the relation between W3C and its specifications on the one hand and ISO and its drafts on the other, it became clear that any public discussion of drafts or other internal documentation of ISO specifications is discouraged by ISO and its national partner organizations. Moreover, it does not seem to be possible to establish a formal relationship between W3C CGs and ISO (for legal reasons, not for scientific ones) to arrange an official exchange of ideas. In other words, the extent to which any public discussion on the development of community conventions for linguistic annotations on the web can include information from/about ISO standards is limited to publicly available information (basically, scientific publications) that describes the respective standards or their underlying concepts. Regardless of whether these publications are fully identical to the eventual ISO standard, drawing on them is necessary to benefit from the discussions and expertise that have gone into these specifications, as we clearly do not want to re-invent the wheel, but to contribute to a broadly applicable and inclusive Linked-Data-based ecosystem for language technology and language sciences on the web.

  One current problem of the ISO standards is that they do not organically translate into Linked-Data-compliant specifications, and this does not seem likely to improve. An alternative would be to move the entire discussion to ISO, but I would strongly prefer an open and transparent discussion process without any formal entry barriers for interested contributors. A W3C CG provides that; ISO doesn't.
- As for ISO-related papers, these may or may not reflect the current state of the standard or its published form. It is still safe to collect open access (!) versions of relevant scientific papers published on these topics under https://github.com/ld4lt/linguistic-annotation/tree/master/doc/iso. Earlier, I had created a private repository with the intent to collect proprietary publications and share them in accordance with the exceptions in (German) copyright law for scientific research and education, but it seems that sharing full publications is no longer compliant with the latest revision of German copyright law. If we want to have such a repository, somebody from a country with a more liberal copyright policy should create and maintain it. A candidate would be the US, where this would basically be fair use.

- As for any W3C CG, the mid-term goal of our discussions is to provide a community report, which could be, for example, (1) a survey or (2) a specification that brings together NIF, Web Annotation, *published* ISO standards, etc. In my personal opinion, we should do *both*: first a survey of their respective features (and we -- mostly Milan Dojchinovski and myself -- have begun with that, see https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md), and then work towards a vocabulary. This vocabulary could then be input for subsequent formal standardization, through W3C, ISO, or both. So there is a possible relation to ISO, and having some ties with ISO remains relevant. But unless there is a way to share ISO-internal information in public (and as far as I can see, there isn't, at least not at the community level [at the level of individual cooperation, that's different]), this will have to be largely unidirectional, with ISO taking potential input from us. The only way I can see for direct input from ISO is if people involved in ISO standardization point us to their most relevant publications on these topics.
- (As many of you know) The COST Action "Nexus Linguarum. European network for Web-centred linguistic data science" (CA 18209, https://nexuslinguarum.eu/) is a European network of experts on linguistic linked data and related topics. Since its establishment in October 2019, it has largely focused on internal consolidation and the formulation of specific tasks and use cases. While that process is still going on, much progress was demonstrated at the plenary meeting held over the last two days. One of the tasks centers on modelling linguistic data, with a sub-topic on linguistic annotations, which formally took up work in September 2020. As many LD4LT members are also active in Nexus, I would suggest that we collaborate with this Nexus task on creating the survey of features of existing (community) standards of linguistic annotation.

Best regards,
Christian
Received on Wednesday, 28 October 2020 09:52:14 UTC