Re: Multidimensional Quality Metrics and Linguistic Linked Data from Dave Lewis on 2014-07-17 (public-ld4lt@w3.org from July 2014)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Thu, 17 Jul 2014 16:15:56 +0100
To: Manuel.CARRASCO-BENITEZ@ec.europa.eu, public-ld4lt@w3.org
Message-ID: <53C7E8AC.6090607@cs.tcd.ie>

Hi Manuel,
As we discussed on the LD4LT call, I completely agree that we need to 
support automated annotations as well as manual ones. The META-SHARE 
ontology has support for this, but there may be requirements for 
capturing more metadata about automated annotation, such as 
confidence scores and the type, version, instance and provenance of the 
automated agent performing the annotation.
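
To make the metadata listed above concrete, here is a minimal sketch of a
record for one quality annotation. It is only an illustration: the class
and field names (AgentInfo, QualityAnnotation, agent_type, issue_type and
so on) are hypothetical and are not taken from META-SHARE, ITS 2.0 or MQM,
and the 0.0 to 1.0 range for confidence is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentInfo:
    """Who or what produced an annotation (hypothetical field names)."""
    agent_type: str                 # assumed values: "automated" or "manual"
    name: str                       # tool name or reviewer identifier
    version: Optional[str] = None   # tool version, if automated
    instance: Optional[str] = None  # e.g. a specific deployment or run
    provenance: Optional[str] = None  # free-text or IRI pointing at the origin


@dataclass(frozen=True)
class QualityAnnotation:
    """One quality annotation attached to part of a language resource."""
    target: str                     # IRI of the annotated part of the resource
    issue_type: str                 # quality issue label (vocabulary not fixed here)
    agent: AgentInfo
    confidence: Optional[float] = None  # assumed to lie in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Reject confidence scores outside the assumed range.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Example: an annotation produced by a hypothetical automated checker.
checker = AgentInfo(agent_type="automated", name="example-checker",
                    version="0.1", instance="run-42")
annotation = QualityAnnotation(target="http://example.org/corpus/seg/17",
                               issue_type="mistranslation",
                               agent=checker, confidence=0.8)
```

A manual annotation would use the same record with agent_type set to
"manual" and no confidence score.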

We touched on some of these issues in the MLW-LT working group in 
developing ITS 2.0, but I think we'll hit them again when examining MQM 
in more detail as an annotation mechanism for linked data. This already 
includes both manual and automated quality annotations.

We also need to consider situations where automated annotations are run 
through a manual, sampled QA check process, i.e. where annotations are 
produced by a combination of automated and manual processes.
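
As a rough sketch of such a combined process, the following draws a random
sample of automated annotations for manual review and records, for each
annotation, whether its final label came from the automated step alone or
was also checked manually. The function names, the "source" values and the
sampling rate are all assumptions made for illustration.

```python
import random


def sample_for_review(annotation_ids, rate, seed=0):
    """Pick a random subset of annotation ids for manual QA.

    `rate` is the fraction to review; at least one id is chosen
    whenever the input is non-empty.
    """
    ids = list(annotation_ids)
    if not ids:
        return set()
    k = min(len(ids), max(1, round(len(ids) * rate)))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return set(rng.sample(ids, k))


def merge_review(automated, manual_verdicts):
    """Combine automated labels with manual verdicts for the sampled ids.

    `automated` maps id -> label from the automated agent;
    `manual_verdicts` maps id -> label assigned by a human reviewer.
    The manual label wins where both exist.
    """
    merged = {}
    for ann_id, label in automated.items():
        if ann_id in manual_verdicts:
            merged[ann_id] = {"label": manual_verdicts[ann_id],
                              "source": "automated+manual"}
        else:
            merged[ann_id] = {"label": label, "source": "automated"}
    return merged


# Example with ten hypothetical automated annotations and a 20% sample.
automated = {f"ann-{i}": "ok" for i in range(10)}
to_review = sample_for_review(automated.keys(), rate=0.2)
# Pretend the reviewer flagged every sampled annotation as an error.
verdicts = {ann_id: "error" for ann_id in to_review}
combined = merge_review(automated, verdicts)
```

Keeping the "source" field alongside the label is one way to preserve the
distinction between purely automated and manually checked annotations.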

So any concrete requirements, use cases or examples you can expose from 
your current automated annotation work would be very welcome contributions.

Kind Regards,
Dave

On 10/07/2014 13:19, Manuel.CARRASCO-BENITEZ@ec.europa.eu wrote:
> Dave,
>
> I am working on preparing large amounts of multilingual data programmatically. For smallish amounts of data, one can consider manual annotation; for large amounts this is unrealistic and it has to be done programmatically.
>
> The same goes for quality. First one has to aim for language-independent quality techniques; for example, statistical techniques for measuring corpus-dependent length ratios among n languages. These could later be further refined with language-dependent techniques.
>
> When the programs are more stable, they might be published.
>
> Regards
> Tomas
>
> -----Original Message-----
> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
> Sent: Wednesday, July 02, 2014 3:46 PM
> To: public-ld4lt@w3.org
> Subject: Multidimensional Quality Metrics and Linguistic Linked Data
>
> Hi all,
> One requirement that has come up several times in our discussions about
> linguistic linked data is how to manage the quality of such data. One
> obvious advantage of using linked data for language resources is that it
> becomes easy for third parties to annotate different parts of that
> resource with quality assessment which can be used to drive quality
> management processes.
>
> To this end we initiated some discussion with Arle Lommel from DFKI, who
> has been driving the development of the Multidimensional Quality Metrics
> (MQM) specification in the QTLaunchPad project:
> http://www.qt21.eu/mqm-definition/definition-2014-06-04.html
>
> This encompasses a wide range of established quality assessment metrics
> for translation, and therefore may provide a concrete model that could
> be used more widely in annotating the linguistic quality of multilingual
> linked data.
>
> There has also been a discussion in the ITS IG about the benefits to the
> MQM spec itself from an RDF mapping:
>
> http://lists.w3.org/Archives/Public/public-i18n-its-ig/2014Jun/0009.html
>
> We plan to cover this in tomorrow's LD4LT call, but please send any
> additional thoughts you may have on this to the list, e.g. on other existing quality
> annotation vocabularies in use for linked data.
>
> cheers,
> Dave
>
>
Received on Thursday, 17 July 2014 15:10:15 UTC