RE: Multidimensional Quality Metrics and Linguistic Linked Data


I am working on preparing large amounts of multilingual data programmatically. For small amounts of data, one can consider manual annotation; for large amounts this is unrealistic, and it has to be done programmatically.

The same goes for quality. One should first aim for language-independent quality techniques, for example statistical techniques for measuring corpus-dependent length ratios among n languages. These can later be refined with language-dependent techniques.
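
As a rough illustration of the length-ratio idea, a minimal sketch in Python might look like the following. The function name, the one-dict-per-aligned-segment layout, the use of character counts, and the z-score threshold are all assumptions made for illustration; this is not the programs mentioned in this message.

```python
from itertools import combinations
from statistics import mean, stdev


def flag_length_outliers(segments, threshold=2.0):
    """Flag aligned segments whose length ratio is unusual for the corpus.

    segments: list of dicts mapping a language code to the segment text
              (hypothetical layout, one dict per aligned segment).
    Returns a list of (segment_index, lang_a, lang_b) tuples.
    """
    flagged = []
    langs = sorted({lang for seg in segments for lang in seg})
    for a, b in combinations(langs, 2):
        # Character-length ratio for every segment present in both languages.
        ratios = {
            i: len(seg[a]) / len(seg[b])
            for i, seg in enumerate(segments)
            if seg.get(a) and seg.get(b)
        }
        if len(ratios) < 2:
            continue  # not enough data to estimate a spread
        mu = mean(ratios.values())
        sigma = stdev(ratios.values())
        if sigma == 0:
            continue  # all ratios identical, nothing to flag
        for i, ratio in ratios.items():
            # Corpus-dependent: the expected ratio is estimated from the data.
            if abs(ratio - mu) / sigma > threshold:
                flagged.append((i, a, b))
    return flagged


if __name__ == "__main__":
    corpus = [{"en": "hello world", "de": "hallo welt!"} for _ in range(9)]
    corpus.append({"en": "hi", "de": "x" * 40})  # suspicious alignment
    print(flag_length_outliers(corpus))
```

Because the mean and spread are estimated per language pair from the corpus itself, nothing here depends on any particular language; a language-dependent refinement could later replace the character counts or the threshold.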

When the programs are more stable, they might be published.


-----Original Message-----
From: Dave Lewis
Sent: Wednesday, July 02, 2014 3:46 PM
Subject: Multidimensional Quality Metrics and Linguistic Linked Data

Hi all,
One requirement that has come up several times in our discussions about 
linguistic linked data is how to manage the quality of such data. One 
obvious advantage of using linked data for language resources is that it 
becomes easy for third parties to annotate different parts of that 
resource with quality assessments, which can be used to drive quality 
management processes.

To this end we initiated some discussion with Arle Lommel from DFKI, who 
has been driving the development of the Multidimensional Quality Metrics 
(MQM) specification in the QTLaunchPad project:

This encompasses a wide range of established quality assessment metrics 
for translation, and therefore may provide a concrete model that could 
be used more widely in annotating the linguistic quality of multilingual 
linked data.

There has also been a discussion in the ITS IG about the benefits to the 
MQM spec itself from an RDF mapping:

We plan to cover this in tomorrow's LD4LT call, but please send any
additional thoughts you may have on this to the list, e.g. on other existing quality
annotation vocabularies in use for linked data.


Received on Thursday, 10 July 2014 12:20:26 UTC