[Action-126] David to come up with a proposal for mtConfidence from Dr. David Filip on 2012-07-31 (public-multilingualweb-lt@w3.org from July 2012)

From: Dr. David Filip <David.Filip@ul.ie>
Date: Tue, 31 Jul 2012 18:47:18 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <CANw5LKmqH1SrtPGUbUUOC9ziEGM6wA_0ymDnMbrjN5nM7Yr4kA@mail.gmail.com>

Hi all, I was trying to engage a PhD student here at LRC to produce a
proposal for this data category, but I failed.

Nevertheless, here is my thinking on the category, so that maybe someone
else (Declan?) could take it to the call-for-consensus stage.

I believe that mtConfidence is produced in some form or other by all
major current MT systems. As discussed in Dublin, the issue is that these
confidence scores are not really comparable between engines: not only
between Bing and Google, or Matrex, but not even between different
language-pair engines, or between domain-specific engines trained on the
same general technology.

Nevertheless, there are prospects for standardization based on the
cognitive effort of post-editing, etc. Even knowing that the usability of
confidence scores is limited, there are valid production-consumption
scenarios in the content lifecycle.
If a client, service provider, translator or reviewer repeatedly works
with the same engine, they will find even the engine's self-evaluation
useful.

Further to this, there is potential for connecting this with automated and
human MT evaluation scores, so I'd propose to generalize it as mtQuality
[meaning raw MT quality, NOT levels of post-editing] that would subsume
mtConfidence etc., as seen below.

My proposal for the data model, based on the above:

-mtQuality
--mtConfidence
---mtProducer [string identifying the producer: Bing, DCU-Matrex etc.]
----mtEngine [string identifying the engine on one of the above platforms; can potentially be quite structured: language pair, domain etc.]
-----mtConfidenceScore [0-100% or interval 0-1]
--mtAutomatedMetrics
---mtScoreType [METEOR, TER, BLEU, Levenshtein distance etc.]
----mtAutomatedMetricsScore [0-100% or interval 0-1]
--mtHumanMetrics
---mtHumanMetricsScale [{4,3,2,1,0}, {0,1,2,3,4}, {3,2,1,0} etc.]
----mtHumanMetricsValue [one of the above values, depending on the scale]
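As a rough illustration only, the tree above could be rendered as Python dataclasses. The class and field names are hypothetical snake_case versions of the proposed names, and treating every score as already normalized to the interval 0-1 is an assumption (the proposal allows either 0-100% or 0-1).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical rendering of the proposed mtQuality data model.
# Names mirror the proposal; types and the 0-1 score range are assumptions.

@dataclass
class MtConfidence:
    mt_producer: str            # e.g. "Bing", "DCU-Matrex"
    mt_engine: str              # engine id on that platform (language pair, domain, ...)
    mt_confidence_score: float  # assumed normalized to [0, 1]

@dataclass
class MtAutomatedMetrics:
    mt_score_type: str                 # e.g. "METEOR", "TER", "BLEU", "Levenshtein distance"
    mt_automated_metrics_score: float  # assumed normalized to [0, 1]

@dataclass
class MtHumanMetrics:
    mt_human_metrics_scale: Tuple[int, ...]  # e.g. (4, 3, 2, 1, 0)
    mt_human_metrics_value: int              # expected to be one of the scale values

@dataclass
class MtQuality:
    # At most one of these is meant to be set (see the XOR rule in the proposal).
    mt_confidence: Optional[MtConfidence] = None
    mt_automated_metrics: Optional[MtAutomatedMetrics] = None
    mt_human_metrics: Optional[MtHumanMetrics] = None

# Example: a segment carrying only the engine's own confidence score.
example = MtQuality(mt_confidence=MtConfidence("Bing", "en-de, IT domain", 0.82))
```

This sketch flattens mtProducer, mtEngine and mtConfidenceScore into one record instead of nesting them as the indentation suggests; assuming one engine and one score per segment, both readings carry the same information.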

mtQuality is an optional attribute of a machine-translated text segment
(as in Unicode or localization segmentation). I do not think it is useful
at higher or lower levels.

mtQuality must be specified as mtConfidence XOR mtAutomatedMetrics
XOR mtHumanMetrics.
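A minimal sketch of this exclusive-choice rule, assuming mtQuality arrives as a plain dictionary keyed by the proposed names (the dictionary shape and the function name check_mt_quality are hypothetical). It reads the rule as "exactly one of the three", which is stricter than a literal chained XOR: a XOR b XOR c is also true when all three are present.

```python
# Hypothetical validator: mtQuality must carry exactly one metric kind.
KINDS = ("mtConfidence", "mtAutomatedMetrics", "mtHumanMetrics")

def check_mt_quality(mt_quality: dict) -> str:
    """Return the single metric kind present, or raise ValueError."""
    present = [kind for kind in KINDS if mt_quality.get(kind) is not None]
    if len(present) != 1:
        raise ValueError(f"mtQuality needs exactly one of {KINDS}, got {present}")
    return present[0]
```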

Then comes the compulsory specification of the actual value (possibly
preceded by a choice of value range if more options exist).
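Because the proposal allows a score either as 0-100% or in the interval 0-1, and a human value on one of several scales, the actual value can only be checked once that choice is known. A hedged sketch with hypothetical helper names, mapping both score forms onto 0-1 and checking a human value against its scale:

```python
from typing import Sequence

def normalize_score(score: float, percent: bool) -> float:
    """Map a score given as 0-100 % (percent=True) or in [0, 1] onto [0, 1]."""
    value = score / 100.0 if percent else score
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score {score} is outside the declared range")
    return value

def check_human_value(scale: Sequence[int], value: int) -> int:
    """Accept a human rating only if it is one of the values of its scale."""
    if value not in scale:
        raise ValueError(f"{value} is not on the scale {list(scale)}")
    return value
```

Normalizing to 0-1 is just one possible convention for the sketch; storing the declared range alongside the raw value would work equally well.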

Cheers
dF


Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie

Received on Tuesday, 31 July 2012 17:48:24 UTC