Re: [Action-126] David to come up with a proposal for mtConfidence from Phil Ritchie on 2012-08-01 (public-multilingualweb-lt@w3.org from August 2012)

From: Phil Ritchie <philr@vistatec.ie>
Date: Wed, 1 Aug 2012 09:59:34 +0100
To: "Dr. David Filip" <David.Filip@ul.ie>
Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <OFAE8B66A7.FE21F52C-ON80257A4D.003137A0-80257A4D.0031663C@vistatec.ie>
David,

No. I wasn't proposing merging the categories at all. Just that you may be 
able to leverage some of the approach we have taken with attributes and 
their values. 

Phil.





From:   "Dr. David Filip" <David.Filip@ul.ie>
To:     Phil Ritchie <philr@vistatec.ie>, 
Cc:     "public-multilingualweb-lt@w3.org" 
<public-multilingualweb-lt@w3.org>
Date:   01/08/2012 01:06
Subject:        Re: [Action-126] David to come up with a proposal for 
mtConfidence



Thanks Phil. While MT quality scores are obviously quality related, I do 
not think there is much connection with quality assurance in the sense of 
error catching and classification.

mtQuality/mtConfidence is intended merely to provide a (relative) score 
per segment that is hoped to be eventually correlated in practice with 
usefulness of the segment. If anyone sees potential in subsuming this 
into the quality model that is being worked on, I would not be against it. 
I just think that this is different in an important way and most probably 
there would be no harm in keeping them separate. In the error-based 
quality model there should, for instance, be post-editing errors. But the 
category as I proposed it is for raw MT only and should be archived or 
deleted once the raw MT has been modified or replaced.

Cheers
dF

Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158 
facsimile: +353-6120-2734
mailto: david.filip@ul.ie



On Tue, Jul 31, 2012 at 7:16 PM, Phil Ritchie <philr@vistatec.ie> wrote:
David

Maybe I'm off beam here but your narrative seems to express many of the 
challenges we've had defining the Quality Category - i.e. classification 
across distinct systems that produce the classifications. Whilst I think 
the data categories should be separate, perhaps aligning with the 
philosophy of our attributes would solve this or at least bring 
consistency in approach.

Phil



On 31 Jul 2012, at 18:48, "Dr. David Filip" <David.Filip@ul.ie> wrote:

Hi all, I was trying to engage a PhD student here at LRC to produce a 
proposal for this data category but I failed.

Nevertheless, here is my thinking on the category, so that maybe someone 
else (Declan?) could take it to the call for consensus stage.

I believe that mtConfidence is being produced in some form or other by all 
major current MT systems. As discussed in Dublin, the issue is that these 
confidence scores are not really comparable between engines. I mean not 
only between Bing and Google, or Matrex, but not even between different 
language pair engines or specific domain trained engines based on the same 
general technology.

Nevertheless, there are prospects for standardizing based on the cognitive 
effort of post-editing etc. Even knowing that the usability of confidence 
scores is limited, there are valid production-consumption scenarios in the 
content lifecycle.
If a client/service provider/translator/reviewer repeatedly works with 
the same engine, they will find even the engine's self-evaluation useful.

Further to this, there is potential for connecting this with automated and 
human MT evaluation scores, so I'd propose to generalize as mtQuality 
[meaning raw MT quality, NOT talking about levels of PE] that would subsume 
mtConfidence etc. as seen below.

My proposal of the data model based on the above

-mtQuality
--mtConfidence
---mtProducer [string identifying producer Bing, DCU-Matrex etc.]
----mtEngine [string identifying the engine on one of the above platforms, 
can be potentially quite structured, pair domain etc.]
-----mtConfidenceScore [0-100% or interval 0-1]
--mtAutomatedMetrics
---mtScoreType [METEOR, TER, BLEU, Levenshtein distance etc.]
----mtAutomatedMetricsScore [0-100% or interval 0-1]
--mtHumanMetrics
---mtHumanMetricsScale [{4,3,2,1,0}, {0,1,2,3,4}, {3,2,1,0} etc.]
----mtHumanMetricsValue [one of the above values depending on scale]
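
As an illustration only, the hierarchy above could be sketched in Python 
roughly as follows. The class and field names are assumptions chosen to 
mirror the proposed element names and are not part of the proposal. Scores 
are assumed to be given in the 0-1 interval (a percentage would be divided 
by 100 first).

```python
from dataclasses import dataclass
from typing import Tuple


def check_unit_interval(name: str, score: float) -> None:
    """Reject scores outside the 0-1 interval named in the proposal."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{name} must lie in the interval 0-1, got {score}")


@dataclass(frozen=True)
class MtConfidence:
    producer: str  # mtProducer, e.g. "Bing" or "DCU-Matrex"
    engine: str    # mtEngine, free-form string (language pair, domain, ...)
    score: float   # mtConfidenceScore

    def __post_init__(self) -> None:
        check_unit_interval("mtConfidenceScore", self.score)


@dataclass(frozen=True)
class MtAutomatedMetrics:
    score_type: str  # mtScoreType, e.g. "METEOR", "TER", "BLEU"
    score: float     # mtAutomatedMetricsScore

    def __post_init__(self) -> None:
        check_unit_interval("mtAutomatedMetricsScore", self.score)


@dataclass(frozen=True)
class MtHumanMetrics:
    scale: Tuple[int, ...]  # mtHumanMetricsScale, e.g. (4, 3, 2, 1, 0)
    value: int              # mtHumanMetricsValue, must be a point on the scale

    def __post_init__(self) -> None:
        if self.value not in self.scale:
            raise ValueError(f"{self.value} is not on the scale {self.scale}")
```

For example, MtConfidence("Bing", "en-de general", 0.82) would be accepted, 
while MtHumanMetrics((3, 2, 1, 0), 4) would be rejected because 4 is not on 
the declared scale. The engine string "en-de general" is made up for the 
example.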

mtQuality is an optional attribute of a machine text segment (as in 
Unicode or localization segmentations). I do not think this is useful on 
higher or lower levels.

mtQuality must be specified as mtConfidence XOR mtAutomatedMetrics 
XOR mtHumanMetrics
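
A minimal sketch of this constraint, reading XOR as "exactly one of the 
three" (a strict three-way XOR would also accept all three being present, 
which is presumably not intended). The dictionary representation and the 
function name select_variant are assumptions made for illustration.

```python
from typing import Mapping

# The three mutually exclusive children of mtQuality in the proposal.
VARIANTS = ("mtConfidence", "mtAutomatedMetrics", "mtHumanMetrics")


def select_variant(mt_quality: Mapping[str, object]) -> str:
    """Return the single variant present, or raise if zero or several are."""
    present = [name for name in VARIANTS if mt_quality.get(name) is not None]
    if len(present) != 1:
        raise ValueError(
            f"mtQuality needs exactly one of {VARIANTS}, found {present}"
        )
    return present[0]
```

A segment annotated only with mtConfidence passes the check, whereas a 
segment carrying both mtConfidence and mtHumanMetrics, or none of the three, 
is rejected.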

Then comes the compulsory specification of the actual value (eventually 
preceded by value change if more options exist).

Cheers
dF


Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158 
facsimile: +353-6120-2734
mailto: david.filip@ul.ie


************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender immediately by e-mail.
www.vistatec.com
************************************************************


Received on Wednesday, 1 August 2012 09:00:24 UTC