- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 1 Aug 2012 09:50:18 +0200
- To: "Dr. David Filip" <David.Filip@ul.ie>
- Cc: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czrPTo3KJGZbYFb3Eh6XZshK7M=hsDd8_5wG+bLaiOiXPw@mail.gmail.com>
Hi David, all,

2012/7/31 Dr. David Filip <David.Filip@ul.ie>

> Hi all, I was trying to engage a PhD student here at LRC to produce a
> proposal for this data category, but I failed.
>
> Nevertheless, here is my thinking on the category, so that maybe someone
> else (Declan?) could take it to the call-for-consensus stage.
>

Co-chair hat on: if there is no strong support for this, I would propose to put it on hold until we have finished all other data categories. As you wrote in your agenda,

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0311.html

we have various data category proposals on the table that are not finished: special requirements, named entity, quality, ... Later today I will send a proposal for the time until last call, which will show that we need to finish these and the various "ed. notes" in

http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html

I think we need the time and your input to work on these. I very much hope for your understanding. Let's also discuss this during the call on Thursday.

Felix

>
> I believe that mtConfidence is being produced in some form or other by all
> major current MT systems. As discussed in Dublin, the issue is that these
> confidence scores are not really comparable between engines: not only
> between Bing and Google, or Matrex, but not even between different
> language-pair engines or specific domain-trained engines based on the same
> general technology.
>
> Nevertheless, there are prospects for standardizing based on cognitive
> effort in post-editing etc. Even knowing that the usability of confidence
> scores is limited, there are valid production-consumption scenarios in the
> content lifecycle.
> If a client, service provider, translator, or reviewer repeatedly works
> with the same engine, they will find even the engine's self-evaluation
> useful.
>
> Further to this, there is potential for connecting this with automated and
> human MT evaluation scores, so I'd propose to generalize it as mtQuality
> [meaning raw MT quality, NOT talking about levels of PE], which would
> subsume mtConfidence etc., as seen below.
>
> My proposal for the data model, based on the above:
>
> -mtQuality
> --mtConfidence
> ---mtProducer [string identifying the producer: Bing, DCU-Matrex etc.]
> ----mtEngine [string identifying the engine on one of the above platforms;
> can potentially be quite structured: pair, domain etc.]
> -----mtConfidenceScore [0-100% or interval 0-1]
> --mtAutomatedMetrics
> ---mtScoreType [METEOR, TER, BLEU, Levenshtein distance etc.]
> ----mtAutomatedMetricsScore [0-100% or interval 0-1]
> --mtHumanMetrics
> ---mtHumanMetricsScale [{4,3,2,1,0}, {0,1,2,3,4}, {3,2,1,0} etc.]
> ----mtHumanMetricsValue [one of the above values, depending on scale]
>
> mtQuality is an optional attribute of a machine-translated text segment
> (as in Unicode or localization segmentations). I do not think this is
> useful on higher or lower levels.
>
> mtQuality must be specified as mtConfidence XOR mtAutomatedMetrics
> XOR mtHumanMetrics.
>
> Then comes the compulsory specification of the actual value (eventually
> preceded by value change if more options exist).
>
> Cheers
> dF
>
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> *cellphone: +353-86-0222-158*
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>
>

-- 
Felix Sasaki
DFKI / W3C Fellow
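
For readers who want to see the quoted data model in concrete form, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not part of any ITS 2.0 draft: the class and field names are hypothetical renderings of the names in David's outline, scores are assumed to be normalized to the interval 0-1, the "XOR" rule is read as "exactly one of the three alternatives", and the engine string "en-de/general" is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _check_unit_interval(score: float) -> None:
    # Assumption: scores are stored on the 0-1 interval, not as percentages.
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in the interval 0-1")


@dataclass(frozen=True)
class MtConfidence:
    producer: str  # mtProducer, e.g. "Bing" or "DCU-Matrex"
    engine: str    # mtEngine, free-form (language pair, domain, ...)
    score: float   # mtConfidenceScore

    def __post_init__(self) -> None:
        _check_unit_interval(self.score)


@dataclass(frozen=True)
class MtAutomatedMetrics:
    score_type: str  # mtScoreType, e.g. "METEOR", "TER", "BLEU"
    score: float     # mtAutomatedMetricsScore

    def __post_init__(self) -> None:
        _check_unit_interval(self.score)


@dataclass(frozen=True)
class MtHumanMetrics:
    scale: Tuple[int, ...]  # mtHumanMetricsScale, e.g. (4, 3, 2, 1, 0)
    value: int              # mtHumanMetricsValue

    def __post_init__(self) -> None:
        # The value must be one of the points on the declared scale.
        if self.value not in self.scale:
            raise ValueError("value must be one of the scale points")


@dataclass(frozen=True)
class MtQuality:
    confidence: Optional[MtConfidence] = None
    automated: Optional[MtAutomatedMetrics] = None
    human: Optional[MtHumanMetrics] = None

    def __post_init__(self) -> None:
        # "XOR" is interpreted here as: exactly one alternative is present.
        given = [m for m in (self.confidence, self.automated, self.human)
                 if m is not None]
        if len(given) != 1:
            raise ValueError(
                "specify exactly one of confidence, automated, human")


# Hypothetical usage: a segment annotated with an engine's own confidence.
segment_quality = MtQuality(
    confidence=MtConfidence(producer="Bing", engine="en-de/general", score=0.85)
)
```

Keeping producer and engine next to the confidence score reflects the point made in the quoted message: a score is only meaningful relative to the engine that produced it.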
Received on Wednesday, 1 August 2012 07:50:59 UTC