RE: [Issue-41][Action-190] Draft a section about mtConfidence, based on the discussion

From: Yves Savourel <ysavourel@enlaso.com>
Date: Wed, 8 Aug 2012 15:58:01 -0600
To: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.05678ca958.assp.05674e221e.00a601cd75b0$e12a1da0$a37e58e0$@com>

Hi David,

A few notes:


-- the element mtConfidence should be mtConfidenceRule to follow our usual pattern.

-- I’m not sure I understand the example with:

>  its:mtEngine="en-t-cs" />
> XOR
> <its:disambiguation selector="/text/body/p/"
> its:mtProducer=”vanilla Moses”
> its:mtEngine="medical:EN-ES_LA” />
>    </its:rules

Is it a typo or is 'disambiguation' (should be disambiguationRule BTW) involved?
Also, why do we have two rules in the example?

Also, that example says it's using an EN to ES_LA engine, but the text looks very English to me (for an 89.82% confidence, that doesn't look good :)

-- I would suggest allowing only one type of value: either 0.0 to 1.0 or 0-100%, not both.
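
To illustrate what a single allowed value type could mean for a consumer, here is a minimal Python sketch. It assumes, purely for the example, that the 0.0 to 1.0 form is the one kept; the group has not decided that, and the function name parse_mt_confidence is hypothetical.

```python
def parse_mt_confidence(raw: str) -> float:
    """Parse a confidence score, accepting only a decimal in [0.0, 1.0].

    Assumption for this sketch: the percentage form (e.g. "89.82%") is
    rejected rather than converted, so only one value type is ever valid.
    """
    text = raw.strip()
    if text.endswith("%"):
        raise ValueError(f"percentage form not allowed: {raw!r}")
    score = float(text)  # raises ValueError on non-numeric input
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range 0.0-1.0: {raw!r}")
    return score
```

With one value type, a processor needs a single range check and never has to guess whether "1" means 1% or 100%.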

-- I'm not sure the paragraph:

"MT confidence can be displayed on websites machine translated on the fly, by simple translation editors, and Computer Aided Translation (CAT) tools. To facilitate usage in CAT tools, the data category should be promoted for inclusion in the match element of XLIFF 2.0. MT Confidence MAY be displayed for human consumers as segment annotation or as color-coded font or background."

brings anything to the definition of the data category. Or maybe it could be reworded and moved to the list of possible purposes.

I agree with the bit about XLIFF, but I don't think it should be noted in the specification.


--- For the example:

> <body>
>   <p><span its:mtProducer=”Bing Translator” its:mtEngine=”en-t-cs” 
> its:mtConfidenceScore=”89.82%”>Dublin is the capital city of Ireland.</p>
>   </body>
> </text>

The text should be in Czech, not English.

Also, Jan can correct me, but I think the confidence for Bing Translator would be some combination of the MatchDegree and the Rating values it returns. They can certainly be mashed into a single value somehow, but maybe we could use a more straightforward example?


--- Example 32:

The text should be in (hopefully) Czech.
Also, there is no space between the two sentences.
And the double quotes should be ASCII.
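
For the quotes, a quick way to clean up a pasted example is sketched below in Python. The helper name asciify_quotes is hypothetical, and the sketch replaces every typographic quote in the string, so it is only suitable for markup snippets where no curly quotes are wanted in the text content.

```python
# Map typographic quotes to their ASCII equivalents.
CURLY_TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def asciify_quotes(markup: str) -> str:
    """Return markup with curly quotes replaced by ASCII quotes."""
    return markup.translate(CURLY_TO_ASCII)
```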

The sentence "Prague is the capital city of Prague in the Czech Republic." is weird.


--- The text of the GLOBAL section says:

"mtConfidenceScore MUST NOT be specified globally MUST NOT be specified globally."

I don't understand why it's there. I think you mean that the global rule must not use that attribute. Then just don't say anything. If it's not listed, it cannot be used (it's just not an attribute of <mtConfidenceRule>).


--- In the LOCAL section the text says:

"All of the following MUST be specified locally, UNLESS mtProducer and mtEngine have been specified globally."

It's not clear whether mtProducer and mtEngine are allowed or not. Also, I don't think we should have a dependency like that: one may look at a paragraph without having access to the top of the document.

I think we should simply say:

- An mtConfidenceScore
- An optional mtProducer
- An optional mtEngine
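
The simpler local rule in the list above could be checked along these lines. This is only a sketch in Python: the function name check_local_mt_confidence is hypothetical, the attribute names are the ones from the current draft (they may still change), and the Czech sample sentence is mine.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI


def check_local_mt_confidence(element: ET.Element) -> dict:
    """Require its:mtConfidenceScore; treat producer and engine as optional."""
    score = element.get(f"{{{ITS_NS}}}mtConfidenceScore")
    if score is None:
        raise ValueError("its:mtConfidenceScore is required locally")
    return {
        "score": score,
        # Optional attributes: None when absent, with no lookup of global rules.
        "producer": element.get(f"{{{ITS_NS}}}mtProducer"),
        "engine": element.get(f"{{{ITS_NS}}}mtEngine"),
    }


# Sample element carrying only the required attribute.
span = ET.fromstring(
    '<span xmlns:its="http://www.w3.org/2005/11/its" '
    'its:mtConfidenceScore="0.8982">Dublin je hlavní město Irska.</span>'
)
info = check_local_mt_confidence(span)
```

The point is that the element can be validated on its own, without access to anything declared at the top of the document.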


--- attribute names

If we have mtConfidenceScore, we should probably have mtConfidenceProducer and mtConfidenceEngine, as for the other data categories.


Cheers,
-ys
Received on Wednesday, 8 August 2012 22:01:58 UTC
