RE: [Issue-41][Action-190] Draft a section about mtConfidence, based on the discussion from Yves Savourel on 2012-08-09 (public-multilingualweb-lt@w3.org from August 2012)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Wed, 8 Aug 2012 21:31:47 -0600
To: "'Dr. David Filip'" <David.Filip@ul.ie>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.0568ab9ce3.assp.0568923e4c.003001cd75df$81c2da80$85488f80$@com>

Thanks for the explanations David.


> the XOR just includes slightly different header 
> as another example inserted in the example 

IMO examples should be real, complete files that we can actually process.
If we want to show two ways to do something we should use two separate examples.


> The value "en-t-cs" follows the t extension 
> syntax from BCP 47, so it means English transformed 
> from Czech. I am aware that the usual MT pair 
> convention is the other way round (I use this 
> convention in the private string examples), but I 
> thought that the t extension would find valid usage here.. 

I see that now. So none of my notes about text that should be translated stand.

That said, I'm not sure using the t extension is a good way to identify an 'engine'. Besides being counterintuitive, we don't really intend to standardize that value, do we? So I suppose one example can use it, but we could have several other examples as well.
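
For illustration only, here is a minimal Python sketch of how a value such as "en-t-cs" reads under the t extension: the subtags before the "t" singleton give the language of the text (English), and the tag after it gives the language it was transformed from (Czech). The function name split_t_extension is hypothetical, and the parsing is deliberately simplified: it does not validate subtags or handle private-use sequences.

```python
def split_t_extension(tag):
    """Split a language tag with a 't' extension into (target, source).

    Simplified sketch (assumption): no subtag validation and no
    special handling of private-use ('x-...') sequences.
    """
    subtags = tag.lower().split("-")
    # The 't' singleton can never be the first subtag.
    if "t" not in subtags[1:]:
        return "-".join(subtags), None
    idx = subtags.index("t", 1)
    target = "-".join(subtags[:idx])
    source_parts = []
    for sub in subtags[idx + 1:]:
        # Stop at another singleton, or at a field key such as 'm0'
        # (one letter followed by one digit).
        is_field_key = len(sub) == 2 and sub[0].isalpha() and sub[1].isdigit()
        if len(sub) == 1 or is_field_key:
            break
        source_parts.append(sub)
    return target, "-".join(source_parts) or None


target, source = split_t_extension("en-t-cs")
print(target, source)
```

Note that the result puts the target language first and the source language second, which is the reverse of the usual MT pair convention mentioned in the quoted text.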


> I do not understand this part at all. MT candidate translations
> are always 100% matches in the terms of TM matching. 
> The self-reported confidence expresses what might be the 
> chance that the 100% match is accurate/usable.. I do not 
> think we need a combined value here. And this is also a 
> reason why XLIFF would need a separate mechanism for 
> reporting the confidence, we could not overload the normal 
> match rate..

I guess the point I was making was that Bing doesn't provide a 0-100% confidence score. So if we use it as an example, we should explain how we get that value, or use another example.


>> I'm not understanding why it's there. I think you 
>> mean that the global rule must not use that attribute. 
>> Then just don't say anything. If it's not listed it 
>> cannot be used (it's just not an attribute of 
>> <mtConfidenceRule>)
>
> It is true, still in my experience redundancy serves 
> the purpose of absolute clarity 

As a developer I'm utterly confused to see a mention of an attribute that does not exist in that element.


> Well, the whole point is that the score is 
> worth nothing at all if you do not know what 
> the producer and engine are. I first thought that 
> GLOBAL does not make sense at all for confidence. 
> But later reintroduced GLOBAL for producer and 
> engine, as they are likely to be the same throughout 
> the whole document in many scenarios, so that 
> you can save lot of space not specifying them for 
> each and every segment. So mtProducer and mtEngine 
> are only optional at the local level if they 
> have been specified at the global level

So your real goal is to have a value set for mtProducer and mtEngine whenever we have a local mtConfidenceScore. You don't really care how or where it is set, right?
Then we should have defaults for those values. Validating whether those attributes must be defined locally, depending on whether they are defined at a higher level, is going to be very difficult to implement.
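
A rough sketch of the default-value approach, in Python and under stated assumptions: the function name resolve_mt_info and the placeholder default "unspecified" are hypothetical and not taken from any draft. What it shows is that a consumer can simply layer local values over global values over defaults, so nothing has to be validated based on where a value was set.

```python
# Placeholder defaults (assumption): the actual default values, if any,
# would have to be defined by the specification.
DEFAULTS = {"mtProducer": "unspecified", "mtEngine": "unspecified"}


def resolve_mt_info(local_attrs, global_attrs=None):
    """Return the effective MT attributes for one segment.

    Precedence, lowest to highest: defaults, global values, local values.
    """
    resolved = dict(DEFAULTS)
    resolved.update(global_attrs or {})
    resolved.update(local_attrs)
    return resolved


# A segment that carries only a score still gets a producer and an engine.
print(resolve_mt_info({"mtConfidenceScore": "0.8"}))

# Global values override the defaults; local values override both.
print(resolve_mt_info(
    {"mtConfidenceScore": "0.8", "mtEngine": "en-t-cs"},
    {"mtProducer": "ExampleProducer", "mtEngine": "default-engine"},
))
```

With this kind of lookup, a local mtConfidenceScore always has a producer and an engine associated with it, whether or not anything was declared globally.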


> And there must be a processing requirement to 
> move them onto the segment level should the 
> header be separated during processing..

Some formats may not allow those attributes anywhere other than the top of the document. But anyway, I don't think we can have such processing requirements for ITS. Default values solve all this as far as I can tell.

I would also disagree that the score is worth nothing without the mtProducer and mtEngine values. Actually, in many scenarios knowing the producer or the engine means diddly squat to the end-users. They just care about the score.


Cheers,
-yves

Received on Thursday, 9 August 2012 03:32:19 UTC