[Issue-41, issue-42, Action-190, action-194] Draft a section about mtConfidence, based on the discussion from Felix Sasaki on 2012-08-23 (public-multilingualweb-lt@w3.org from August 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 23 Aug 2012 09:41:44 +0200
To: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czrnLauzznGfCymfHLdCr_9hR+GQxmALMqXDuTzAq6b9_w@mail.gmail.com>
Hi Dave, we discussed this on the call during your absence. The general
opinion was that the information needed for mtConfidence, quality and
disambiguation is very similar and very specific. I had a brief look at the
drafts for the three data categories and came to that conclusion, hence the
issue-42 (drafted before the call).

We also discussed whether this would be a separate data category, or
whether we need to interrelate data categories. Instead of going down this path
the rough consensus was that it is OK if the three data categories convey
the same information - we should just try to harmonize the description of
aspects like score or tool identification. Hence my action-194 related to
issue-42, to come up with such a harmonization proposal. I am not sure yet
if I'll get to it before the call.
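
As a rough illustration of the model discussed in the quoted thread below (a
per-segment score combined with a tool identification that is usually set
once for the whole document and inherited), here is a minimal Python sketch.
The names Node, resolve_agent, scored_segments and single_source, and the
agent identifiers, are hypothetical and are not taken from any ITS draft; the
sketch only assumes that the nearest ancestor carrying an agent value wins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical document node; not an ITS data structure."""
    name: str
    score: Optional[float] = None   # local MT confidence score, if any
    agent: Optional[str] = None     # tool/provider identifier, if set here
    parent: Optional["Node"] = None


def resolve_agent(node: Node) -> Optional[str]:
    """Return the agent set on the node or on its nearest ancestor."""
    current: Optional[Node] = node
    while current is not None:
        if current.agent is not None:
            return current.agent
        current = current.parent
    return None


def scored_segments(nodes: List[Node]) -> List[Tuple[str, float, Optional[str]]]:
    """Pair each local score with the agent that applies to that node."""
    return [(n.name, n.score, resolve_agent(n))
            for n in nodes if n.score is not None]


def single_source(results: List[Tuple[str, float, Optional[str]]]) -> bool:
    """True only if every score comes from one and the same known agent."""
    agents = {agent for _, _, agent in results}
    return len(agents) == 1 and None not in agents


# Assumed example data: one global agent, one local override.
doc = Node("doc", agent="example-mt-engine")
seg1 = Node("seg1", score=0.82, parent=doc)
seg2 = Node("seg2", score=0.41, agent="other-mt-engine", parent=doc)
seg3 = Node("seg3", parent=doc)

result = scored_segments([seg1, seg2, seg3])
```

A check like single_source could be run before showing bare scores to an end
user, since scores from different engines are not directly comparable.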

Best,

Felix

Am Donnerstag, 23. August 2012 schrieb Dave Lewis :

Felix,
> I've only now got to your post on ISSUE-42:
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
>
> I think the combination of an mtConfidence score and translationAgent that I
> suggest below amounts to pretty much the same thing, just arrived at via
> a different route. Was that the direction you were heading in?
>
> The translationReviewAgent could work similarly with quality, and we could
> add a sourceReviewAgent or terminologyReviewAgent, or generalise
> translationReviewAgent or qualityReviewAgent to address
> textAnalysisAnnotation.
>
> One question here: as different data categories are separably
> conformant, will specifications of how they are used in combination
> essentially have to be non-normative, or would we need a distinct normative
> "data categories in combination" section?
>
> cheers,
> Dave
>
> On 23/08/2012 01:56, Dave Lewis wrote:
>
>> Yves, David,
>> Apologies for coming to this thread a bit late. You've already pointed out
>> that the score needs to be mostly local, i.e. per segment as passed to an
>> MT service, while the definition of providers/engine would be more likely
>> global, i.e. the same engine would be used for most segments in a document.
>> We also have distinct use cases where only the score is relevant or where
>> the score and the service is needed. So it seems that two data categories
>> would suit, one for score and one for identifying the engine.
>>
>> We do however already have a way of identifying an MT service that has been
>> used on a document or its segments, in the form of translationAgent (see
>> call for consensus
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0256.html
>> ).
>>
>> I propose therefore that translationAgent used in conjunction with an
>> mtConfidence score data category that had just one score attribute would
>> therefore cover the different use cases while also supporting the existing
>> use cases outlined for translationAgent.
>>
>> Note translationAgent allows multiple agents to be specified, but doesn't
>> concern itself with distinguishing the types of agent, e.g.
>> provider/organisation from software/engine, though both are possible. The
>> form of the ID or the result of dereferencing it is assumed to address
>> this, given the lack of common naming schemes for organisations or engines.
>>
>> I'd be happy anyway to include the example IDs from mtConfidence engine
>> attribute into translationAgent - as these are sensible ideas, and
>> something we could address more comprehensively as best practice next year.
>>
>> cheers,
>> Dave
>>
>>
>>
>> On 09/08/2012 13:56, Yves Savourel wrote:
>>
>>>> The end user who does not understand this MUST NOT be exposed to values
>>>> coming from mixed engines/producers.
>>>> In other words it is OK to DISPLAY SCORE ONLY TO THE END USER
>>>> if you have ensured up the stream that they DO come from the same
>>>> producer AND engine.
>>>> Again not sure how to cut this with defaults, as the defaults would
>>>> collapse filtering.
>>>>
>>> Again all this applies only when you have translations for different
>>> providers/engines for the same text. That is only one part of the scenarios.
>>>
>>> In any case, the bottom line is that making a local attribute presence
>>> required or not based on whether a global one is present or not is not
>>> easily implementable. It could be defined in a linked rule file for
>>> example.
>>>
>>> What I think you are really trying to do is make sure a value is defined for
>>> mtProducer and mtEngine. I don't agree that one is always needed, but that is
>>> a different topic (as discussed above). But if we decide one is needed, we
>>> can just state that one must be defined. It doesn't make sense to me to try
>>> to define how or where it should be defined: the inheritance takes care of
>>> that.
>>>
>>
>>
>>
>
>
Received on Thursday, 23 August 2012 07:42:09 UTC