Re: Atb.: [action 265] data category specific confidence scores from Felix Sasaki on 2012-11-20 (public-multilingualweb-lt@w3.org from November 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 20 Nov 2012 22:07:08 +0100
To: public-multilingualweb-lt@w3.org, dave lewis <dave.lewis@cs.tcd.ie>, tadej.stajner@ijs.si, marcis.pinnis@Tilde.lv
Message-ID: <50ABF0FC.5070107@w3.org>
Hi Dave, Marcis, Tadej, all,

On 14.11.12 19:27, Tadej Stajner wrote:
> Hi, Dave, Marcis,
> (see below)
>
> On 11/14/2012 5:47 PM, Dave Lewis wrote:
>> thanks for the feedback, comment inline.
>>
>> On 13/11/2012 19:56, Mārcis Pinnis wrote:
>>> Hi Dave,
>>>
>>> 1) I support your suggestion as drafted in the attachment.
>>> 2) Although I believe there is a typing mistake:
>>>
>>> <p>And he said: you need a new <quote its:term="yes" 
>>> its-info-term-ref="http://www.directron.com/motherboards1.html" 
>>> its-term-confidence="0.5">motherboard</quote></p>
>>>
>>> I believe its-info-term-ref should actually be its-term-info-ref?!
>>
>> thanks for spotting that, we'll fix it.

This should be fixed now at

http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-terms-selector-4
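
For anyone who wants to check their own markup against the corrected attribute name, here is a rough sketch in Python. It is only an illustration, not part of the draft: the names TermCollector, collect_terms and SAMPLE are made up for this mail, the sample is the snippet quoted above with its-term-info-ref spelled correctly, and the range check assumes the 0 to 1 interval discussed in this thread.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect elements that carry an its-term-confidence attribute."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        raw = attributes.get("its-term-confidence")
        if raw is None:
            return  # element has no term confidence annotation
        score = float(raw)
        # Assumption from this thread: scores are restricted to 0..1.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence out of range 0..1: {raw}")
        self.terms.append({
            "tag": tag,
            "info_ref": attributes.get("its-term-info-ref"),
            "confidence": score,
        })


def collect_terms(markup):
    """Return one dict per element annotated with a term confidence."""
    collector = TermCollector()
    collector.feed(markup)
    collector.close()
    return collector.terms


# Sample taken from the quoted mail, with the attribute name corrected.
SAMPLE = (
    '<p>And he said: you need a new <quote its:term="yes" '
    'its-term-info-ref="http://www.directron.com/motherboards1.html" '
    'its-term-confidence="0.5">motherboard</quote></p>'
)

for term in collect_terms(SAMPLE):
    print(term["tag"], term["info_ref"], term["confidence"])
```

With the misspelled its-info-term-ref the info_ref entry would simply come back as None, which is an easy way to spot the typo.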

>>
>>>
>>> 3) Also, just a comment (our systems won't be affected, but...) - 
>>> why do you want to restrict the values to be from 0 to 1? In 
>>> statistics it is quite common to use also LOG-scale probabilities 
>>> (because of otherwise small numbers in some cases). Is it necessary 
>>> to restrict users to a 0 to 1 interval? I would suggest leaving the 
>>> decision up to the users. Also - the tools will have to be 
>>> identified anyway. This means that the users will be able to 
>>> identify (if needed) from the systems how to parse (understand) the 
>>> confidence scores. This is a general question that applies to other 
>>> confidence scores as well.
>>
>> In general, we do not attach inter-tool significance to the 
>> confidence scores, hence the requirement to specify the tool using 
>> its-tools-ref. Normalising the score 0-1 is therefore not intended 
>> to support inter-tool comparisons, but more to give the presenting 
>> software a stable range/value to display.
>
> On that note, I'd suggest explicitly adding a sentence that the scores 
> are comparable only in the context of the same tool. It might be 
> obvious to us, but it's an important point.


I tried to add that to the 1st paragraph at
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation
for terminology, mt confidence and disambiguation.
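
On the log-scale question raised above: a tool that works with log probabilities internally can still emit a value in the 0 to 1 range by exponentiating before it writes the attribute. A minimal sketch, assuming the tool's score is a log probability in a known base; the function name log_prob_to_unit_interval is hypothetical and nothing like it is defined in the draft.

```python
import math


def log_prob_to_unit_interval(log_prob, base=math.e):
    """Map a log-scale probability back to the 0..1 interval.

    Assumes log_prob is log_base(p) for some probability 0 < p <= 1,
    so it must be zero or negative.
    """
    if log_prob > 0:
        raise ValueError("a log probability cannot be positive")
    return base ** log_prob


# A natural-log score of log(0.5) maps back to roughly 0.5.
print(log_prob_to_unit_interval(math.log(0.5)))
```

The result is still only meaningful relative to the tool that produced it, which is the point of the sentence added to the draft.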

Best,

Felix

>
> -- Tadej
>
>
>>
>> For the MT confidence score the concerned implementers suggested 0..1; 
>> the use of log-scale didn't come up. So for no deeper reason than 
>> consistency I'd then suggest we keep the same for term and disambig 
>> confidence scores, unless there is a pressing reason to do otherwise.
>>
>> cheers,
>> Dave
>>
>>> 4) I agree that in the current proposal it would not be reasonable 
>>> to add a confidence score as in multiple domain scenario it would be 
>>> misleading/wrong and it would require a different solution (For 
>>> instance, similar to how domains can be marked).
>>
>>> Best regards,
>>> Mārcis ;o)
>>>
>>> ________________________________________
>>> From: Dave Lewis [dave.lewis@cs.tcd.ie]
>>> Sent: Tuesday, 13 November 2012 19:30
>>> To: Multilingual Web LT Public List
>>> Subject: Fwd: [action 265] data category specific confidence scores
>>>
>>> Hi all,
>>> To try and wrap up this point:
>>>
>>> Summary of discussion so far:
>>> 1) text analytics annotation was proposed as a way of offering a 
>>> confidence score for text analytics results. As with the mtconfidence 
>>> score, the tools annotation is now covered by the itsTool feature, 
>>> but the proposal for confidence scores remains.
>>>
>>> 2) Marcis pointed out, using real world terminology use cases, that 
>>> we may have several annotations operating on the same fragment, so 
>>> applying a confidence score to different text analytics annotations 
>>> with a single data category won't work in these cases because of 
>>> complete override.
>>>
>>> Also, if we used text analytics annotation with annotation from 
>>> other data categories we would be breaking our 'no dependencies 
>>> between data categories' rule.
>>>
>>> 3) We could overcome the complete override problems using standoff 
>>> mark up as in loc quality issue and provenance. But as confidence 
>>> score would be different for each annotated fragment, that would 
>>> result in very big stand-off records, and we would still be breaking 
>>> the data cat dependencies rule. So this doesn't seem a realistic option
>>>
>>> 4) so the suggestion discussed in Lyon was to drop  text analytics 
>>> annotation altogether as a separate data category and focus on 
>>> adding confidence attributes to the existing data categories that 
>>> would benefit from it.
>>>
>>> so.....
>>>
>>> Proposal:
>>> I therefore suggest the following, and we need your feedback by 
>>> Friday 16th Nov so we can wrap this up on the Monday call!
>>>
>>> For those extended with confidence score (terminology, 
>>> disambiguation) please express your support and any comments by 
>>> Friday - if we don't receive any we will definitely drop these 
>>> suggestions. Marcis, Tadej in particular, please consider reviewing these.
>>>
>>> For exclusions (domain, localizationQualityIssue), this is your last 
>>> chance to counter-argue in favour of including, otherwise assume 
>>> these are dropped also.
>>>
>>> i) confidence for terminology: as suggested by Marcis 
>>> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html), 
>>> revised data category as Word revisions attached (addition to local 
>>> definition, note on its-tools and example 38)
>>>
>>> ii) confidence for disambiguation: revised data category as Word 
>>> revisions attached (addition to local definition, note on its-tools 
>>> and example 52)
>>>
>>> iii) domain: I suggest excluding this as an annotation to which we 
>>> attach a confidence score. It's not clear that the use of text 
>>> analytics to identify domain, while feasible, actually represents a 
>>> real use case for interoperability mark-up. If used, it would probably 
>>> be internalized by the MT engine. Also, since there are multiple 
>>> domain values the semantics of a single confidence score is unclear.
>>>
>>> iv) localizationQualityIssue: I suggest also excluding this as an 
>>> annotation to which we attach confidence scores. The use of 
>>> statistical text analytics doesn't seem common for QA tasks. One 
>>> exception is the recent innovation by Digital Linguistics, whose 
>>> Review Sentinel product ranks translations by a TA assessment for QA 
>>> purposes - but this is innovative and not current practice, so it's 
>>> probably not yet a concrete use case.
>>>
>>> cheers,
>>> Dave
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
Received on Tuesday, 20 November 2012 21:07:36 UTC