Re: Atb.: [action 265] data category specific confidence scores from Tadej Stajner on 2012-11-14 (public-multilingualweb-lt@w3.org from November 2012)

From: Tadej Stajner <tadej.stajner@ijs.si>
Date: Wed, 14 Nov 2012 19:27:43 +0100
To: Dave Lewis <dave.lewis@cs.tcd.ie>
CC: Mārcis Pinnis <marcis.pinnis@Tilde.lv>, Dave Lewis <delewis@scss.tcd.ie>, Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
Message-ID: <50A3E29F.4060703@ijs.si>
Hi, Dave, Marcis,
(see below)

On 11/14/2012 5:47 PM, Dave Lewis wrote:
> thanks for the feedback, comment inline.
>
> On 13/11/2012 19:56, Mārcis Pinnis wrote:
>> Hi Dave,
>>
>> 1) I support your suggestion as drafted in the attachment.
>> 2) Although I believe there is a typing mistake:
>>
>> <p>And he said: you need a new <quote its:term="yes" 
>> its-info-term-ref=”http://www.directron.com/motherboards1.html” 
>> its-term-confidence=”0.5”>motherboard</quote></p>
>>
>> I believe its-info-term-ref should actually be its-term-info-ref?!
>
> thanks for spotting that, we'll fix it.
>
>>
>> 3) Also, just a comment (our systems won't be affected, but...) - why
>> do you want to restrict the values to be from 0 to 1? In statistics
>> it is quite common to also use log-scale probabilities (because the
>> numbers are otherwise very small in some cases). Is it necessary to restrict
>> users to a 0 to 1 interval? I would suggest leaving the decision up
>> to the users. Also - the tools will have to be identified anyway.
>> This means that the users will be able to identify (if needed) from 
>> the systems how to parse (understand) the confidence scores. This is 
>> a general question that applies to other confidence scores as well.
>
> In general, we do not attach inter-tool significance to the confidence
> scores, hence the requirement to specify the tool using its-tools-ref.
> Normalising the score to 0-1 is therefore not intended to support
> inter-tool comparisons, but rather to give the presenting software a
> stable range/value to display.

On that note, I'd suggest explicitly adding a sentence that the scores 
are comparable only in the context of the same tool. It might be obvious 
to us, but it's an important point.
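
As a rough illustration of both points (a sketch only, not part of the
draft): a tool that works internally with log-scale probabilities could
map them into the 0..1 range before writing the attribute, and a consumer
could refuse to compare scores that come from different tools. All names
below (Annotation, log_prob_to_confidence, more_confident, the tool
identifiers) are hypothetical, and the natural-log assumption is mine.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    tool: str          # hypothetical identifier of the producing tool
    confidence: float  # expected to lie in the 0..1 range


def log_prob_to_confidence(log_p: float) -> float:
    """Map a natural-log probability (<= 0) to the 0..1 range."""
    if log_p > 0:
        raise ValueError("a log probability cannot be positive")
    return math.exp(log_p)


def more_confident(a: Annotation, b: Annotation) -> Annotation:
    """Return the higher-scored annotation, but only within one tool."""
    if a.tool != b.tool:
        raise ValueError("confidence scores are comparable only within the same tool")
    return a if a.confidence >= b.confidence else b


if __name__ == "__main__":
    first = Annotation("motherboard", "tool-a", log_prob_to_confidence(math.log(0.5)))
    second = Annotation("motherboard", "tool-a", 0.3)
    print(more_confident(first, second).confidence)
```

Whether a real tool's internal scores are true log probabilities is
tool-specific, which is another reason the tool has to be identified.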

-- Tadej


>
> For the MT confidence score the concerned implementers suggested 0..1; the
> use of log-scale didn't come up. So for no deeper reason than
> consistency I'd then suggest we keep the same for term and disambig
> confidence scores, unless there is a pressing reason to do otherwise.
>
> cheers,
> Dave
>
>> 4) I agree that in the current proposal it would not be reasonable to 
>> add a confidence score as in multiple domain scenario it would be 
>> misleading/wrong and it would require a different solution (For 
>> instance, similar to how domains can be marked).
>
>> Best regards,
>> Mārcis ;o)
>>
>> ________________________________________
>> From: Dave Lewis [dave.lewis@cs.tcd.ie]
>> Sent: Tuesday, 13 November 2012 19:30
>> To: Multilingual Web LT Public List
>> Subject: Fwd: [action 265] data category specific confidence scores
>>
>> Hi all,
>> To try and wrap up this point:
>>
>> Summary of discussion so far:
>> 1) Text analytics annotation was proposed as a way of offering a
>> confidence score for text analytics results. As with the mtconfidence
>> score, the tool annotation is now covered by the itsTool feature,
>> but the proposal for confidence scores remains.
>>
>> 2) Marcis pointed out, using real world terminology use cases, that 
>> we may have several annotations operating on the same fragment, so 
>> applying a confidence score to different text analytics annotations 
>> with a single data category won't work in these cases because of 
>> complete override.
>>
>> Also, if we used text analytics annotation with annotation from other
>> data categories we would be breaking our 'no dependencies between data
>> categories' rule.
>>
>> 3) We could overcome the complete override problems using standoff 
>> mark up as in loc quality issue and provenance. But as confidence 
>> score would be different for each annotated fragment, that would 
>> result in very big stand-off records, and we would still be breaking 
>> the data cat dependencies rule. So this doesn't seem a realistic option.
>>
>> 4) So the suggestion discussed in Lyon was to drop text analytics
>> annotation altogether as a separate data category and focus on adding 
>> confidence attributes to the existing data categories that would 
>> benefit from it.
>>
>> so.....
>>
>> Proposal:
>> I therefore suggest the following, and we need your feedback by Friday
>> 16th Nov so we can wrap this up on the Monday call!
>>
>> For those extended with confidence score (terminology,
>> disambiguation) please express your support and any comments by
>> Friday - if we don't receive any we will definitely drop these
>> suggestions. Marcis, Tadej in particular, please consider reviewing these.
>>
>> For exclusions (domain, localizationQualityIssue), this is your last
>> chance to counter-argue in favour of including, otherwise assume
>> these are dropped also.
>>
>> i) confidence for terminology: as suggested by Marcis 
>> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html), 
>> revised data category as word revisions attached (addition to local
>> definition, note on its-tools and example 38)
>>
>> ii) confidence for disambiguation: revised data category as word 
>> revisions attached (addition to local definition, note on its-tools 
>> and ex 52)
>>
>> iii) domain: I suggest excluding this as an annotation to which we
>> attach a confidence score. It's not clear that the use of text
>> analytics to identify domain, while feasible, actually represents a
>> real use case for interoperability mark-up. If used, it would probably
>> be internalized by the MT engine. Also, since there are multiple
>> domain values, the semantics of a single confidence score is unclear.
>>
>> iv) localizationQualityIssue: I suggest also excluding this as an
>> annotation to which we attach confidence scores. The use of
>> statistical text analytics doesn't seem common for QA tasks. One
>> exception is the recent innovation by Digital Linguistics, whose
>> Review Sentinel product ranks translations by a TA assessment for QA
>> purposes - but this is innovative and not current practice, so it's
>> probably not yet a concrete use case.
>>
>> cheers,
>> Dave
>>
>>
>>
>>
>>
>
>
Received on Wednesday, 14 November 2012 18:28:15 UTC