Re: Atb.: [action 265] data category specific confidence scores from Dave Lewis on 2012-11-14 (public-multilingualweb-lt@w3.org from November 2012)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Wed, 14 Nov 2012 16:47:47 +0000
To: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
CC: Dave Lewis <delewis@scss.tcd.ie>, Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
Message-ID: <50A3CB33.9020600@cs.tcd.ie>
Thanks for the feedback; comments inline.

On 13/11/2012 19:56, Mārcis Pinnis wrote:
> Hi Dave,
>
> 1) I support your suggestion as drafted in the attachment.
> 2) Although I believe there is a typing mistake:
>
> <p>And he said: you need a new <quote its:term="yes" its-info-term-ref="http://www.directron.com/motherboards1.html" its-term-confidence="0.5">motherboard</quote></p>
>
> I believe its-info-term-ref should actually be its-term-info-ref?!

Thanks for spotting that; we'll fix it.

>
> 3) Also, just a comment (our systems won't be affected, but...) - why do you want to restrict the values to be from 0 to 1? In statistics it is quite common to also use log-scale probabilities (because the numbers are otherwise very small in some cases). Is it necessary to restrict users to a 0 to 1 interval? I would suggest leaving the decision up to the users. Also - the tools will have to be identified anyway. This means that users will be able to find out (if needed) from the systems how to parse (understand) the confidence scores. This is a general question that applies to other confidence scores as well.

In general, we do not attach inter-tool significance to the confidence
scores, hence the requirement to specify the tool using its-tools-ref.
Normalising the score to 0-1 is therefore not intended to support
inter-tool comparisons, but rather to give the presenting software a
stable range/value to display.

For the MT confidence score, the concerned implementers suggested 0..1;
the use of log-scale didn't come up. So for no deeper reason than
consistency, I'd suggest we keep the same range for the term and
disambiguation confidence scores, unless there is a pressing reason to
do otherwise.
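
As a rough illustration only (not part of the proposal): a tool that
works internally with natural-log probabilities could map them onto the
0..1 range before writing the attribute. The function names and the
three-decimal formatting below are assumptions made for this sketch.

```python
import math


def log_prob_to_confidence(log_prob):
    """Map a natural-log probability to a confidence value in [0, 1]."""
    # A log-probability of 0 or above corresponds to probability >= 1;
    # treat it as full confidence (this also avoids overflow in exp()).
    if log_prob >= 0.0:
        return 1.0
    # exp() inverts the natural log; very negative inputs underflow to 0.0.
    return math.exp(log_prob)


def term_confidence_attribute(log_prob):
    """Render the score as an its-term-confidence attribute (hypothetical helper)."""
    return 'its-term-confidence="{:.3f}"'.format(log_prob_to_confidence(log_prob))


print(term_confidence_attribute(math.log(0.5)))
# prints: its-term-confidence="0.500"
```

Very small probabilities lose resolution in such a mapping, which is the
trade-off raised in point 3.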

cheers,
Dave

> 4) I agree that in the current proposal it would not be reasonable to add a confidence score, since in a multiple-domain scenario it would be misleading/wrong and would require a different solution (for instance, similar to how domains can be marked).

> Best regards,
> Mārcis ;o)
>
> ________________________________________
> From: Dave Lewis [dave.lewis@cs.tcd.ie]
> Sent: Tuesday, 13 November 2012 19:30
> To: Multilingual Web LT Public List
> Subject: Fwd: [action 265] data category specific confidence scores
>
> Hi all,
> To try and wrap up this point:
>
> Summary of discussion so far:
> 1) Text analytics annotation was proposed as a way of offering a confidence score for text analytics results. As with the MT confidence score, the tools annotation is now covered by the itsTool feature, but the proposal for confidence scores remains.
>
> 2) Marcis pointed out, using real world terminology use cases, that we may have several annotations operating on the same fragment, so applying a confidence score to different text analytics annotations with a single data category won't work in these cases because of complete override.
>
> Also, if we used text analytics annotation together with annotation from other data categories, we would be breaking our 'no dependencies between data categories' rule.
>
> 3) We could overcome the complete override problems using stand-off markup, as in localization quality issue and provenance. But as the confidence score would be different for each annotated fragment, that would result in very big stand-off records, and we would still be breaking the data category dependencies rule. So this doesn't seem a realistic option.
>
> 4) So the suggestion discussed in Lyon was to drop text analytics annotation altogether as a separate data category and focus on adding confidence attributes to the existing data categories that would benefit from it.
>
> so.....
>
> Proposal:
> I therefore suggest the following, and we need your feedback by Friday 16th Nov so we can wrap this up on the Monday call!
>
> For those extended with a confidence score (terminology, disambiguation), please express your support and any comments by Friday - if we don't receive any we will definitely drop these suggestions. Marcis, Tadej in particular, please consider reviewing these.
>
> For exclusions (domain, localizationQualityIssue), this is your last chance to counter-argue in favour of including them; otherwise assume these are dropped also.
>
> i) confidence for terminology: as suggested by Marcis (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html), revised data category attached as Word revisions (addition to local definition, note on its-tools and example 38)
>
> ii) confidence for disambiguation: revised data category attached as Word revisions (addition to local definition, note on its-tools and example 52)
>
> iii) domain: I suggest excluding this as an annotation to which we attach a confidence score. It's not clear that the use of text analytics to identify domain, while feasible, actually represents a real use case for interoperability mark-up. If used, it would probably be internalized by the MT engine. Also, since there are multiple domain values, the semantics of a single confidence score are unclear.
>
> iv) localizationQualityIssue: I suggest also excluding this as an annotation to which we attach confidence scores. The use of statistical text analytics doesn't seem common for QA tasks. One exception is the recent innovation by Digital Linguistics, whose Review Sentinel product ranks translations by a TA assessment for QA purposes - but this is innovative and not current practice, so it's probably not yet a concrete use case.
>
> cheers,
> Dave
>
Received on Wednesday, 14 November 2012 16:42:39 UTC