
PLEASE IGNORE Re: [action 265] data category specific confidence scores

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Tue, 13 Nov 2012 17:28:48 +0000
Message-ID: <50A28350.7050708@cs.tcd.ie>
To: public-mlw-workshop@w3.org
Apologies, please ignore, I sent this to the wrong mailing list.

On 13/11/2012 17:20, Dave Lewis wrote:
> Hi all,
> To try and wrap up this point:
> _Reasoning so far:_
> 1) text analytics annotation was proposed as a way of offering a 
> confidence score for text analytics results. As with the MT confidence 
> score, the tools annotation is now covered by the itsTool feature, but 
> the proposal for confidence scores remains.
> 2) Marcis pointed out, using real world terminology use cases, that we 
> may have several annotations operating on the same fragment, so 
> applying a confidence score to different text analytics annotations 
> with a single data category won't work in these cases because of 
> complete override.
> Also, if we used text analytics annotation with annotation from other 
> data categories we would be breaking our 'no dependencies between data 
> categories' rule.
> 3) We could overcome the complete override problems using stand-off 
> markup, as in localization quality issue and provenance. But as the 
> confidence score would be different for each annotated fragment, that 
> would result in very big stand-off records, and we would still be 
> breaking the data category dependencies rule. So this doesn't seem a 
> realistic option.
> 4) So the suggestion discussed in Lyon was to drop text analytics 
> annotation altogether as a separate data category and focus on adding 
> confidence attributes to the existing data categories that would 
> benefit from it.
> _Proposal:_
> So I suggest the following, and we need your feedback by Friday 16th Nov 
> so we can wrap this up on the Monday call!
> For those extended with confidence score (terminology, disambiguation) 
> please express your support and any comments by Friday - if we don't 
> receive any we will definitely drop these suggestions. Marcis, Tadej 
> in particular, please consider reviewing these.
> For exclusions (domain, localizationQualityIssue), this is your last 
> chance to counter-argue in favour of including them, otherwise assume 
> these are dropped also.
> i) confidence for terminology: as suggested by Marcis 
> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html), 
> revised data category as Word revisions attached (addition to local 
> definition, note on its-tools and example 38)
> ii) confidence for disambiguation: revised data category as Word 
> revisions attached (addition to local definition, note on its-tools 
> and example 52)
> iii) domain: I suggest _excluding_ this as an annotation to which we 
> attach a confidence score. It's not clear that the use of text 
> analytics to identify domain, while feasible, actually represents a 
> real use case for interoperability mark-up. If used, it would probably 
> be internalized by the MT engine. Also, since there are multiple 
> domain values, the semantics of a single confidence score are unclear.
> iv) localizationQualityIssue: I suggest also excluding this as an 
> annotation to which we attach confidence scores. The use of 
> statistical text analytics doesn't seem common for QA tasks. One 
> exception is the recent innovation by Digital Linguistics, whose Review 
> Sentinel product ranks translations by a TA assessment for QA purposes 
> - but this is innovative and not current practice, so it's probably not 
> yet a concrete use case.
> cheers,
> Dave
Received on Tuesday, 13 November 2012 17:24:07 UTC
