- From: Dave Lewis <dave.lewis@cs.tcd.ie>
- Date: Tue, 13 Nov 2012 17:28:48 +0000
- To: public-mlw-workshop@w3.org
- Message-ID: <50A28350.7050708@cs.tcd.ie>
Apologies, please ignore; I sent this to the wrong mailing list.

Dave

On 13/11/2012 17:20, Dave Lewis wrote:
> Hi all,
> To try and wrap up this point:
>
> _Reasoning so far:_
> 1) Text analytics annotation was proposed as a way of offering a
> confidence score for text analytics results. As with the mtconfidence
> score, the tools annotation is now covered by the itsTool feature, but
> the proposal for confidence scores remains.
>
> 2) Marcis pointed out, using real-world terminology use cases, that we
> may have several annotations operating on the same fragment, so
> applying a confidence score to different text analytics annotations
> with a single data category won't work in these cases because of
> complete override.
>
> Also, if we used text analytics annotation with annotations from other
> data categories, we would be breaking our 'no dependencies between data
> categories' rule.
>
> 3) We could overcome the complete override problem using stand-off
> markup, as in loc quality issue and provenance. But as the confidence
> score would be different for each annotated fragment, that would
> result in very big stand-off records, and we would still be breaking
> the data cat dependencies rule. So this doesn't seem a realistic option.
>
> 4) So the suggestion discussed in Lyon was to drop text analytics
> annotation altogether as a separate data category and focus on adding
> confidence attributes to the existing data categories that would
> benefit from it.
>
> _Proposal:_
> So I suggest the following, and we need your feedback by Friday 16th Nov
> so we can wrap this up on the Monday call!
>
> For those extended with a confidence score (terminology, disambiguation),
> please express your support and any comments by Friday. If we don't
> receive any, we will definitely drop these suggestions. Marcis, Tadej
> in particular, please consider reviewing these.
>
> For exclusions (domain, localizationQualityIssue), this is your last
> chance to argue in favour of including them; otherwise assume these
> are dropped as well.
>
> i) confidence for terminology: as suggested by Marcis
> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html),
> revised data category as word revisions attached (addition to local
> definition, note on its-tools and example 38)
>
> ii) confidence for disambiguation: revised data category as word
> revisions attached (addition to local definition, note on its-tools
> and ex 52)
>
> iii) domain: I suggest _excluding_ this as an annotation to which we
> attach a confidence score. It's not clear that the use of text
> analytics to identify domain, while feasible, actually represents a
> real use case for interoperability markup. If used, it would probably
> be internalized by the MT engine. Also, since there are multiple
> domain values, the semantics of a single confidence score are unclear.
>
> iv) localizationQualityIssue: I suggest also excluding this as an
> annotation to which we attach confidence scores. The use of
> statistical text analytics doesn't seem common for QA tasks. One
> exception is the recent innovation by Digital Linguistics, whose Review
> Sentinel product ranks translations by a TA assessment for QA purposes,
> but this is innovative and not current practice, so it's probably not
> yet a concrete use case.
>
> cheers,
> Dave
Received on Tuesday, 13 November 2012 17:24:07 UTC