- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 20 Nov 2012 22:07:08 +0100
- To: public-multilingualweb-lt@w3.org, dave lewis <dave.lewis@cs.tcd.ie>, tadej.stajner@ijs.si, marcis.pinnis@Tilde.lv
Hi Dave, Marcis, Tadej, all,

On 14.11.12 19:27, Tadej Stajner wrote:
> Hi, Dave, Marcis,
> (see below)
>
> On 11/14/2012 5:47 PM, Dave Lewis wrote:
>> thanks for the feedback, comment inline.
>>
>> On 13/11/2012 19:56, Mārcis Pinnis wrote:
>>> Hi Dave,
>>>
>>> 1) I support your suggestion as drafted in the attachment.
>>> 2) Although I believe there is a typing mistake:
>>>
>>> <p>And he said: you need a new <quote its:term="yes"
>>> its-info-term-ref=http://www.directron.com/motherboards1.html
>>> its-term-confidence=0.5>motherboard</quote></p>
>>>
>>> I believe its-info-term-ref should actually be its-term-info-ref?!
>>
>> thanks for spotting that, we'll fix it.

This should be fixed now at http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-terms-selector-4

>>
>>>
>>> 3) Also, just a comment (our systems won't be affected, but...) -
>>> why do you want to restrict the values to be from 0 to 1? In
>>> statistics it is quite common to also use LOG-scale probabilities
>>> (because of otherwise small numbers in some cases). Is it necessary
>>> to restrict users to a 0 to 1 interval? I would suggest leaving the
>>> decision up to the users. Also, the tools will have to be
>>> identified anyway. This means that the users will be able to
>>> identify (if needed) from the systems how to parse (understand) the
>>> confidence scores. This is a general question that applies to other
>>> confidence scores as well.
>>
>> In general, we do not attach inter-tool significance to the
>> confidence scores, hence the requirement to specify the tool using
>> its-tools-ref. Normalising the score 0-1 is therefore not intended
>> to support inter-tool comparisons, but rather to give the presenting
>> software a stable range/value to display.
>
> On that note, I'd suggest explicitly adding a sentence that the scores
> are comparable only in the context of the same tool. It might be
> obvious to us, but it's an important point.
I tried to add that to the 1st paragraph at http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation for terminology, MT confidence and disambiguation.

Best,

Felix

>
> -- Tadej
>
>
>>
>> For MT confidence scores the concerned implementers suggested 0..1;
>> the use of log-scale didn't come up. So, for no deeper reason than
>> consistency, I'd then suggest we keep the same for term and disambig
>> confidence scores, unless there is a pressing reason to do otherwise.
>>
>> cheers,
>> Dave
>>
>>> 4) I agree that in the current proposal it would not be reasonable
>>> to add a confidence score, as in a multiple-domain scenario it would be
>>> misleading/wrong and it would require a different solution (for
>>> instance, similar to how domains can be marked).
>>
>>> Best regards,
>>> Mārcis ;o)
>>>
>>> ________________________________________
>>> From: Dave Lewis [dave.lewis@cs.tcd.ie]
>>> Sent: Tuesday, 13 November 2012 19:30
>>> To: Multilingual Web LT Public List
>>> Subject: Fwd: [action 265] data category specific confidence scores
>>>
>>> Hi all,
>>> To try and wrap up this point:
>>>
>>> Summary of Discussion so far:
>>> 1) text analytics annotation was proposed as a way of offering a
>>> confidence score for text analytics results. As with the mtconfidence
>>> score, the tools annotation is now covered by the itsTool feature,
>>> but the proposal for confidence scores remains.
>>>
>>> 2) Marcis pointed out, using real world terminology use cases, that
>>> we may have several annotations operating on the same fragment, so
>>> applying a confidence score to different text analytics annotations
>>> with a single data category won't work in these cases because of
>>> complete override.
>>>
>>> Also, if we used text analytics annotation with annotation from
>>> other data categories we would be breaking our 'no dependencies between
>>> data category rules'.
>>>
>>> 3) We could overcome the complete override problems using standoff
>>> markup as in loc quality issue and provenance. But as the confidence
>>> score would be different for each annotated fragment, that would
>>> result in very big stand-off records, and we would still be breaking
>>> the data cat dependencies rule. So this doesn't seem a realistic option.
>>>
>>> 4) So the suggestion discussed in Lyon was to drop text analytics
>>> annotation altogether as a separate data category and focus on
>>> adding confidence attributes to the existing data categories that
>>> would benefit from it.
>>>
>>> so.....
>>>
>>> Proposal:
>>> I therefore suggest the following, and we need your feedback by
>>> Friday 16th Nov so we can wrap this up on the Monday call!
>>>
>>> For those extended with a confidence score (terminology,
>>> disambiguation), please express your support and any comments by
>>> Friday; if we don't receive any, we will definitely drop these
>>> suggestions. Marcis, Tadej in particular, please consider reviewing these.
>>>
>>> For exclusions (domain, localizationQualityIssue), this is your last
>>> chance to counter-argue in favour of including; otherwise assume
>>> these are dropped also.
>>>
>>> i) confidence for terminology: as suggested by Marcis
>>> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html),
>>> revised data category as Word revisions attached (addition to local
>>> definition, note on its-tools and example 38)
>>>
>>> ii) confidence for disambiguation: revised data category as Word
>>> revisions attached (addition to local definition, note on its-tools
>>> and ex 52)
>>>
>>> iii) domain: I suggest excluding this as an annotation to which we
>>> attach a confidence score. It's not clear that the use of text
>>> analytics to identify domain, while feasible, actually represents a
>>> real use case for interoperability markup. If used, it would probably
>>> be internalized by the MT engine. Also, since there are multiple
>>> domain values, the semantics of a single confidence score is unclear.
>>>
>>> iv) localizationQualityIssue: I suggest also excluding this as an
>>> annotation to which we attach confidence scores. The use of
>>> statistical text analytics doesn't seem common for QA tasks. One
>>> exception is the recent innovation by Digital Linguistics, whose
>>> Review Sentinel product ranks translations by a TA assessment for QA
>>> purposes, but this is innovative and not current practice, so it's
>>> probably not yet a concrete use case.
>>>
>>> cheers,
>>> Dave
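The thread settles on a confidence attribute attached directly to the terminology and disambiguation data categories, normalised to 0..1, with scores comparable only between annotations from the same tool (identified via its-tools-ref). The following is a minimal Python sketch, not taken from the ITS 2.0 draft or any existing implementation, of how consuming software might respect those two constraints. Assumptions: it uses the hyphenated HTML attribute forms its-term, its-term-info-ref, its-term-confidence and its-tools-ref; it reads the tool reference only from the annotated element itself (no inheritance from ancestors); the example.com tool URIs are hypothetical; and the helper names parse_confidence, TermAnnotationParser and more_confident are invented for illustration.

```python
from html.parser import HTMLParser


def parse_confidence(raw):
    """Parse a confidence attribute value, enforcing the normalised 0..1 range."""
    value = float(raw)
    # NaN fails both comparisons, so it is rejected along with out-of-range values.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence {raw!r} is outside the range 0..1")
    return value


class TermAnnotationParser(HTMLParser):
    """Collects elements marked its-term="yes" together with their confidence metadata."""

    def __init__(self):
        super().__init__()
        self.terms = []        # one dict per annotated term, in document order
        self._current = None   # record for the term element currently open
        self._tag = None       # tag name of that element
        self._depth = 0        # same-named tags nested inside it

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        a = dict(attrs)
        if a.get("its-term") != "yes":
            return
        raw = a.get("its-term-confidence")
        self._current = {
            "text": "",
            "info_ref": a.get("its-term-info-ref"),
            "confidence": parse_confidence(raw) if raw is not None else None,
            # Assumption/simplification: the tool reference is read from this
            # element only, not inherited from ancestors.
            "tool": a.get("its-tools-ref"),
        }
        self._tag, self._depth = tag, 0

    def handle_endtag(self, tag):
        if self._current is None or tag != self._tag:
            return
        if self._depth:
            self._depth -= 1
            return
        self._current["text"] = self._current["text"].strip()
        self.terms.append(self._current)
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data


def more_confident(a, b):
    """Return the higher-scoring annotation, or None if the scores are not comparable."""
    # Scores are only meaningful relative to the tool that produced them.
    if a["tool"] is None or a["tool"] != b["tool"]:
        return None
    if a["confidence"] is None or b["confidence"] is None:
        return None
    return a if a["confidence"] >= b["confidence"] else b


# Hypothetical tool URIs; only the term attributes come from the thread's example.
SAMPLE = (
    '<p>And he said: you need a new '
    '<span its-term="yes" its-term-info-ref="http://www.directron.com/motherboards1.html" '
    'its-term-confidence="0.5" its-tools-ref="http://example.com/term-tool-a">motherboard</span>'
    ' and a <span its-term="yes" its-term-confidence="0.8" '
    'its-tools-ref="http://example.com/term-tool-b">graphics card</span>.</p>'
)

parser = TermAnnotationParser()
parser.feed(SAMPLE)
parser.close()
board, card = parser.terms
print(board["text"], board["confidence"])  # motherboard 0.5
print(more_confident(board, card))         # None: different tools
```

Under this sketch a log-scale value such as -2.3 is rejected by parse_confidence, which is the trade-off Marcis raises: a tool that works internally with log probabilities would have to convert its scores before writing them into the markup.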
Received on Tuesday, 20 November 2012 21:07:36 UTC