- From: Tadej Stajner <tadej.stajner@ijs.si>
- Date: Wed, 21 Nov 2012 10:55:55 +0100
- To: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
- CC: Felix Sasaki <fsasaki@w3.org>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, dave lewis <dave.lewis@cs.tcd.ie>
- Message-ID: <50ACA52B.20109@ijs.si>
Hi, Marcis, all,

I can confirm this for the disambiguation example: it pre-dates the introduction of the toolsRef mechanism. The rule itself is fine: a score needs to be in the scope of a tool.

-- Tadej

On 11/21/2012 8:21 AM, Mārcis Pinnis wrote:
>
> Hi Felix,
>
> I assume (following the definition) that the examples (40 and 54) are
> not correct. This probably happened because the term confidence and
> disambiguation examples were created before we agreed on the
> toolsRef attribute.
>
> Best regards,
>
> Mārcis ;o)
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Wednesday, November 21, 2012 8:03 AM
> *To:* public-multilingualweb-lt@w3.org; dave lewis;
> tadej.stajner@ijs.si; Mārcis Pinnis
> *Subject:* Question on toolsRef for disambigConfidence and
> termConfidence (Re: Atb.: [action 265] data category specific
> confidence scores)
>
> Hi all again,
>
> looking into the "confidence score" attributes again, I saw these
> three paragraphs:
>
> "Any node selected by the MT Confidence
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence>
> data category MUST
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119>
> be contained in an element with the |toolsRef| (or in HTML5,
> |its-tools-ref|) attribute specified for the MT Confidence
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence>
> data category. For more information, see Section 5.8: ITS Tools
> Annotation
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation>."
>
> "Any node selected by the terminology data category with the
> |termConfidence| attribute specified MUST
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119>
> be contained in an element with the |toolsRef| (or in HTML5,
> |its-tools-ref|) attribute specified for the Terminology
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#terminology>
> data category. See Section 5.8: ITS Tools Annotation
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation>
> for more information."
>
> "Any node selected by the disambiguation
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#Disambiguation>
> data category with the |disambigConfidence| attribute specified MUST
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119>
> be contained in an element with the |toolsRef| (or in HTML5,
> |its-tools-ref|) attribute specified for the disambiguation
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#Disambiguation>
> data category. For more information, see Section 5.8: ITS Tools
> Annotation
> <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation>."
>
> However, the examples for termConfidence (ex. 40) and
> disambigConfidence (ex. 54) show no toolsRef (or its-tools-ref)
> attribute. Are the examples wrong, or is toolsRef / its-tools-ref
> optional for termConfidence and disambigConfidence?
>
> Thanks,
>
> Felix
>
> On 20.11.12 22:07, Felix Sasaki wrote:
>
> Hi Dave, Marcis, Tadej, all,
>
> On 14.11.12 19:27, Tadej Stajner wrote:
>
> Hi, Dave, Marcis,
> (see below)
>
> On 11/14/2012 5:47 PM, Dave Lewis wrote:
>
> thanks for the feedback, comment inline.
>
> On 13/11/2012 19:56, Mārcis Pinnis wrote:
>
> Hi Dave,
>
> 1) I support your suggestion as drafted in the attachment.
> 2) However, I believe there is a typing mistake:
>
> <p>And he said: you need a new <quote its:term="yes"
> its-info-term-ref="http://www.directron.com/motherboards1.html"
> its-term-confidence="0.5">motherboard</quote></p>
>
> I believe its-info-term-ref should actually be its-term-info-ref?!
>
> thanks for spotting that, we'll fix it.
>
> This should be fixed now at
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-terms-selector-4
>
> 3) Also, just a comment (our systems won't be affected,
> but...): why do you want to restrict the values to the range 0
> to 1? In statistics it is quite common to also use log-scale
> probabilities (because otherwise the numbers become very small in
> some cases). Is it necessary to restrict users to the 0 to 1
> interval? I would suggest leaving the decision up to the
> users. Also, the tools will have to be identified anyway.
> This means that users will be able to determine (if needed)
> from the tool identification how to parse (interpret) the
> confidence scores. This is a general question that applies to
> other confidence scores as well.
>
> In general, we do not attach inter-tool significance to the
> confidence scores, hence the requirement to specify the tool
> using its-tools-ref. Normalising the score to 0-1 is therefore
> not intended to support inter-tool comparisons, but rather to give
> the presenting software a stable range of values to display.
>
> On that note, I'd suggest explicitly adding a sentence that the
> scores are comparable only in the context of the same tool. It
> might be obvious to us, but it's an important point.
>
> I tried to add that to the 1st paragraph at
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation
> for terminology, mt confidence and disambiguation.
>
> Best,
>
> Felix
>
> -- Tadej
>
> For the MT confidence score, the implementers concerned suggested 0..1;
> the use of log-scale didn't come up. So, for no deeper reason than
> consistency, I'd suggest we keep the same range for the term and
> disambiguation confidence scores, unless there is a pressing reason
> to do otherwise.
>
> cheers,
> Dave
>
> 4) I agree that in the current proposal it would not be reasonable
> to add a confidence score for domain: in a multiple-domain scenario
> it would be misleading or wrong, and it would require a different
> solution (for instance, similar to how domains can be marked).
>
> Best regards,
> Mārcis ;o)
>
> ________________________________________
> From: Dave Lewis [dave.lewis@cs.tcd.ie <mailto:dave.lewis@cs.tcd.ie>]
> Sent: Tuesday, 13 November 2012 19:30
> To: Multilingual Web LT Public List
> Subject: Fwd: [action 265] data category specific confidence scores
>
> Hi all,
> To try and wrap up this point:
>
> Summary of discussion so far:
> 1) Text analytics annotation was proposed as a way of offering a
> confidence score for text analytics results. As with the MT confidence
> score, the tools annotation is now covered by the itsTool feature,
> but the proposal for confidence scores remains.
>
> 2) Marcis pointed out, using real-world terminology use cases,
> that we may have several annotations operating on the same
> fragment, so applying a confidence score to different text
> analytics annotations with a single data category won't work in
> these cases because of complete override.
>
> Also, if we used text analytics annotation together with annotations
> from other data categories, we would be breaking our 'no dependencies
> between data category rules' principle.
>
> 3) We could overcome the complete override problems using standoff
> markup, as in Localization Quality Issue and Provenance.
> But as the confidence score would be different for each annotated
> fragment, that would result in very large stand-off records, and we
> would still be breaking the data category dependencies rule. So this
> doesn't seem a realistic option.
>
> 4) So the suggestion discussed in Lyon was to drop text analytics
> annotation altogether as a separate data category and focus on
> adding confidence attributes to the existing data categories that
> would benefit from them.
>
> So.....
>
> Proposal:
> I therefore suggest the following, and we need your feedback by
> Friday 16th Nov so we can wrap this up on the Monday call!
>
> For those extended with a confidence score (terminology,
> disambiguation), please express your support and any comments by
> Friday. If we don't receive any, we will definitely drop these
> suggestions. Marcis and Tadej in particular, please consider
> reviewing these.
>
> For the exclusions (domain, localizationQualityIssue), this is your
> last chance to argue in favour of including them; otherwise
> assume these are dropped as well.
>
> i) Confidence for terminology: as suggested by Marcis
> (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html),
> revised data category attached as Word revisions (addition to
> local definition, note on its-tools and example 38)
>
> ii) Confidence for disambiguation: revised data category attached as
> Word revisions (addition to local definition, note on its-tools and
> ex 52)
>
> iii) Domain: I suggest excluding this as an annotation to which we
> attach a confidence score. It's not clear that the use of text
> analytics to identify domain, while feasible, actually represents
> a real use case for interoperability markup. If used, it would
> probably be internalized by the MT engine. Also, since there can be
> multiple domain values, the semantics of a single confidence score
> is unclear.
>
> iv) localizationQualityIssue: I suggest also excluding this as an
> annotation to which we attach confidence scores.
> The use of statistical text analytics doesn't seem common for QA
> tasks. One exception is the recent innovation by Digital Linguistics,
> whose Review Sentinel product ranks translations by a TA assessment
> for QA purposes, but this is innovative rather than current practice,
> so it's probably not yet a concrete use case.
>
> cheers,
> Dave
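The scoping rule discussed in this thread (a node carrying an MT, term or disambiguation confidence score MUST be contained in an element whose toolsRef / its-tools-ref declares a tool for that data category), together with the proposed 0..1 range, can be checked mechanically. What follows is a minimal Python sketch, not part of the ITS 2.0 draft or of any existing checker. It assumes well-formed XHTML-style input parsed with xml.etree.ElementTree, and several details are hypothetical: the attribute names its-mt-confidence and its-disambig-confidence (by analogy with its-term-confidence), the data category identifiers mt-confidence, terminology and disambiguation, and a space-separated "category|IRI" syntax for its-tools-ref values.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: HTML5 confidence attribute -> data category id in its-tools-ref.
CONFIDENCE_ATTRS = {
    "its-mt-confidence": "mt-confidence",
    "its-term-confidence": "terminology",
    "its-disambig-confidence": "disambiguation",
}


def parse_tools_ref(value):
    """Parse an assumed 'category|IRI category|IRI' list into {category: IRI}."""
    tools = {}
    for item in value.split():
        category, _, iri = item.partition("|")
        tools[category] = iri
    return tools


def in_tool_scope(elem, category, parents):
    """True if elem or one of its ancestors declares a tool for the category."""
    node = elem
    while node is not None:
        ref = node.get("its-tools-ref")
        if ref is not None and category in parse_tools_ref(ref):
            return True
        node = parents.get(node)
    return False


def check_confidence(root):
    """Return (tag, attribute, problem) tuples for every violation found."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    problems = []
    for elem in root.iter():
        for attr, category in CONFIDENCE_ATTRS.items():
            raw = elem.get(attr)
            if raw is None:
                continue
            try:
                score = float(raw)
            except ValueError:
                problems.append((elem.tag, attr, "not a number"))
                continue
            if not 0.0 <= score <= 1.0:
                problems.append((elem.tag, attr, "outside 0..1"))
            if not in_tool_scope(elem, category, parents):
                problems.append((elem.tag, attr, "no tool declared"))
    return problems


doc = ET.fromstring(
    '<div its-tools-ref="terminology|http://example.com/term-tool">'
    '<p>You need a new <span its-term="yes" its-term-confidence="0.5">'
    "motherboard</span></p>"
    '<p><span its-disambig-confidence="-2.3">Dublin</span></p>'
    "</div>"
)
for problem in check_confidence(doc):
    print(problem)
# prints:
# ('span', 'its-disambig-confidence', 'outside 0..1')
# ('span', 'its-disambig-confidence', 'no tool declared')
```

Walking up from the annotated element treats a tool declared on any ancestor as covering the score, which matches the "contained in an element with toolsRef" wording. The term score is accepted because the enclosing div declares a terminology tool; the disambiguation score is flagged twice, once for the log-scale style value and once for the missing tool, which is the kind of omission found in examples 40 and 54.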
Received on Wednesday, 21 November 2012 09:56:26 UTC