- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 21 Nov 2012 11:04:48 +0100
- To: public-multilingualweb-lt@w3.org
- Message-ID: <50ACA740.4050706@w3.org>
Thanks, Tadej and Marcis - I updated the examples.

- Felix

On 21.11.12 10:55, Tadej Stajner wrote:
> Hi, Marcis, all,
> I can confirm that for the disambiguation example - they pre-date the inclusion of the toolsRef mechanism. The rule itself is fine - a score needs to be in the scope of a tool.
> -- Tadej
>
> On 11/21/2012 8:21 AM, Mārcis Pinnis wrote:
>>
>> Hi Felix,
>>
>> I assume (following the definition) that the examples (40 and 54) are not correct. This probably happened because the term confidence and disambiguation examples were created before we agreed on the toolsRef attribute.
>>
>> Best regards,
>>
>> Mārcis ;o)
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Wednesday, November 21, 2012 8:03 AM
>> *To:* public-multilingualweb-lt@w3.org; dave lewis; tadej.stajner@ijs.si; Mārcis Pinnis
>> *Subject:* Question on toolsRef for disambigConfidence and termConfidence (Re: Atb.: [action 265] data category specific confidence scores)
>>
>> Hi all again,
>>
>> looking into the "confidence score" attributes again, I saw these three paragraphs:
>>
>> "Any node selected by the MT Confidence <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence> data category MUST <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119> be contained in an element with the |toolsRef| (or in HTML5, |its-tools-ref|) attribute specified for the MT Confidence <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence> data category. For more information, see Section 5.8: ITS Tools Annotation <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation>."
>>
>> "Any node selected by the terminology data category with the |termConfidence| attribute specified MUST <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119> be contained in an element with the |toolsRef| (or in HTML5 |its-tools-ref|) attribute specified for the Terminology <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#terminology> data category. See Section 5.8: ITS Tools Annotation <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation> for more information."
>>
>> "Any node selected by the disambiguation <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#Disambiguation> data category with the |disambigConfidence| attribute specified MUST <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119> be contained in an element with the |toolsRef| (or in HTML5 |its-tools-ref|) attribute specified for the disambiguation <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#Disambiguation> data category. For more information, see Section 5.8: ITS Tools Annotation <http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation>."
>>
>> However, the examples for termConfidence (ex. 40) and disambigConfidence (ex. 54) show no toolsRef (or its-tools-ref) attribute. Are the examples wrong or is toolsRef / its-tools-ref optional for termConfidence and disambigConfidence?
>>
>> Thanks,
>>
>> Felix
>>
>> On 20.11.12 22:07, Felix Sasaki wrote:
>>
>> Hi Dave, Marcis, Tadej, all,
>>
>> On 14.11.12 19:27, Tadej Stajner wrote:
>>
>> Hi, Dave, Marcis,
>> (see below)
>>
>> On 11/14/2012 5:47 PM, Dave Lewis wrote:
>>
>> thanks for the feedback, comment inline.
>>
>> On 13/11/2012 19:56, Mārcis Pinnis wrote:
>>
>> Hi Dave,
>>
>> 1) I support your suggestion as drafted in the attachment.
>> 2) Although I believe there is a typing mistake:
>>
>> <p>And he said: you need a new <quote its:term="yes" its-info-term-ref=http://www.directron.com/motherboards1.html its-term-confidence=0.5>motherboard</quote></p>
>>
>> I believe its-info-term-ref should actually be its-term-info-ref?!
>>
>> thanks for spotting that, we'll fix it.
>>
>> This should be fixed now at
>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-terms-selector-4
>>
>> 3) Also, just a comment (our systems won't be affected, but...) - why do you want to restrict the values to be from 0 to 1? In statistics it is also quite common to use log-scale probabilities (because the numbers are otherwise very small in some cases). Is it necessary to restrict users to a 0 to 1 interval? I would suggest leaving the decision up to the users. Also - the tools will have to be identified anyway. This means that users will be able to find out from the systems (if needed) how to parse (understand) the confidence scores. This is a general question that applies to other confidence scores as well.
>>
>> In general, we do not attach inter-tool significance to the confidence scores, hence the requirement to specify the tool using its-tools-ref. Normalising the score to 0-1 is therefore not intended to support inter-tool comparisons, but rather to give the presenting software a stable range/value to display.
>>
>> On that note, I'd suggest explicitly adding a sentence that the scores are comparable only in the context of the same tool. It might be obvious to us, but it's an important point.
>>
>> I tried to add that to the 1st paragraph at
>> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation
>> for terminology, mt confidence and disambiguation.
>>
>> Best,
>>
>> Felix
>>
>> -- Tadej
>>
>> For MT confidence score the concerned implementers suggested 0..1; the use of log-scale didn't come up. So for no deeper reason than consistency I'd then suggest we keep the same for term and disambig confidence scores, unless there is a pressing reason to do otherwise.
>>
>> cheers,
>> Dave
>>
>> 4) I agree that in the current proposal it would not be reasonable to add a confidence score, as in a multiple-domain scenario it would be misleading/wrong and it would require a different solution (for instance, similar to how domains can be marked).
>>
>> Best regards,
>> Mārcis ;o)
>>
>> ________________________________________
>> From: Dave Lewis [dave.lewis@cs.tcd.ie <mailto:dave.lewis@cs.tcd.ie>]
>> Sent: Tuesday, 13 November 2012, 19:30
>> To: Multilingual Web LT Public List
>> Subject: Fwd: [action 265] data category specific confidence scores
>>
>> Hi all,
>> To try and wrap up this point:
>>
>> Summary of Discussion so far:
>> 1) text analytics annotation was proposed as a way of offering a confidence score for text analytics results. As with mtconfidence score, the tools annotation is now covered by the itsTool feature, but the proposal for confidence scores remains
>>
>> 2) Marcis pointed out, using real world terminology use cases, that we may have several annotations operating on the same fragment, so applying a confidence score to different text analytics annotations with a single data category won't work in these cases because of complete override.
>>
>> Also, if we used text analytics annotation with annotation from other data categories we would be breaking our 'no dependencies between data categories' rule.
>>
>> 3) We could overcome the complete override problems using standoff mark up as in loc quality issue and provenance.
>> But as confidence score would be different for each annotated fragment, that would result in very big stand-off records, and we would still be breaking the data cat dependencies rule. So this doesn't seem a realistic option
>>
>> 4) so the suggestion discussed in Lyon was to drop text analytics annotation altogether as a separate data category and focus on adding confidence attributes to the existing data categories that would benefit from it.
>>
>> so.....
>>
>> Proposal:
>> I therefore suggest the following and we need your feedback by Friday 16th Nov so we can wrap this up on the Monday call!
>>
>> For those extended with confidence score (terminology, disambiguation) please express your support and any comments by Friday - if we don't receive any we will definitely drop these suggestions. Marcis, Tadej in particular, please consider reviewing these.
>>
>> For exclusions (domain, localizationQualityIssue), this is your last chance to counter-argue in favour of including, otherwise assume these are dropped also.
>>
>> i) confidence for terminology: as suggested by Marcis (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html), revised data category as word revisions attached (addition to local definition, note on its-tools and example 38)
>>
>> ii) confidence for disambiguation: revised data category as word revisions attached (addition to local definition, note on its-tools and ex 52)
>>
>> iii) domain: I suggest excluding this as an annotation to which we attach a confidence score. It's not clear that the use of text analytics to identify domain, while feasible, actually represents a real use case for interoperability mark-up. If used, it would probably be internalized by the MT engine. Also, since there are multiple domain values the semantics of a single confidence score is unclear.
>>
>> iv) localizationQualityIssue: I suggest also excluding this as an annotation to which we attach confidence scores. The use of statistical text analytics doesn't seem common for QA tasks. One exception is the recent innovation by Digital Linguistics, whose Review Sentinel product ranks translations by a TA assessment for QA purposes - but this is innovative and not current practice, so it's probably not yet a concrete use case.
>>
>> cheers,
>> Dave
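
The containment rule quoted in this thread (a node carrying termConfidence or disambigConfidence must sit inside an element that carries toolsRef) can be illustrated with a small check. This is only a rough sketch under stated assumptions, not part of the draft or of any ITS tooling: the function name and the sample documents are made up for illustration, "contained in" is read as "the element itself or one of its ancestors", and the check only looks for the presence of its:toolsRef rather than verifying that its value names the right data category. The ITS namespace URI is the standard one; everything else is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
# Confidence attributes discussed in the thread that need a tool in scope.
CONFIDENCE_ATTRS = ("termConfidence", "disambigConfidence")


def missing_tools_ref(xml_text):
    """Return (tag, attribute) pairs for confidence scores with no toolsRef in scope.

    Assumption: "in scope" means its:toolsRef is present on the element
    itself or on any ancestor. The attribute value is not inspected.
    """
    root = ET.fromstring(xml_text)
    tools_ref = f"{{{ITS_NS}}}toolsRef"
    problems = []

    def walk(elem, in_scope):
        # Scope is inherited from ancestors and can start at this element.
        in_scope = in_scope or tools_ref in elem.attrib
        for name in CONFIDENCE_ATTRS:
            if f"{{{ITS_NS}}}{name}" in elem.attrib and not in_scope:
                problems.append((elem.tag, name))
        for child in elem:
            walk(child, in_scope)

    walk(root, False)
    return problems


# Hypothetical sample: a confidence score with no toolsRef anywhere above it.
WITHOUT_TOOLS_REF = (
    '<text xmlns:its="http://www.w3.org/2005/11/its">'
    '<p>You need a new <quote its:term="yes" its:termConfidence="0.5">'
    "motherboard</quote></p></text>"
)

# Hypothetical sample: the same content with a placeholder toolsRef on the root.
WITH_TOOLS_REF = (
    '<text xmlns:its="http://www.w3.org/2005/11/its" '
    'its:toolsRef="placeholder-tool-reference">'
    '<p>You need a new <quote its:term="yes" its:termConfidence="0.5">'
    "motherboard</quote></p></text>"
)

if __name__ == "__main__":
    print(missing_tools_ref(WITHOUT_TOOLS_REF))
    print(missing_tools_ref(WITH_TOOLS_REF))
```

A check of this kind would have flagged the examples in question (40 and 54) before toolsRef was added to them. It says nothing about comparing scores across tools, which the thread explicitly rules out.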
Received on Wednesday, 21 November 2012 10:05:09 UTC