- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 21 Nov 2012 10:51:11 +0100
- To: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
- CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, dave lewis <dave.lewis@cs.tcd.ie>, "tadej.stajner@ijs.si" <tadej.stajner@ijs.si>
- Message-ID: <50ACA40F.9070501@w3.org>
Hi Marcis, all,

On 21.11.12 09:23, Mārcis Pinnis wrote:
>
> Hi Felix,
>
> If the base is xs:decimal, then - yes.
>
> The xs:decimal requires that the decimal separator is a point and not
> a comma as used in some (including Latvian) locales... If that is not
> explicitly required, users could (mis)use locale specific formats, but
> that would cause compatibility problems. So ... (in my opinion) it is
> important that the format is a restricted (inclusive interval of
> [0;1]) variant of “xs:decimal”.
>

Thanks - I added this sentence to the definition of disambigConfidence,
mtConfidence and termConfidence:

"The value follows the XML Schema decimal data type with the
constraining facets minInclusive set to 0 and maxInclusive set to 1."

Hope that this will be OK,

Felix

> Best regards,
>
> Mārcis ;o)
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Wednesday, November 21, 2012 10:15 AM
> To: Mārcis Pinnis
> Cc: public-multilingualweb-lt@w3.org; dave lewis; tadej.stajner@ijs.si
> Subject: Re: Atb.: [action 265] data category specific confidence scores
>
> Thanks, Marcis. Just to be sure: you are saying that the mtConfidence,
> disambigConfidence and termConfidence should follow a definition like
> this:
>
> <simpleType name='confidence'>
>   <restriction base='decimal'>
>     <minInclusive value='0'/>
>     <maxInclusive value='1'/>
>   </restriction>
> </simpleType>
>
> Best,
>
> Felix
>
> On 21.11.12 08:55, Mārcis Pinnis wrote:
> > Hi Felix,
> >
> > I went over the wording; there is only one thing that I find
> > misleading and not well disambiguated.
> >
> > The following are all valid mathematical representations of rational
> > numbers: 0.3; 1/3; 0.(3) ... and with locale specifics: 0,3; 0,(3)
> >
> > If the format specifies "a rational number in the interval 0 to 1"
> > then I would assume all of these are valid (as a mathematician).
> > Maybe the wording should be changed to: "a finite decimal number"
> > (this would also hardcode the number base...). Maybe it is
> > irrelevant, maybe not, but if there is already a limitation of 0 to
> > 1, then I think there should also be a clear indication of the base
> > and format (maybe it is possible to borrow the definition of
> > xs:decimal: http://www.w3.org/TR/xmlschema-2/#decimal).
> >
> > Best regards,
> > Mārcis ;o)
> >
> > -----Original Message-----
> > From: Felix Sasaki [mailto:fsasaki@w3.org]
> > Sent: Tuesday, November 20, 2012 11:07 PM
> > To: public-multilingualweb-lt@w3.org; dave lewis; tadej.stajner@ijs.si; Mārcis Pinnis
> > Subject: Re: Atb.: [action 265] data category specific confidence scores
> >
> > Hi Dave, Marcis, Tadej, all,
> >
> > On 14.11.12 19:27, Tadej Stajner wrote:
> > Hi, Dave, Marcis,
> > (see below)
> >
> > On 11/14/2012 5:47 PM, Dave Lewis wrote:
> > thanks for the feedback, comment inline.
> >
> > On 13/11/2012 19:56, Mārcis Pinnis wrote:
> > Hi Dave,
> >
> > 1) I support your suggestion as drafted in the attachment.
> > 2) Although I believe there is a typing mistake:
> >
> > <p>And he said: you need a new <quote its:term="yes"
> > its-info-term-ref="http://www.directron.com/motherboards1.html"
> > its-term-confidence="0.5">motherboard</quote></p>
> >
> > I believe its-info-term-ref should actually be its-term-info-ref?!
> >
> > thanks for spotting that, we'll fix it.
> >
> > This should be fixed now at
> >
> > http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-terms-selector-4
> >
> > 3) Also, just a comment (our systems won't be affected, but...) -
> > why do you want to restrict the values to be from 0 to 1? In
> > statistics it is also quite common to use log-scale probabilities
> > (because of otherwise small numbers in some cases). Is it necessary
> > to restrict users to a 0 to 1 interval? I would suggest leaving the
> > decision up to the users. Also - the tools will have to be
> > identified anyway. This means that the users will be able to
> > identify (if needed) from the systems how to parse (understand) the
> > confidence scores. This is a general question that applies to other
> > confidence scores as well.
> >
> > In general, we do not attach inter-tool significance to the
> > confidence scores, hence the requirement to specify the tool using
> > its-tools-ref. Normalising the score 0-1 is therefore not intended
> > to support inter-tool comparisons, but rather to give the presenting
> > software a stable range/value to display.
> >
> > On that note, I'd suggest explicitly adding a sentence that the scores
> > are comparable only in the context of the same tool. It might be
> > obvious to us, but it's an important point.
> >
> > I tried to add that to the 1st paragraph at
> > http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation
> > for terminology, MT confidence and disambiguation.
> >
> > Best,
> >
> > Felix
> >
> > -- Tadej
> >
> > For the MT confidence score, the concerned implementers suggested
> > 0..1; the use of log-scale didn't come up. So, for no deeper reason
> > than consistency, I'd suggest we keep the same for term and disambig
> > confidence scores, unless there is a pressing reason to do otherwise.
> >
> > cheers,
> > Dave
> >
> > 4) I agree that in the current proposal it would not be reasonable
> > to add a confidence score, as in a multiple-domain scenario it would
> > be misleading/wrong and would require a different solution (for
> > instance, similar to how domains can be marked).
> >
> > Best regards,
> > Mārcis ;o)
> >
> > ________________________________________
> > From: Dave Lewis [dave.lewis@cs.tcd.ie]
> > Sent: Tuesday, 13 November 2012, 19:30
> > To: Multilingual Web LT Public List
> > Subject: Fwd: [action 265] data category specific confidence scores
> >
> > Hi all,
> > To try and wrap up this point:
> >
> > Summary of Discussion so far:
> > 1) text analytics annotation was proposed as a way of offering a
> > confidence score for text analytics results. As with the mtconfidence
> > score, the tools annotation is now covered by the itsTool feature,
> > but the proposal for confidence scores remains.
> >
> > 2) Marcis pointed out, using real world terminology use cases, that
> > we may have several annotations operating on the same fragment, so
> > applying a confidence score to different text analytics annotations
> > with a single data category won't work in these cases because of
> > complete override.
> >
> > Also, if we used text analytics annotation with annotation from
> > other data categories we would be breaking our 'no dependencies
> > between data categories' rule.
> >
> > 3) We could overcome the complete override problems using standoff
> > mark up as in loc quality issue and provenance. But as the confidence
> > score would be different for each annotated fragment, that would
> > result in very big stand-off records, and we would still be breaking
> > the data cat dependencies rule. So this doesn't seem a realistic
> > option.
> >
> > 4) so the suggestion discussed in Lyon was to drop text analytics
> > annotation altogether as a separate data category and focus on
> > adding confidence attributes to the existing data categories that
> > would benefit from it.
> >
> > so.....
> >
> > Proposal:
> > I therefore suggest the following and we need your feedback by
> > Friday 16th Nov so we can wrap this up on the Monday call!
> >
> > For those extended with confidence score (terminology,
> > disambiguation) please express your support and any comments by
> > Friday - if we don't receive any we will definitely drop these
> > suggestions. Marcis, Tadej in particular, please consider reviewing
> > these.
> >
> > For exclusions (domain, localizationQualityIssue), this is your last
> > chance to counter-argue in favour of including, otherwise assume
> > these are dropped also.
> >
> > i) confidence for terminology: as suggested by Marcis
> > (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0028.html),
> > revised data category as Word revisions attached
> > (addition to local definition, note on its-tools and example 38)
> >
> > ii) confidence for disambiguation: revised data category as Word
> > revisions attached (addition to local definition, note on its-tools
> > and ex 52)
> >
> > iii) domain: I suggest excluding this as an annotation to which we
> > attach a confidence score. It's not clear that the use of text
> > analytics to identify domain, while feasible, actually represents a
> > real use case for interoperability mark-up. If used, it would
> > probably be internalized by the MT engine. Also, since there are
> > multiple domain values, the semantics of a single confidence score
> > are unclear.
> >
> > iv) localizationQualityIssue: I suggest also excluding this as an
> > annotation to which we attach confidence scores. The use of
> > statistical text analytics doesn't seem common for QA tasks. One
> > exception is the recent innovation by Digital Linguistics, whose
> > Review Sentinel product ranks translations by a TA assessment for QA
> > purposes - but this is innovative and not current practice, so it's
> > probably not yet a concrete use case.
> >
> > cheers,
> > Dave
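
For illustration only, here is a minimal sketch of how a consuming tool might check a confidence value against the restricted xs:decimal type discussed in this thread (minInclusive 0, maxInclusive 1). The function name is_valid_confidence and the regular expression are assumptions made for this example, based on the xs:decimal lexical form (optional sign, digits, "." as the only decimal separator, no exponent); they are not taken from the ITS 2.0 draft.

```python
import re
from decimal import Decimal

# Assumed rendering of the xs:decimal lexical form: optional sign, ASCII
# digits, "." as the only decimal separator, no exponent notation.
_XS_DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def is_valid_confidence(value: str) -> bool:
    """Return True if value is an xs:decimal literal within [0, 1].

    Hypothetical helper; the name is not defined by any specification.
    """
    # xs:decimal collapses XML whitespace around the literal.
    text = value.strip(" \t\r\n")
    if not _XS_DECIMAL.fullmatch(text):
        return False
    # Decimal avoids binary floating-point rounding at the interval bounds.
    return Decimal(0) <= Decimal(text) <= Decimal(1)


if __name__ == "__main__":
    for sample in ["0.5", "1", "0,3", "1/3", "0.(3)", "-2.3", "1e-1"]:
        print(sample, is_valid_confidence(sample))
```

Under these assumptions, locale-specific forms such as 0,3, fractions such as 1/3, and log-scale values such as -2.3 are rejected, while 0.5 and 1 are accepted.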
Received on Wednesday, 21 November 2012 09:51:34 UTC