
RE: Atb.: [action 265] data category specific confidence scores

From: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
Date: Wed, 21 Nov 2012 10:23:52 +0200
To: Felix Sasaki <fsasaki@w3.org>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, dave lewis <dave.lewis@cs.tcd.ie>, "tadej.stajner@ijs.si" <tadej.stajner@ijs.si>
Message-ID: <AC6FD4BB9BB02540AC7322091A6C3B5472B025A62D@postal.Tilde.lv>
Hi Felix,

If the base is xs:decimal, then - yes.

xs:decimal requires the decimal separator to be a point, not a comma as used in some locales (including Latvian). If that is not explicitly required, users could (mis)use locale-specific formats, which would cause compatibility problems. So, in my opinion, it is important that the format is a restricted variant of “xs:decimal” (limited to the inclusive interval [0;1]).
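
A minimal sketch of the check this implies, in Python. The regular expression follows the XML Schema lexical form of xs:decimal as understood here (optional sign, ASCII digits, a point as the only separator). The names parse_confidence and is_valid_confidence are hypothetical and not part of ITS 2.0:

```python
import re
from decimal import Decimal

# Assumed lexical form of xs:decimal: optional sign, ASCII digits,
# and "." as the only decimal separator (no exponent, no comma).
_XS_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)", re.ASCII)


def parse_confidence(text):
    """Return the confidence as a Decimal, or raise ValueError."""
    candidate = text.strip()
    if not _XS_DECIMAL.fullmatch(candidate):
        raise ValueError(f"not an xs:decimal literal: {text!r}")
    value = Decimal(candidate)
    # Inclusive interval [0;1], mirroring minInclusive/maxInclusive.
    if not (Decimal(0) <= value <= Decimal(1)):
        raise ValueError(f"outside [0, 1]: {text!r}")
    return value


def is_valid_confidence(text):
    """Hypothetical convenience wrapper: True if text parses, else False."""
    try:
        parse_confidence(text)
    except ValueError:
        return False
    return True
```

With this check, "0.3" and "1" pass, while the locale-specific "0,3", the fraction "1/3", the repeating form "0.(3)" and the out-of-range "1.5" are all rejected.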

Best regards,
Mārcis ;o)

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, November 21, 2012 10:15 AM
To: Mārcis Pinnis
Cc: public-multilingualweb-lt@w3.org; dave lewis; tadej.stajner@ijs.si
Subject: Re: Atb.: [action 265] data category specific confidence scores

Thanks, Marcis. Just to be sure: you are saying that the mtConfidence, disambigConfidence and termConfidence should follow a definition like this:

<simpleType name='confidence'>
  <restriction base='decimal'>
    <minInclusive value='0'/>
    <maxInclusive value='1'/>
  </restriction>
</simpleType>

On 21.11.12 08:55, Mārcis Pinnis wrote:

Hi Felix,

I went over the wording, there is only one thing that I find misleading and not well disambiguated.

The following are all valid mathematical representations of rational numbers: 0.3; 1/3; 0.(3) ... and with locale specifics: 0,3; 0,(3)

If the format specifies "a rational number in the interval 0 to 1" then, as a mathematician, I would assume all of these are valid. Maybe the wording should be changed to "a finite decimal number" (this would also hardcode the number base...). Maybe it is irrelevant, maybe not, but if there is already a limitation of 0 to 1, then I think there should also be a clear indication of the base and format (maybe it is possible to borrow the definition of xs:decimal http://www.w3.org/TR/xmlschema-2/#decimal).

Best regards,

Mārcis ;o)

-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Tuesday, November 20, 2012 11:07 PM
To: public-multilingualweb-lt@w3.org; dave lewis; tadej.stajner@ijs.si; Mārcis Pinnis
Subject: Re: Atb.: [action 265] data category specific confidence scores

Hi Dave, Marcis, Tadej, all,

On 14.11.12 19:27, Tadej Stajner wrote:

Hi, Dave, Marcis,

(see below)

On 11/14/2012 5:47 PM, Dave Lewis wrote:

thanks for the feedback, comment inline.

On 13/11/2012 19:56, Mārcis Pinnis wrote:

Hi Dave,

1) I support your suggestion as drafted in the attachment.

2) Although I believe there is a typing mistake:

<p>And he said: you need a new <quote its:term="yes"



I believe its-info-term-ref should actually be its-term-info-ref?!

thanks for spotting that, we'll fix it.

This should be fixed now at


3) Also, just a comment (our systems won't be affected, but...): why do you want to restrict the values to be from 0 to 1? In statistics it is also quite common to use log-scale probabilities (because the numbers are otherwise very small in some cases). Is it necessary to restrict users to a 0 to 1 interval? I would suggest leaving the decision up to the users. Also, the tools will have to be identified anyway. This means that users will be able to find out from the systems (if needed) how to parse (understand) the confidence scores. This is a general question that applies to other confidence scores as well.

In general, we do not attach inter-tool significance to the confidence scores, hence the requirement to specify the tool using its-tools-ref. Normalising the score to 0-1 is therefore not intended to support inter-tool comparisons, but rather to give the presenting software a stable range/value to display.

On that note, I'd suggest explicitly adding a sentence that the scores are comparable only in the context of the same tool. It might be obvious to us, but it's an important point.

I tried to add that to the 1st paragraph at http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-tool-annotation for terminology, mt confidence and disambiguation.



-- Tadej

For the MT confidence score the concerned implementers suggested 0..1; the use of log-scale didn't come up. So, for no deeper reason than consistency, I'd suggest we keep the same for term and disambig confidence scores, unless there is a pressing reason to do otherwise.



4) I agree that in the current proposal it would not be reasonable to add a confidence score, as in a multiple-domain scenario it would be misleading/wrong and it would require a different solution (for instance, similar to how domains can be marked).

Best regards,

Mārcis ;o)


From: Dave Lewis [dave.lewis@cs.tcd.ie]
Sent: Tuesday, 13 November 2012 19:30
To: Multilingual Web LT Public List
Subject: Fwd: [action 265] data category specific confidence scores

Hi all,

To try and wrap up this point:

Summary of discussion so far:

1) Text analytics annotation was proposed as a way of offering a confidence score for text analytics results. As with the mtconfidence score, the tools annotation is now covered by the itsTool feature, but the proposal for confidence scores remains.

2) Marcis pointed out, using real-world terminology use cases, that we may have several annotations operating on the same fragment, so applying a confidence score to different text analytics annotations with a single data category won't work in these cases because of complete override.

Also, if we used text analytics annotation with annotation from other data categories, we would be breaking our 'no dependencies between data categories' rule.

3) We could overcome the complete override problems using standoff markup, as in localization quality issue and provenance. But as the confidence score would be different for each annotated fragment, that would result in very big stand-off records, and we would still be breaking the data category dependencies rule. So this doesn't seem a realistic


4) So the suggestion discussed in Lyon was to drop text analytics annotation altogether as a separate data category and focus on adding confidence attributes to the existing data categories that would benefit from it.



I therefore suggest the following, and we need your feedback by Friday 16th Nov so we can wrap this up on the Monday call!

For those extended with a confidence score (terminology, disambiguation), please express your support and any comments by Friday; if we don't receive any, we will definitely drop these suggestions. Marcis and Tadej in particular, please consider reviewing these.

For exclusions (domain, localizationQualityIssue), this is your last chance to counter-argue in favour of including them; otherwise assume these are dropped also.

i) confidence for terminology: as suggested by Marcis


ov/0028.html), revised data category as word revisions attached

(addition to local definition, note on its-tools and example 38)

ii) confidence for disambiguation: revised data category as word revisions attached (addition to local definition, note on its-tools and ex 52)

iii) domain: I suggest excluding this as an annotation to which we attach a confidence score. It's not clear that the use of text analytics to identify domain, while feasible, actually represents a real use case for interoperability mark-up. If used, it would probably be internalized by the MT engine. Also, since there are multiple domain values, the semantics of a single confidence score are unclear.

iv) localizationQualityIssue: I suggest also excluding this as an annotation to which we attach confidence scores. The use of statistical text analytics doesn't seem common for QA tasks. One exception is the recent innovation by Digital Linguistics, whose Review Sentinel product ranks translations by a TA assessment for QA purposes - but this is innovative and not current practice, so it's probably not yet a concrete use case.


Received on Wednesday, 21 November 2012 08:24:22 UTC
