Re: ITS 2.0 terminology - canonic term

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 14 Oct 2013 09:33:53 +0200
Message-ID: <525B9E61.3030907@w3.org>
To: Vasile Topac <vasile.topac@aut.upt.ro>
CC: Tadej Štajner <tadej.stajner@ijs.si>, public-i18n-its-ig@w3.org
Hi Vasile, all,

what you say makes sense, and it is both a shortcoming and a strength of 
ITS: the shortcoming is that accessing the termInfoRef does not 
provide interoperability. The strength is that tools are free to add 
whatever references they want.

I am not sure whether references to terminological information are already a 
common use case, and whether potential users would agree on what to find 
at the end of the URI. But it would be interesting to hear people's 



On 01.10.13 21:51, Vasile Topac wrote:
> Hi Tadej and all,
> Thank you for your response.
> The termInfoRef attribute would be an option for the canonical form; however,
> the problem with this attribute is that it cannot present the canonical
> term in a standardized way,
> so that other ITS consumer tools can easily access it.
> For example, the current ITS 2.0 specification gives this example for
> the term "motherboard":
> its:termInfoRef="http://www.directron.com/motherboards1.html"
> So a tool consuming this markup cannot obtain the canonical term from the
> URL, nor by following it to the destination (which is outside the ITS
> standard, and sometimes not even accessible programmatically).
> Indeed, sometimes the canonical term may be relative, although in general,
> for most languages/cases, it is pretty precise.
> In these cases, providing the canonical form in an ITS-standardised way
> might be of benefit, especially when the annotating tool uses approximate
> matching to identify terminology.
> Hope this makes sense,
> Thank you,
> Vasile
> On Fri, Sep 27, 2013, Tadej Štajner <tadej.stajner@ijs.si> said:
>> Hi, Vasile, welcome to the mailing list!
>> Wouldn't the termInfoRef attribute work for your case? I'd say that what
>> is 'canonical' can get very relative very quickly, so these things are
>> generally solved by having term bases. Then, you can use them like this:
>> "... depending on the <span its-term="yes" its-term-confidence="0.8"
>> its-term-info-ref="http://myawesometermbase.com/symptom">symptoms</span> of the patient..."
>> best regards,
>> -- Tadej
>> On 26. 09. 2013 12:17, Vasile Topac wrote:
>>> Hi all,
>>> It's my first email here, so let me introduce myself: I'm Vasile Topac, a
>>> PhD student at Politehnica University of Timisoara, Romania, interested in
>>> increasing accessibility of textual information.
>>> Lately I've been using ITS 2.0 in my research and also developed an
>>> experimental service (available at: http://www.text4all.net/itstagger.html
>>> ) that tags medical terminology (using approximate matching) based on ITS
>>> 2.0, <chapter 8.4 Terminology>.
>>> Based on my experience with ITS 2.0 terminology tagging, I want to
>>> propose a new ITS attribute for adding the canonical form of the annotated
>>> term.
>>> Reason:
>>> In my research I have found that at least 50% of the medical terms
>>> encountered in medical natural language are in a derived form (not in the
>>> canonical one).
>>> Of course, this varies from language to language (approx. 50% in English and
>>> even more for Romanian).
>>> Here are some examples of derived terms with ITS 2.0 annotation:
>>> "... depending on the <span its-term="yes"
>>> its-term-confidence="0.8">symptoms</span> of the patient..."
>>> "... increases the risk to become <span its-term="yes"
>>> its-term-confidence="0.6">diabetic</span> in the future..."
>>> So, given the rate of terms encountered in derived form, using
>>> approximate terminology matching can improve the annotation a lot (of
>>> course it can leave room for errors, but reducing those is another
>>> discussion, not of interest here).
>>> However, when it comes to consuming ITS-annotated text, if the translation
>>> tools or NLP tools don't use approximate matching, a lot of the annotation
>>> may be useless or confusing.
>>> In order to help such tools identify the exact term, I think it would be
>>> useful if there were an additional attribute where the annotating
>>> service can set the canonical form of the term.
>>> To illustrate this, I'll reuse the examples above, with the new proposed
>>> attribute:
>>> "... depending on the <span its-term="yes" its-term-confidence="0.8"
>>> its-term-canonic="symptom">symptoms</span> of the patient..."
>>> "... increases the risk to become <span its-term="yes"
>>> its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in
>>> the future..."
>>> Notice the presence of its-term-canonic attribute, and the canonic form of
>>> the term as value.
>>> Hope this makes sense,
>>> Thank you,
>>> Vasile Topac
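
To make the proposal above concrete, here is a minimal sketch of how a consuming tool might read the canonical form, assuming the proposed its-term-canonic attribute were present in the markup. Note that its-term-canonic is only the proposal from this thread and is not part of ITS 2.0; the names TermCollector and collect_terms are hypothetical, and the sketch does not handle nested spans. When the attribute is missing, it falls back to the surface text.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect (surface text, canonical form, confidence) for its-term="yes" spans."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._current = None  # attributes of the term span being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span" and attr_map.get("its-term") == "yes":
            self._current = attr_map
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a term span.
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            surface = "".join(self._text).strip()
            # its-term-canonic is the attribute proposed in this thread,
            # not an ITS 2.0 attribute; fall back to the surface form.
            canonical = self._current.get("its-term-canonic", surface)
            confidence = self._current.get("its-term-confidence")
            self.terms.append((surface, canonical, confidence))
            self._current = None


def collect_terms(html):
    """Return a list of (surface, canonical, confidence) tuples."""
    parser = TermCollector()
    parser.feed(html)
    parser.close()
    return parser.terms
```

A tool without approximate matching could then look up the second element of each tuple in its term base instead of the surface form.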
Received on Monday, 14 October 2013 07:34:27 UTC
