
Re: ITS 2.0 terminology - canonic term

From: Vasile Topac <vasile.topac@aut.upt.ro>
Date: Tue, 1 Oct 2013 19:51:40 -0000
To: Tadej Štajner <tadej.stajner@ijs.si>, <public-i18n-its-ig@w3.org>
Message-ID: <twig.1380657100.54212@aut.upt.ro>
Hi Tadej and all,

Thank you for your response.
The termInfoRef attribute would be an option for the canonical form; however,
the problem with this attribute is that it cannot present the canonical
term in a standardized way that other ITS consumer tools can easily access.

For example, the current ITS 2.0 specification gives this example for the
"motherboard" term:
So a tool consuming this markup cannot obtain the canonical term from the
URL, nor by following it to the destination (which is outside the ITS
standard, and in some cases not even accessible programmatically).
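
To make the consumer side concrete, here is a minimal sketch in Python of a
tool reading term annotations from the HTML5 its-* attributes. The class name
TermCollector and the dictionary keys are only illustrative, and
its-term-canonic is the attribute proposed in this thread, not part of
ITS 2.0. The sketch assumes flat, non-nested term spans.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect ITS term annotations from HTML5 its-* attributes.

    Illustrative only: assumes term spans contain plain text and
    no nested elements.
    """

    def __init__(self):
        super().__init__()
        self.terms = []     # finished annotations
        self._open = None   # annotation currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("its-term") == "yes":
            self._open = {
                "surface": "",
                # Standard ITS 2.0: only a reference (URL) to term information.
                "info_ref": attrs.get("its-term-info-ref"),
                # Proposed in this thread, NOT part of ITS 2.0.
                "canonic": attrs.get("its-term-canonic"),
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["surface"] += data

    def handle_endtag(self, tag):
        if self._open is not None:
            self.terms.append(self._open)
            self._open = None
```

With the its-term-info-ref markup, such a consumer ends up with the surface
form "symptoms" and a URL, but nothing it can match exactly against its own
term list; the "canonic" field stays empty unless the annotator supplies it.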

Indeed, sometimes the canonical term may be relative, although in general,
for most languages and cases, it is fairly precise.
In these cases, providing the canonical form in an ITS-standardised way
might be of benefit, especially when the annotating tool uses approximate
matching to identify terminology.
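
As a rough illustration of the annotating side, the sketch below wraps words
that approximately match a term base entry and records the matched entry in
the proposed its-term-canonic attribute. Everything here is an assumption made
for the example: the tiny TERM_BASE list, the annotate function, the 0.7
cutoff, and the use of Python's difflib as the approximate matcher (it is not
the matcher used by the service mentioned in this thread).

```python
import difflib
import re
from html import escape

# Hypothetical term base holding canonical forms only.
TERM_BASE = ["symptom", "diabetes"]


def annotate(text, cutoff=0.7):
    """Wrap words that approximately match a term base entry.

    The similarity ratio from difflib is reused as its-term-confidence;
    its-term-canonic is the attribute proposed in this thread, not ITS 2.0.
    """

    def tag(match):
        word = match.group(0)
        hits = difflib.get_close_matches(word.lower(), TERM_BASE, n=1, cutoff=cutoff)
        if not hits:
            return word  # no close entry: leave the word untouched
        canonic = hits[0]
        score = difflib.SequenceMatcher(None, word.lower(), canonic).ratio()
        return (
            f'<span its-term="yes" its-term-confidence="{score:.1f}" '
            f'its-term-canonic="{escape(canonic)}">{word}</span>'
        )

    return re.sub(r"[A-Za-z]+", tag, text)
```

A consumer that does only exact matching could then look up the attribute
value directly, without needing to repeat the approximate matching step.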

Hope this makes sense,
Thank you,

On Fri, Sep 27, 2013, Tadej Štajner <tadej.stajner@ijs.si> said:

> Hi, Vasile, welcome to the mailing list!
> Wouldn't the termInfoRef attribute work for your case? I'd say that what
> is 'canonical' can get very relative very quickly, so these things are
> generally solved by having term bases. Then, you can use them like this:
> "... depending on the <span its-term="yes" its-term-confidence="0.8"
> its-term-info-ref="http://myawesometermbase.com/symptom">symptoms</span> of the patient..."
> best regards,
> -- Tadej
> On 26. 09. 2013 12:17, Vasile Topac wrote:
>> Hi all,
>> It's my first email here, so let me introduce myself: I'm Vasile Topac, a
>> PhD student at Politehnica University of Timisoara, Romania, interested in
>> increasing accessibility of textual information.
>> Lately I've been using ITS 2.0 in my research and also developed an
>> experimental service (available at: http://www.text4all.net/itstagger.html
>> ) that tags medical terminology (using approximate matching) based on ITS
>> 2.0, <chapter 8.4 Terminology>.
>> Based on the experience I had with ITS 2.0 terminology tagging, I want to
>> propose a new ITS attribute for adding the canonical form of the annotated
>> term.
>> Reason:
>> In my research I have found that at least 50% of the medical terms
>> encountered in medical natural language are in a derived form (not in the
>> canonical one).
>> Of course, this varies from language to language (approx. 50% in English and
>> even more for Romanian).
>> Here are some examples of derived terms with ITS 2.0 annotation:
>> "... depending on the <span its-term="yes"
>> its-term-confidence="0.8">symptoms</span> of the patient..."
>> "... increases the risk to become <span its-term="yes"
>> its-term-confidence="0.6">diabetic</span> in the future..."
>> So, given the rate of terms encountered in derived form, using
>> approximate terminology matching can improve the annotation a lot (of
>> course it can leave room for errors, but reducing those is another
>> discussion, not of interest here).
>> However, when it comes to consuming ITS-annotated text, if the translation
>> tools or NLP tools don't use approximate matching, a lot of the annotation
>> may be useless or confusing.
>> In order to help such tools identify the exact term, I think it would be
>> useful if there were an additional attribute where the annotating
>> service can set the canonical form of the term.
>> To illustrate this, I'll reuse the examples above, with the new proposed
>> attribute:
>> "... depending on the <span its-term="yes" its-term-confidence="0.8"
>> its-term-canonic="symptom">symptoms</span> of the patient..."
>> "... increases the risk to become <span its-term="yes"
>> its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in
>> the future..."
>> Notice the presence of the its-term-canonic attribute, with the canonical
>> form of the term as its value.
>> Hope this makes sense,
>> Thank you,
>> Vasile Topac

Received on Tuesday, 1 October 2013 19:52:08 UTC
