Re: ITS 2.0 terminology - canonic term from Tadej Štajner on 2013-09-27 (public-i18n-its-ig@w3.org from September 2013)

From: Tadej Štajner <tadej.stajner@ijs.si>
Date: Fri, 27 Sep 2013 16:00:34 +0200
To: Vasile Topac <vasile.topac@aut.upt.ro>, public-i18n-its-ig@w3.org
Message-ID: <52458F82.1080301@ijs.si>

Hi, Vasile, welcome to the mailing list!

Wouldn't the termInfoRef attribute work for your case? I'd say that what
is 'canonical' can get very relative very quickly, so these things are
generally solved by having term bases. Then, you can use them like this:

"... depending on the <span its-term="yes" its-term-confidence="0.8"
its-term-info-ref="http://myawesometermbase.com/symptom">symptoms</span> of the patient..."
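
On the consuming side, a tool could pick up that reference without any approximate matching. The following is a minimal Python sketch, not a definitive implementation: the names ItsTermCollector and collect_terms are hypothetical, it assumes only the standard-library html.parser module, and it stops at extracting the attributes (resolving the term base URL is left out).

```python
from html.parser import HTMLParser


class ItsTermCollector(HTMLParser):
    """Collect elements marked its-term="yes" together with their ITS attributes.

    Hypothetical helper for illustration. It assumes annotated elements are
    properly closed and contain no unclosed void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.terms = []    # finished annotations, in document order
        self._open = None  # annotation currently being read, if any
        self._depth = 0    # nesting depth inside the open annotation

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._open is not None:
            # Nested markup inside an annotated term: just track the depth.
            self._depth += 1
        elif attrs.get("its-term") == "yes":
            conf = attrs.get("its-term-confidence")
            self._open = {
                "text": "",
                "confidence": float(conf) if conf is not None else None,
                "info_ref": attrs.get("its-term-info-ref"),
            }
            self._depth = 1

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None:
            self._depth -= 1
            if self._depth == 0:
                self.terms.append(self._open)
                self._open = None


def collect_terms(html):
    """Return a list of dicts describing each its-term="yes" element."""
    parser = ItsTermCollector()
    parser.feed(html)
    parser.close()
    return parser.terms


if __name__ == "__main__":
    sample = (
        '... depending on the <span its-term="yes" its-term-confidence="0.8" '
        'its-term-info-ref="http://myawesometermbase.com/symptom">symptoms</span>'
        " of the patient..."
    )
    for term in collect_terms(sample):
        print(term["text"], term["confidence"], term["info_ref"])
```

The consumer then keys on the info_ref value rather than on the surface form, so "symptoms" and "symptom" end up pointing at the same term base entry.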


best regards,
-- Tadej

On 26. 09. 2013 12:17, Vasile Topac wrote:
> Hi all,
>
> It's my first email here, so let me introduce myself: I'm Vasile Topac, a
> PhD student at Politehnica University of Timisoara, Romania, interested in
> increasing accessibility of textual information.
> Lately I've been using ITS 2.0 in my research and also developed an
> experimental service (available at: http://www.text4all.net/itstagger.html
> ) that tags medical terminology (using approximate matching) based on ITS
> 2.0, <chapter 8.4 Terminology>.
> Based on the experience I had with ITS 2.0 terminology tagging, I want to
> propose a new its attribute for adding the canonical form of the annotated
> term.
> Reason:
> In my research I have found that at least 50% of the medical terms
> encountered in medical natural language are in derived form (not in the
> canonical one).
> Of course, this varies from language to language (approx. 50% in English and
> even more for Romanian).
> Here are some examples of derived terms with ITS 2.0 annotation:
>
> "... depending on the <span its-term="yes"
> its-term-confidence="0.8">symptoms</span> of the patient..."
> "... increases the risk to become <span its-term="yes"
> its-term-confidence="0.6">diabetic</span> in the future..."
>
> So, given the rate of terms encountered in derived form, using
> approximate terminology matching can improve the annotation a lot (of
> course it can leave room for errors, but reducing that is another
> discussion, not of interest here).
> However, when it comes to consuming ITS-annotated text, if the translation
> tools or NLP tools don't use approximate matching, a lot of the annotation
> may be useless or confusing.
> In order to help such tools identify the exact term, I think it would be
> useful to have an additional attribute where the annotating
> service can set the canonical form of the term.
> To illustrate this, I'll reuse the examples above, with the newly proposed
> attribute:
>
> "... depending on the <span its-term="yes" its-term-confidence="0.8"
> its-term-canonic="symptom">symptoms</span> of the patient..."
> "... increases the risk to become <span its-term="yes"
> its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in
> the future..."
>
> Notice the presence of the its-term-canonic attribute, with the canonical
> form of the term as its value.
>
> Hope this makes sense,
> Thank you,
> Vasile Topac
>
>
>

Received on Friday, 27 September 2013 14:04:15 UTC