ITS 2.0 terminology - canonic term from Vasile Topac on 2013-09-26 (public-i18n-its-ig@w3.org from September 2013)

From: Vasile Topac <vasile.topac@aut.upt.ro>
Date: Thu, 26 Sep 2013 10:17:04 -0000
To: <public-i18n-its-ig@w3.org>
Message-ID: <twig.1380190624.821@aut.upt.ro>

Hi all,

It's my first email here, so let me introduce myself: I'm Vasile Topac, a
PhD student at Politehnica University of Timisoara, Romania, interested in
increasing accessibility of textual information.
Lately I've been using ITS 2.0 in my research and have also developed an
experimental service (available at http://www.text4all.net/itstagger.html)
that tags medical terminology, using approximate matching, based on ITS
2.0, section 8.4 (Terminology).
Based on the experience I had with ITS 2.0 terminology tagging, I want to
propose a new ITS attribute for adding the canonical form of the annotated
term.
Reason:
In my research I have found that at least 50% of the medical terms
encountered in medical natural-language text are in a derived form (not
the canonical one).
Of course, this varies from language to language (approx. 50% in English
and even more in Romanian).
Here are some examples of derived terms with ITS 2.0 annotation:

"... depending on the <span its-term="yes"
its-term-confidence="0.8">symptoms</span> of the patient..."
"... increases the risk to become <span its-term="yes"
its-term-confidence="0.6">diabetic</span> in the future..."

So, given the rate of terms encountered in derived form, using
approximate terminology matching can improve the annotation a lot (of
course it can leave room for errors, but reducing those is another
discussion, not of interest here).
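
To make the idea of approximate matching concrete, here is a minimal
sketch in Python. It is not the code behind the service mentioned above:
the term base, the function name match_term and the cutoff value are all
assumptions made for illustration, and the similarity score is simply the
ratio computed by Python's difflib, not a calibrated confidence.

```python
import difflib

# Hypothetical term base of canonical forms (assumption for illustration);
# a real annotator would load a full medical vocabulary instead.
TERM_BASE = ["symptom", "diabetes"]


def match_term(word, cutoff=0.7):
    """Return (canonical_form, similarity) for the closest term, or None.

    The cutoff is an arbitrary illustrative threshold, not a recommended one.
    """
    word = word.lower()
    candidates = difflib.get_close_matches(word, TERM_BASE, n=1, cutoff=cutoff)
    if not candidates:
        return None
    canonical = candidates[0]
    # Same argument order that get_close_matches uses internally.
    score = difflib.SequenceMatcher(None, canonical, word).ratio()
    return canonical, round(score, 2)
```

With this term base, match_term("symptoms") and match_term("diabetic")
resolve to "symptom" and "diabetes" respectively, while an unrelated word
such as "future" falls below the cutoff and returns None.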
However, when it comes to consuming ITS-annotated text, if the translation
tools or NLP tools don't use approximate matching, a lot of the annotation
may be useless or confusing.
In order to help such tools identify the exact term, I think it would be
useful to have an additional attribute where the annotating service can
set the canonical form of the term.
To illustrate this, I'll reuse the examples above, with the newly proposed
attribute:

"... depending on the <span its-term="yes" its-term-confidence="0.8"
its-term-canonic="symptom">symptoms</span> of the patient..."
"... increases the risk to become <span its-term="yes"
its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in
the future..."

Notice the presence of the its-term-canonic attribute, with the canonical
form of the term as its value.
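
As a rough sketch of how a consuming tool without approximate matching
could benefit, the Python below collects annotated terms and prefers the
canonical form as the lookup key, falling back to the surface text when
the attribute is absent. Keep in mind that its-term-canonic is only the
attribute proposed in this email, not part of ITS 2.0, and that the names
TermCollector and collect_terms are invented for this example. The sketch
also assumes that term spans contain no nested markup.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect (surface text, lookup key) pairs for its-term="yes" elements."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._in_term = False
        self._canonical = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("its-term") == "yes":
            self._in_term = True
            # Proposed attribute from this email; None if the annotator
            # did not supply it.
            self._canonical = attrs.get("its-term-canonic")
            self._text = []

    def handle_data(self, data):
        if self._in_term:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the current term span.
        if self._in_term:
            surface = "".join(self._text)
            self.terms.append((surface, self._canonical or surface))
            self._in_term = False


def collect_terms(html):
    """Return a list of (surface text, lookup key) pairs found in html."""
    collector = TermCollector()
    collector.feed(html)
    collector.close()
    return collector.terms
```

A tool with an exact-match term base can then look up the second element
of each pair, so "symptoms" is found under "symptom" without the tool
having to do any approximate matching of its own.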

Hope this makes sense,
Thank you,
Vasile Topac

Received on Friday, 27 September 2013 13:48:53 UTC