Re: ITS 2.0 terminology - canonic term from Felix Sasaki on 2013-10-14 (public-i18n-its-ig@w3.org from October 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 14 Oct 2013 09:33:53 +0200
To: Vasile Topac <vasile.topac@aut.upt.ro>
CC: Tadej Štajner <tadej.stajner@ijs.si>, public-i18n-its-ig@w3.org
Message-ID: <525B9E61.3030907@w3.org>
Hi Vasile, all,

What you say makes sense, and it is both a shortcoming and a strength of 
ITS: the shortcoming is that accessing termInfoRef does not 
provide interoperability; the strength is that tools are free to add 
whatever references they want.

I am not sure whether references to terminological information are already a 
common use case, and whether potential users would agree on what to find 
at the end of the URI. But it would be interesting to hear people's 
thoughts.

Best,

Felix

On 01.10.13 21:51, Vasile Topac wrote:
> Hi Tadej and all,
>
> Thank you for your response.
> The termInfoRef attribute would be an option for the canonical form; however,
> the problem with this attribute is that it cannot present the canonical
> term in a standardized way that other ITS consumer tools can easily
> access.
>
> For example, the current ITS 2.0 specification gives this example for the
> "motherboard" term:
> its:termInfoRef="http://www.directron.com/motherboards1.html"
> So a tool consuming this markup can access the canonical term neither from
> the URL nor by following it to its destination (which is outside the scope
> of the ITS standard, and in some cases not even programmatically accessible).
>
> Indeed, the canonical term may sometimes be relative, although in general,
> for most languages and cases, it is fairly precise.
> In these cases, providing the canonical form in an ITS-standardised way
> could be beneficial, especially when the annotating tool uses approximate
> matching to identify terminology.
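To make the interoperability point concrete, here is a minimal consumer-side sketch in Python (standard library only). It assumes HTML5 input with ITS local attributes, term spans that contain only text and are not nested, and a hypothetical exact-match term base represented as a dict with made-up entry IDs. Note that `its-term-canonic` is the attribute proposed in this thread, not part of ITS 2.0. A tool that only does exact lookups finds the entry when the canonical form is supplied and misses it otherwise.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect (surface form, canonical form) pairs from spans with its-term="yes".

    Assumes term spans contain only text and are not nested. The canonical form
    comes from its-term-canonic, the attribute proposed in this thread (not ITS 2.0);
    it is None when the attribute is absent.
    """

    def __init__(self):
        super().__init__()
        self.terms = []
        self._in_term = False
        self._canonic = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "span" and attributes.get("its-term") == "yes":
            self._in_term = True
            self._canonic = attributes.get("its-term-canonic")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a term span.
        if self._in_term:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_term:
            self.terms.append(("".join(self._text), self._canonic))
            self._in_term = False


def lookup(html, term_base):
    """Exact-match each annotated term against `term_base` (a dict keyed by canonical term).

    Uses the canonical form when the markup provides one, otherwise the surface form.
    """
    collector = TermCollector()
    collector.feed(html)
    collector.close()
    return [(surface, term_base.get(canonic if canonic is not None else surface))
            for surface, canonic in collector.terms]


# Hypothetical term base and entry IDs, for illustration only.
TERM_BASE = {"symptom": "MED-0001", "diabetes": "MED-0002"}

with_canonic = ('... depending on the <span its-term="yes" its-term-confidence="0.8" '
                'its-term-canonic="symptom">symptoms</span> of the patient...')
without_canonic = ('... depending on the <span its-term="yes" '
                   'its-term-confidence="0.8">symptoms</span> of the patient...')

print(lookup(with_canonic, TERM_BASE))     # [('symptoms', 'MED-0001')]
print(lookup(without_canonic, TERM_BASE))  # [('symptoms', None)]
```

An its-term-info-ref URI, by contrast, gives such a tool nothing it can look up without knowing what sits behind the URI.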
>
> Hope this makes sense,
> Thank you,
> Vasile
>
>
> On Fri, Sep 27, 2013, Tadej Štajner <tadej.stajner@ijs.si> said:
>
>> Hi, Vasile, welcome to the mailing list!
>>
>> Wouldn't the termInfoRef attribute work for your case? I'd say that what
>> is 'canonical' can get very relative very quickly, so these things are
>> generally solved by having term bases. Then, you can use them like this:
>>
>> "... depending on the <span its-term="yes" its-term-confidence="0.8"
>> its-term-info-ref="http://myawesometermbase.com/symptom">symptoms</span> of the patient..."
>>
>>
>> best regards,
>> -- Tadej
>>
>> On 26. 09. 2013 12:17, Vasile Topac wrote:
>>> Hi all,
>>>
>>> It's my first email here, so let me introduce myself: I'm Vasile Topac, a
>>> PhD student at Politehnica University of Timisoara, Romania, interested in
>>> increasing the accessibility of textual information.
>>> Lately I've been using ITS 2.0 in my research, and I have also developed an
>>> experimental service (available at http://www.text4all.net/itstagger.html)
>>> that tags medical terminology (using approximate matching) based on ITS
>>> 2.0, chapter 8.4, Terminology.
>>> Based on my experience with ITS 2.0 terminology tagging, I want to
>>> propose a new ITS attribute for adding the canonical form of the annotated
>>> term.
>>> Reason:
>>> In my research I have found that at least 50% of the medical terms
>>> encountered in medical natural language are in a derived form (not in the
>>> canonical one).
>>> Of course, this varies from language to language (approximately 50% in
>>> English and even more in Romanian).
>>> Here are some examples of derived terms with ITS 2.0 annotation:
>>>
>>> "... depending on the <span its-term="yes"
>>> its-term-confidence="0.8">symptoms</span> of the patient..."
>>> "... increases the risk to become <span its-term="yes"
>>> its-term-confidence="0.6">diabetic</span> in the future..."
>>>
>>> So, given the rate of terms encountered in derived form, using
>>> approximate terminology matching can improve the annotation a lot (of
>>> course it can leave room for errors, but reducing those is another
>>> discussion, not of interest here).
>>> However, when it comes to consuming ITS-annotated text, if the translation
>>> tools or NLP tools don't use approximate matching, many annotations may
>>> be useless or confusing.
>>> To help such tools identify the exact term, I think it would be
>>> useful to have an additional attribute in which the annotating
>>> service can set the canonical form of the term.
>>> To illustrate this, I'll reuse the examples above with the proposed new
>>> attribute:
>>>
>>> "... depending on the <span its-term="yes" its-term-confidence="0.8"
>>> its-term-canonic="symptom">symptoms</span> of the patient..."
>>> "... increases the risk to become <span its-term="yes"
>>> its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in
>>> the future..."
>>>
>>> Note the its-term-canonic attribute, which carries the canonical form of
>>> the term as its value.
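The annotating side of the proposal can be sketched as follows. This is not the text4all service's actual implementation: it uses Python's `difflib.get_close_matches` as a stand-in approximate matcher over a hypothetical two-entry list of canonical terms, reuses the similarity ratio as the `its-term-confidence` value (an assumption for illustration only), and emits the proposed, non-standard `its-term-canonic` attribute.

```python
import difflib
from html import escape

# Hypothetical toy list of canonical terms; a real tagger would draw on a full
# medical terminology resource.
CANONICAL_TERMS = ["symptom", "diabetes"]


def annotate(word, cutoff=0.7):
    """Return `word` wrapped in a term span if it approximately matches a canonical term.

    difflib stands in for the tagger's real approximate matcher. The similarity
    ratio is reused as its-term-confidence purely for illustration, and
    its-term-canonic is the attribute proposed in this thread, not ITS 2.0.
    """
    matches = difflib.get_close_matches(word.lower(), CANONICAL_TERMS, n=1, cutoff=cutoff)
    if not matches:
        return escape(word)  # no sufficiently close term: leave the word unannotated
    canonic = matches[0]
    ratio = difflib.SequenceMatcher(None, canonic, word.lower()).ratio()
    return ('<span its-term="yes" its-term-confidence="{:.2f}" '
            'its-term-canonic="{}">{}</span>').format(ratio, escape(canonic), escape(word))


print(annotate("symptoms"))
# <span its-term="yes" its-term-confidence="0.93" its-term-canonic="symptom">symptoms</span>
print(annotate("diabetic"))
# <span its-term="yes" its-term-confidence="0.75" its-term-canonic="diabetes">diabetic</span>
```

Writing the canonical form into the markup means a consumer does not have to repeat the approximate match itself.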
>>>
>>> Hope this makes sense,
>>> Thank you,
>>> Vasile Topac
>>>
>>>
>>>
Received on Monday, 14 October 2013 07:34:27 UTC