- From: Vasile Topac <vasile.topac@aut.upt.ro>
- Date: Thu, 26 Sep 2013 10:17:04 -0000
- To: <public-i18n-its-ig@w3.org>
Hi all,

This is my first email here, so let me introduce myself: I'm Vasile Topac, a PhD student at Politehnica University of Timisoara, Romania, interested in increasing the accessibility of textual information. Lately I have been using ITS 2.0 in my research, and I have also developed an experimental service (available at http://www.text4all.net/itstagger.html ) that tags medical terminology, using approximate matching, based on ITS 2.0 (chapter 8.4, Terminology).

Based on my experience with ITS 2.0 terminology tagging, I want to propose a new ITS attribute for recording the canonical form of the annotated term.

Reason: in my research I have found that at least 50% of the medical terms encountered in medical natural language appear in a derived form rather than the canonical one. Of course, this varies from language to language (approximately 50% in English, and even more in Romanian). Here are some examples of derived terms with ITS 2.0 annotation:

"... depending on the <span its-term="yes" its-term-confidence="0.8">symptoms</span> of the patient..."

"... increases the risk to become <span its-term="yes" its-term-confidence="0.6">diabetic</span> in the future..."

Given how often terms occur in derived form, approximate terminology matching can improve annotation coverage considerably. (It also leaves room for errors, but reducing those is a separate discussion and not of interest here.)

However, when it comes to consuming ITS-annotated text, if the translation or NLP tools do not use approximate matching themselves, many of these annotations may be useless or confusing: the tool sees "diabetic" marked as a term but cannot find it in a term base that only lists "diabetes". To help such tools identify the exact term, I think it would be useful to have an additional attribute in which the annotating service can set the canonical form of the term. To illustrate, here are the examples above with the proposed attribute:

"... depending on the <span its-term="yes" its-term-confidence="0.8" its-term-canonic="symptom">symptoms</span> of the patient..."

"... increases the risk to become <span its-term="yes" its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> in the future..."

Notice the its-term-canonic attribute, which carries the canonical form of the term as its value.

Hope this makes sense.

Thank you,
Vasile Topac
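
As a rough illustration of the consuming side, the sketch below shows how a tool without approximate matching might read the proposed attribute and fall back to the surface text when it is absent. It is only a sketch under stated assumptions: its-term-canonic is the attribute proposed in this message and is not part of ITS 2.0, the names TermCollector and extract_terms are hypothetical, and nested term spans are not handled.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect (surface, canonical, confidence) for each span with its-term="yes".

    Hypothetical helper: "its-term-canonic" is the proposed attribute,
    not something defined by ITS 2.0.
    """

    def __init__(self):
        super().__init__()
        self.terms = []
        self._attrs = None  # attributes of the term span currently open, if any
        self._text = []     # text pieces seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("its-term") == "yes":
            self._attrs = attrs
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._attrs is not None:
            surface = "".join(self._text).strip()
            # Prefer the canonical form; fall back to the surface text.
            canonical = self._attrs.get("its-term-canonic", surface)
            raw_conf = self._attrs.get("its-term-confidence")
            confidence = float(raw_conf) if raw_conf is not None else None
            self.terms.append((surface, canonical, confidence))
            self._attrs = None


def extract_terms(html_text):
    """Return the list of annotated terms found in an HTML fragment."""
    collector = TermCollector()
    collector.feed(html_text)
    collector.close()
    return collector.terms


if __name__ == "__main__":
    sample = (
        '... increases the risk to become <span its-term="yes" '
        'its-term-confidence="0.6" its-term-canonic="diabetes">diabetic</span> '
        "in the future..."
    )
    for surface, canonical, confidence in extract_terms(sample):
        print(surface, canonical, confidence)
```

A consumer could then look up the canonical value in its term base with an exact match, instead of having to implement approximate matching on the surface form.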
Received on Friday, 27 September 2013 13:48:53 UTC