Re: Comment on ITS 2.0 WD-its20-20121206 - Disambiguation (and term) from Dr. David Filip on 2013-01-11 (public-multilingualweb-lt-comments@w3.org from January 2013)

From: Dr. David Filip <David.Filip@ul.ie>
Date: Fri, 11 Jan 2013 11:22:51 +0000
To: "Lieske, Christian" <christian.lieske@sap.com>
Cc: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
Message-ID: <CANw5LKmCZSmiq9-9Meh+6r0VZkeUT3c2cqrvdSPjX63ynu_aDg@mail.gmail.com>
Dear Christian, thanks for this insightful comment.
I agree that the disambiguation category is one of the most important
additions; it can expand the usage of the standard and make it more useful
across technologies and industries.

The group has discussed this, and it is clear that disambiguation and term
are related categories. We have, however, not considered deprecating the
ITS 1.0 term category, at least not explicitly.

I believe that this follows from the chartered principles of the group
[paraphrasing]:
1) Do not break 1.0
2) Keep the 1.0 principle of independent categories that can also be
independently implemented.

I believe that your proposal to fuse term and disambiguation is in line with
2) in the sense of making two seemingly interdependent categories into one
fully self-contained and independent category, but would violate 1).

But even if we did not care for 1), I believe that the relationship between
term and disambiguation is a reasonably loose one, i.e. not a hard formal
interdependency that would warrant or even mandate normative handling. It
can and should therefore be handled in non-normative material such as a best
practice document. Meanwhile we keep both categories, because they have
discernible use cases and can still be implemented independently.

A)
A user who uses both a terminology management system and a text analytics
system for disambiguation can reasonably combine them, and the combination
can be driven by organization-specific, process-driven considerations. They
can for instance harvest spans marked as disambiguation as term candidates
for their terminology database, and these can be encoded as terms next time
if e.g. a terminologist approves them as terms.
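
The harvesting step in A) could be sketched roughly as follows. This is only
an illustration under stated assumptions, not part of the specification: the
"its-disambig-" attribute prefix is assumed to match the HTML serialization
of the disambiguation category, and the names DisambigHarvester and
harvest_term_candidates, as well as the confidence threshold, are
hypothetical.

```python
from html.parser import HTMLParser


class DisambigHarvester(HTMLParser):
    """Collect the text of elements that carry its-disambig-* attributes.

    Assumes well-formed markup in which every start tag has an end tag
    (void elements such as <br> without a slash are not handled).
    """

    def __init__(self):
        super().__init__()
        self.candidates = []  # harvested (text, attributes) pairs
        self._stack = []      # per open element: None or [attrs, text parts]

    def handle_starttag(self, tag, attrs):
        marked = {k: v for k, v in attrs if k.startswith("its-disambig-")}
        self._stack.append([marked, []] if marked else None)

    def handle_data(self, data):
        # Text belongs to every annotated element that is currently open.
        for entry in self._stack:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        entry = self._stack.pop()
        if entry is not None:
            self.candidates.append(("".join(entry[1]).strip(), entry[0]))


def harvest_term_candidates(html, min_confidence=0.5):
    """Return annotated spans that a terminologist might review as terms."""
    harvester = DisambigHarvester()
    harvester.feed(html)
    harvester.close()
    selected = []
    for text, attrs in harvester.candidates:
        # Assumption: a missing confidence value is treated as fully confident.
        confidence = float(attrs.get("its-disambig-confidence") or "1.0")
        if text and confidence >= min_confidence:
            selected.append(text)
    return selected
```

The returned strings are only candidates; whether they are promoted to terms
stays a decision of the organization's own process, as described above.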

B)
People using text analytics input only do not need to care about term.

C)
People using terminology management as the only source do not need to
bother with complexities of the disambiguation category.

To summarize:
While many ITS categories, and prominently term and disambiguation, are
informally semantically related, it seems important to keep a reasonable
and manageable granularity of the independently implementable categories.

I hope this helps to explain the group's motivation for keeping the
categories apart.
Please let me know
Rgds
dF

Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie


On Thu, Jan 10, 2013 at 9:14 AM, Lieske, Christian
<christian.lieske@sap.com> wrote:

> Hi,
>
> Please find below comments/observations/questions/ideas concerning the ITS
> 2.0 working draft dated December 6, 2012 (
> http://www.w3.org/TR/2012/WD-its20-20121206/).  Please feel free to
> contact me for clarifications if anything is unclear.
>
> The section related to the “disambiguation” data category to me is one of
> the most important ones of the draft. ITS 2.0 from my point-of-view moves
> ITS 1.0 closer to Natural Language Processing (NLP), and “disambiguation”
> to me is related to NLP in various ways. Thus, making “disambiguation”
> powerful and easy to use (e.g. via a clear distinction to other data
> categories, as well as conceptualizations and wording that are not just
> known within linguistics) seems important to me.
>
> While looking at “disambiguation” from this angle, I started to wonder if
> it could benefit from additions/modifications. I apologize in advance if a
> reply to this comment may require that discussions which presumably already
> took place may have to be summarized.
>
> Here are my observations/questions/ideas:
>
> a. I sense that ITS users will have difficulties to decide when
> to use “term” and when to use “disambiguation” (the note in the Working
> Draft indicates this).
>
> b. Annotation of known terms, generation of so-called “term
> candidates”, (named) entity recognition, and other automation can be
> subsumed under the heading “(automated) text analysis”.
>
> I am thus wondering if the following would be worth considering:
>
> 1. Enhance the current “disambiguation” so that also the
> current “term” can be covered
>
> 2. Deprecate “term”
>
> 3. Revising some of the terminology used in the spec (e.g.
> “disambiguation”, “disambigGranularity”)
>
> An example use of a revised “disambiguation” (and deprecated “term”) –
> partially inspired by ISOCat (see http://www.isocat.org/ ) – is the
> following:
>
> Data category name: (automated) text analysis annotation (atan/tan); using
> “text analysis annotation” would have the advantage that even manual work
> (e.g. “promoting a term candidate to a term”) could be covered
>
> Data category “qualifier” (currently “disambigGranularity”): atan-type or
> tan-type
>
> Values for “qualifier”: lexical, term, termCandidate, ontological-class,
> ontological-entity; possibly even URIs such as
> http://www.isocat.org/datcat/DC-2275 - would allow rather fine-grained
> and under certain provisions standard-conformant (ISO 12620; see
> http://www.ttt.org/clsframe/datcats.html) annotation
>
> Example:
>
>        <span
>           its-tan-confidence="0.7"
>           its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
>           its-tan-ident-ref="http://dbpedia.org/resource/Dublin"
>           its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>
>
> Cheers,
>
> Christian
Received on Friday, 11 January 2013 11:23:58 UTC