RE: issue-68 (Re: Comment on ITS 2.0 WD-its20-20121206 - Disambiguation (and term)) from Yves Savourel on 2013-01-11 (public-multilingualweb-lt-comments@w3.org from January 2013)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Fri, 11 Jan 2013 10:36:41 -0700
To: <public-multilingualweb-lt-comments@w3.org>
Message-ID: <assp.0723fcae8c.assp.07233b1a7c.00b501cdf022$37b2ee40$a718cac0$@com>
Hi all,

I would prefer to keep the two data categories separate.

a) Terminology exists in 1.0 so it's nice to have it in 2.0 as well, even if there are a few extra aspects to it.

b) I think the two data categories answer different use cases, so I don't think it would be good to have a single solution to different problems. Disambiguation is more complex to implement, so we shouldn't put an extra burden on implementers who have only a Terminology-related use case to support.

c) General Descartes principle: break a large problem into small parts; it makes things easier.

So I would recommend keeping the two data categories separate (while I still see Christian's point).

Cheers,
-yves


> +1
>
> Hi Christian, David, and all,
>
> I would have similar arguments for keeping term and disambiguation
> separate although they are related. There are several use cases out
> there in the wild that need this kind of separation, e.g. terminology 
> based workflows in a particular supply chain vs. data stream analyses 
> which prepare the data for further treatment such as a machine 
> translation application (vocabulary support and training/tuning life
> cycles).
>
> One other topic is the discussion of the ISOCat elements which to some
> extent would force applications to adopt an NLP standard that might
> not be appropriate for a given application scenario, e.g. those that 
> do not use NLP technologies at all. Therefore, I would also recommend 
> that we do not talk about bringing ITS closer to NLP because ITS 
> should remain open and deployable for different language processing 
> strategies.
>
> Nevertheless, thanks a lot for raising these concerns.
>
> All the best -- Jörg
>
> On Jan 11, 2013, at 12:22 (CET), Dr. David Filip wrote:
>> Dear Christian, thanks for this insightful comment.
>> I agree that the disambiguation category is one of the most important 
>> additions that can expand the usage of the standard and become more 
>> useful across technologies and industries.
>>
>> The group has discussed this, and it is clear that disambiguation and
>> term are somehow related categories. We have however not considered
>> deprecation of the ITS 1.0 term, at least not explicitly.
>>
>> I believe that this is given by the chartered principles of the group 
>> [paraphrasing]
>> 1) Do not break 1.0
>> 2) Keep the 1.0 principle of independent categories that can also be 
>> independently implemented.
>>
>> I believe that your proposal to fuse term and disambiguation is
>> in line with 2) in the sense of making two seemingly interdependent
>> categories into one fully self-contained and independent category,
>> but would violate 1).
>>
>> But even if we did not care for 1), I believe that the relationship 
>> between term and disambiguation is a reasonably loose one, i.e. not a 
>> hard formal interdependency that would warrant or even mandate 
>> normative handling, and thus can and should be handled in 
>> non-normative material such as a best practice document, while we are 
>> keeping both categories, because they have discernable use cases and 
>> still can be implemented independently.
>>
>> A)
>> A user that uses both a terminology management system and a text 
>> analytics system for disambiguation can reasonably combine them and 
>> their combination can be driven by organization specific process 
>> driven considerations. They can for instance harvest spans marked as 
>> disambiguation as term candidates for their Terminology database and 
>> these can be encoded as terms next time if e.g. a terminologist
>> approves them as terms.
>>
>> B)
>> People using text analytics input only do not need to care about term.
>>
>> C)
>> People using terminology management as the only source do not need to 
>> bother with complexities of the disambiguation category.
>>
>> To summarize:
>> While many ITS categories, and prominently term and disambiguation, 
>> are informally semantically related, it seems important to keep a 
>> reasonable and manageable granularity of the independently 
>> implementable categories.
>>
>> I hope this helps to understand the group's motivation for keeping 
>> the categories apart.
>> Please let me know
>> Rgds
>> dF
>>
>> Dr. David Filip
>> =======================
>> LRC | CNGL | LT-Web | CSIS
>> University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> mailto: david.filip@ul.ie
>>
>>
>> On Thu, Jan 10, 2013 at 9:14 AM, Lieske, Christian
>> <christian.lieske@sap.com> wrote:
>>
>>     Hi,
>>
>>     Please find below comments/observations/questions/ideas concerning
>>     the ITS 2.0 working draft dated December 6, 2012
>>     (http://www.w3.org/TR/2012/WD-its20-20121206/).  Please feel free to
>>     contact me for clarifications if anything is unclear.
>>
>>     The section related to the “disambiguation” data category to me is
>>     one of the most important ones of the draft. ITS 2.0 from my
>>     point-of-view moves ITS 1.0 closer to Natural Language Processing
>>     (NLP), and “disambiguation” to me is related to NLP in various ways.
>>     Thus, making “disambiguation” powerful and easy to use (e.g. via a
>>     clear distinction to other data categories, as well as
>>     conceptualizations and wording that are not just known within
>>     linguistics) seems important to me.
>>
>>     While looking at “disambiguation” from this angle, I started to
>>     wonder if it could benefit from additions/modifications. I apologize
>>     in advance if a reply to this comment may require that discussions
>>     which presumably already took place may have to be summarized.
>>
>>     Here are my observations/questions/ideas:
>>
>>     a. I sense that ITS users will have difficulties to decide when
>>     to use “term” and when to use “disambiguation” (the note in the
>>     Working Draft indicates this).
>>
>>     b. Annotation of known terms, generation of so-called “term
>>     candidates”, (named) entity recognition, and other automation can be
>>     subsumed under the heading “(automated) text analysis”.
>>
>>     I am thus wondering if the following would be worth considering:
>>
>>     1. Enhance the current “disambiguation” so that also the current
>>     “term” can be covered
>>
>>     2. Deprecate “term”
>>
>>     3. Revise some of the terminology used in the spec (e.g.
>>     “disambiguation”, “disambigGranularity”)
>>
>>     An example use of a revised “disambiguation” (and deprecated “term”)
>>     – partially inspired by ISOCat (see http://www.isocat.org/ ) – is
>>     the following:
>>
>>     Data category name: (automated) text analysis annotation (atan/tan);
>>     using “text analysis annotation” would have the advantage that even
>>     manual work (e.g. “promoting a term candidate to a term”) could be
>>     covered
>>
>>     Data category “qualifier” (currently “disambigGranularity”):
>>     atan-type or tan-type
>>
>>     Values for “qualifier”: lexical, term, termCandidate,
>>     ontological-class, ontological-entity; possibly even URIs such as
>>     http://www.isocat.org/datcat/DC-2275 - would allow rather
>>     fine-grained and under certain provisions standard-conformant (ISO
>>     12620; see http://www.ttt.org/clsframe/datcats.html) annotation
>>
>>     Example:
>>
>>             <span
>>                its-tan-confidence="0.7"
>>                its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
>>                its-tan-ident-ref="http://dbpedia.org/resource/Dublin"
>>                its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>
>>
>>     Cheers,
>>     Christian
>>
>
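
The harvesting workflow that David describes under A) in the quoted message (collecting annotated spans as term candidates for a terminology database) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the `its-tan-*` attribute names are taken from Christian's proposed example in this thread, not from the working draft, and the names `SpanHarvester`, `harvest`, and the `min_confidence` threshold are hypothetical helpers invented for the sketch.

```python
from html.parser import HTMLParser


class SpanHarvester(HTMLParser):
    """Collects <span> elements that carry its-tan-* attributes.

    The its-tan-* names are assumptions taken from the proposal in this
    thread; they are not claimed to be part of any ITS version.
    """

    def __init__(self):
        super().__init__()
        self.candidates = []
        # One entry per open <span>: a dict for annotated spans, else None.
        self._open_spans = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr_map = dict(attrs)
        if any(name.startswith("its-tan-") for name in attr_map):
            self._open_spans.append({"attrs": attr_map, "text": []})
        else:
            self._open_spans.append(None)

    def handle_data(self, data):
        # Text belongs to every annotated span that is currently open.
        for entry in self._open_spans:
            if entry is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._open_spans:
            return
        entry = self._open_spans.pop()
        if entry is None:
            return
        attrs = entry["attrs"]
        self.candidates.append({
            "text": "".join(entry["text"]).strip(),
            "ident_ref": attrs.get("its-tan-ident-ref"),
            "class_ref": attrs.get("its-tan-class-ref"),
            # A span without a confidence value is treated as 0.0.
            "confidence": float(attrs.get("its-tan-confidence", "0")),
        })


def harvest(html, min_confidence=0.5):
    """Return annotated spans at or above the (hypothetical) threshold."""
    parser = SpanHarvester()
    parser.feed(html)
    parser.close()
    return [c for c in parser.candidates if c["confidence"] >= min_confidence]
```

Calling `harvest()` on markup containing the `Dublin` span from the proposal would yield one candidate that a terminologist could then approve or reject; a real pipeline would presumably also need to handle XML input and global rules, which this sketch ignores.
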
Received on Friday, 11 January 2013 17:37:21 UTC