- From: Dr. David Filip <David.Filip@ul.ie>
- Date: Fri, 11 Jan 2013 11:22:51 +0000
- To: "Lieske, Christian" <christian.lieske@sap.com>
- Cc: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
- Message-ID: <CANw5LKmCZSmiq9-9Meh+6r0VZkeUT3c2cqrvdSPjX63ynu_aDg@mail.gmail.com>
Dear Christian,

thanks for this insightful comment. I agree that the disambiguation category is one of the most important additions, one that can extend the use of the standard and make it more useful across technologies and industries.

The group has discussed this, and it is clear that disambiguation and term are related categories. We have, however, not considered deprecating the ITS 1.0 term category, at least not explicitly. I believe this follows from the chartered principles of the group [paraphrasing]:

1) Do not break 1.0.
2) Keep the 1.0 principle of independent categories that can also be independently implemented.

I believe that your proposal to fuse term and disambiguation is in line with 2), in the sense that it turns two seemingly interdependent categories into one fully self-contained and independent category, but it would violate 1).

But even if we did not care about 1), I believe that the relationship between term and disambiguation is a reasonably loose one, i.e. not a hard formal interdependency that would warrant or even mandate normative handling. It can and should therefore be handled in non-normative material such as a best practice document, while we keep both categories, because they have discernible use cases and can still be implemented independently:

A) A user who runs both a terminology management system and a text analytics system for disambiguation can reasonably combine them, and the combination can be driven by organization-specific process considerations. They can, for instance, harvest spans marked as disambiguation as term candidates for their terminology database, and these can be encoded as terms next time if, e.g., a terminologist approves them as terms.
B) People using only text analytics input do not need to care about term.
C) People using terminology management as the only source do not need to bother with the complexities of the disambiguation category.
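Scenario A) could be sketched roughly as follows. This is only an illustration under stated assumptions, not something the group has specified: the attribute names its-tan-ident-ref and its-tan-confidence are borrowed from the example span in the quoted message below (they are the proposed names, not names defined by the draft), and CandidateHarvester, harvest_candidates and the confidence threshold are hypothetical names invented for the sketch.

```python
from html.parser import HTMLParser


class CandidateHarvester(HTMLParser):
    """Collect annotated <span> elements as term candidates.

    Assumption: annotations use the attribute names from the example
    span in the quoted message, not names defined by the ITS 2.0 draft.
    """

    IDENT_ATTR = "its-tan-ident-ref"
    CONFIDENCE_ATTR = "its-tan-confidence"

    def __init__(self, min_confidence=0.0):
        super().__init__()
        self.min_confidence = min_confidence
        self._open_spans = []  # one entry per open <span>; None if not annotated
        self.candidates = []   # (text, ident_ref, confidence) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr_map = dict(attrs)
        ident = attr_map.get(self.IDENT_ATTR)
        if ident is None:
            self._open_spans.append(None)
            return
        raw = attr_map.get(self.CONFIDENCE_ATTR)
        confidence = float(raw) if raw is not None else None
        self._open_spans.append(
            {"ident": ident, "confidence": confidence, "text": []}
        )

    def handle_data(self, data):
        # Text belongs to the innermost open span, if that span is annotated.
        if self._open_spans and self._open_spans[-1] is not None:
            self._open_spans[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._open_spans:
            return
        entry = self._open_spans.pop()
        if entry is None:
            return
        confidence = entry["confidence"]
        # Spans without a confidence value are kept; a terminologist decides later.
        if confidence is None or confidence >= self.min_confidence:
            text = "".join(entry["text"]).strip()
            self.candidates.append((text, entry["ident"], confidence))


def harvest_candidates(html, min_confidence=0.0):
    """Return term candidates found in an HTML fragment."""
    parser = CandidateHarvester(min_confidence)
    parser.feed(html)
    parser.close()
    return parser.candidates
```

The harvested tuples are only candidates; whether one is later encoded as a term stays a process decision (e.g. approval by a terminologist), which is why the two categories can stay independent.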
To summarize: While many ITS categories, and prominently term and disambiguation, are informally semantically related, it seems important to keep a reasonable and manageable granularity of the independently implementable categories. I hope this helps to understand the group's motivation for keeping the categories apart.

Please let me know

Rgds
dF

Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie

On Thu, Jan 10, 2013 at 9:14 AM, Lieske, Christian <christian.lieske@sap.com> wrote:

> Hi,
>
> Please find below comments/observations/questions/ideas concerning the ITS
> 2.0 working draft dated December 6, 2012
> (http://www.w3.org/TR/2012/WD-its20-20121206/). Please feel free to
> contact me for clarifications if anything is unclear.
>
> The section related to the “disambiguation” data category to me is one of
> the most important ones of the draft. ITS 2.0 from my point-of-view moves
> ITS 1.0 closer to Natural Language Processing (NLP), and “disambiguation”
> to me is related to NLP in various ways. Thus, making “disambiguation”
> powerful and easy to use (e.g. via a clear distinction to other data
> categories, as well as conceptualizations and wording that are not just
> known within linguistics) seems important to me.
>
> While looking at “disambiguation” from this angle, I started to wonder if
> it could benefit from additions/modifications. I apologize in advance if a
> reply to this comment may require that discussions which presumably already
> took place may have to be summarized.
>
> Here are my observations/questions/ideas:
>
> a. I sense that ITS users will have difficulties to decide when
> to use “term” and when to use “disambiguation” (the note in the Working
> Draft indicates this).
>
> b. Annotation of known terms, generation of so-called “term
> candidates”, (named) entity recognition, and other automation can be
> subsumed under the heading “(automated) text analysis”.
>
> I am thus wondering if the following would be worth considering:
>
> 1. Enhance the current “disambiguation” so that also the
> current “term” can be covered
> 2. Deprecate “term”
> 3. Revising some of the terminology used in the spec (e.g.
> “disambiguation”, “disambigGranularity”)
>
> An example use of a revised “disambiguation” (and deprecated “term”),
> partially inspired by ISOCat (see http://www.isocat.org/), is the
> following:
>
> Data category name: (automated) text analysis annotation (atan/tan); using
> “text analysis annotation” would have the advantage that even manual work
> (e.g. “promoting a term candidate to a term”) could be covered
>
> Data category “qualifier” (currently “disambigGranularity”): atan-type or
> tan-type
>
> Values for “qualifier”: lexical, term, termCandidate, ontological-class,
> ontological-entity; possibly even URIs such as
> http://www.isocat.org/datcat/DC-2275 - would allow rather fine-grained
> and under certain provisions standard-conformant (ISO 12620; see
> http://www.ttt.org/clsframe/datcats.html) annotation
>
> Example:
>
> <span
> its-tan-confidence="0.7"
> its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
> its-tan-ident-ref="http://dbpedia.org/resource/Dublin"
> its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>
>
> Cheers,
> Christian
Received on Friday, 11 January 2013 11:23:58 UTC