- From: Yves Savourel <ysavourel@enlaso.com>
- Date: Fri, 11 Jan 2013 10:36:41 -0700
- To: <public-multilingualweb-lt-comments@w3.org>
Hi all,

I would tend to prefer to keep both data categories separate.

a) Terminology exists in 1.0, so it's nice to have it in 2.0 as well, even if there are a few extra aspects to it.

b) I think the two data categories address different use cases, so I don't think it would be good to have a single solution to different problems. Disambiguation is more complex to implement, so we shouldn't put an extra burden on implementers who have only a Terminology-related use case to support.

c) General Descartes principle: break large problems into small parts; it makes things easier.

So I would recommend keeping the two data categories separate (while I still see Christian's point).

Cheers,
-yves

> +1
>
> Hi Christian, David, and all,
>
> I would have similar arguments for keeping term and disambiguation
> separate, although they are related. There are several use cases out
> there in the wild that need this kind of separation, e.g. terminology
> based workflows in a particular supply chain vs. data stream analyses
> which prepare the data for further treatment, such as a machine
> translation application (vocabulary support and training/tuning life
> cycles).
>
> One other topic is the discussion of the ISOCat elements, which to some
> extent would force applications to adopt an NLP standard that might
> not be appropriate for a given application scenario, e.g. those that
> do not use NLP technologies at all. Therefore, I would also recommend
> that we do not talk about bringing ITS closer to NLP, because ITS
> should remain open and deployable for different language processing
> strategies.
>
> Nevertheless, thanks a lot for raising these concerns.
>
> All the best
> -- Jörg
>
> On Jan 11, 2013, at 12:22 (CET), Dr. David Filip wrote:
>> Dear Christian, thanks for this insightful comment.
>> I agree that the disambiguation category is one of the most important
>> additions that can expand the usage of the standard and make it more
>> useful across technologies and industries.
>>
>> The group has discussed this, and it is clear that disambiguation and
>> term are somewhat related categories. We have, however, not considered
>> deprecating the ITS 1.0 term, at least not explicitly.
>>
>> I believe that this follows from the chartered principles of the group
>> [paraphrasing]:
>> 1) Do not break 1.0.
>> 2) Keep the 1.0 principle of independent categories that can also be
>> independently implemented.
>>
>> I believe that your proposal to fuse term and disambiguation is
>> in line with 2), in the sense of making two seemingly interdependent
>> categories into one fully self-contained and independent category,
>> but it would violate 1).
>>
>> But even if we did not care about 1), I believe that the relationship
>> between term and disambiguation is a reasonably loose one, i.e. not a
>> hard formal interdependency that would warrant or even mandate
>> normative handling. It can and should therefore be handled in
>> non-normative material such as a best practice document, while we
>> keep both categories, because they have discernible use cases and
>> can still be implemented independently.
>>
>> A)
>> A user who uses both a terminology management system and a text
>> analytics system for disambiguation can reasonably combine them, and
>> that combination can be driven by organization-specific process
>> considerations. They can, for instance, harvest spans annotated for
>> disambiguation as term candidates for their terminology database, and
>> these can be encoded as terms the next time if, e.g., a terminologist
>> approves them as terms.
>>
>> B)
>> People using only text analytics input do not need to care about term.
>>
>> C)
>> People using terminology management as the only source do not need to
>> bother with the complexities of the disambiguation category.
>>
>> To summarize:
>> While many ITS categories, most prominently term and disambiguation,
>> are informally semantically related, it seems important to keep a
>> reasonable and manageable granularity of independently
>> implementable categories.
>>
>> I hope this helps explain the group's motivation for keeping
>> the categories apart.
>> Please let me know.
>> Rgds
>> dF
>>
>> Dr. David Filip
>> =======================
>> LRC | CNGL | LT-Web | CSIS
>> University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> mailto: david.filip@ul.ie
>>
>>
>> On Thu, Jan 10, 2013 at 9:14 AM, Lieske, Christian
>> <christian.lieske@sap.com> wrote:
>>
>> Hi,
>>
>> Please find below comments/observations/questions/ideas concerning
>> the ITS 2.0 Working Draft dated December 6, 2012
>> (http://www.w3.org/TR/2012/WD-its20-20121206/). Please feel free to
>> contact me for clarification if anything is unclear.
>>
>> To me, the section on the “disambiguation” data category is one of
>> the most important in the draft. From my point of view, ITS 2.0
>> moves ITS 1.0 closer to Natural Language Processing (NLP), and
>> “disambiguation” is related to NLP in various ways. Thus, making
>> “disambiguation” powerful and easy to use (e.g. via a clear
>> distinction from other data categories, as well as conceptualizations
>> and wording that are not only known within linguistics) seems
>> important to me.
>>
>> While looking at “disambiguation” from this angle, I started to
>> wonder whether it could benefit from additions/modifications.
>> I apologize in advance if replying to this comment requires
>> summarizing discussions that presumably have already taken place.
>>
>> Here are my observations/questions/ideas:
>>
>> a. I sense that ITS users will have difficulty deciding when
>> to use “term” and when to use “disambiguation” (the note in the
>> Working Draft indicates this).
>>
>> b. Annotation of known terms, generation of so-called “term
>> candidates”, (named) entity recognition, and other automation can be
>> subsumed under the heading “(automated) text analysis”.
>>
>> I am thus wondering whether the following would be worth
>> considering:
>>
>> 1. Enhance the current “disambiguation” so that it also covers the
>> current “term”.
>> 2. Deprecate “term”.
>> 3. Revise some of the terminology used in the spec (e.g.
>> “disambiguation”, “disambigGranularity”).
>>
>> An example of a revised “disambiguation” (with “term” deprecated),
>> partially inspired by ISOCat (see http://www.isocat.org/), is
>> the following:
>>
>> Data category name: (automated) text analysis annotation (atan/tan).
>> Using “text analysis annotation” would have the advantage that even
>> manual work (e.g. “promoting a term candidate to a term”) could be
>> covered.
>>
>> Data category “qualifier” (currently “disambigGranularity”):
>> atan-type or tan-type
>>
>> Values for “qualifier”: lexical, term, termCandidate,
>> ontological-class, ontological-entity; possibly even URIs such as
>> http://www.isocat.org/datcat/DC-2275. This would allow rather
>> fine-grained and, under certain provisions, standard-conformant
>> annotation (ISO 12620; see http://www.ttt.org/clsframe/datcats.html).
>>
>> Example:
>>
>> <span its-tan-confidence="0.7"
>>       its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
>>       its-tan-ident-ref="http://dbpedia.org/resource/Dublin"
>>       its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>
>>
>> Cheers,
>> Christian
Received on Friday, 11 January 2013 17:37:21 UTC