- From: David Lewis <dave.lewis@cs.tcd.ie>
- Date: Sat, 28 Apr 2012 01:04:31 +0100
- To: Tadej Stajner <tadej.stajner@ijs.si>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <4F9B340F.1070700@cs.tcd.ie>
Hi Tadej, guys,

I've moved this branch of the thread under a subject line for a new action to consider this consolidation (which I gave to you, Tadej).

Could I ask the MT guys, Declan, Pedro, Daniel, to give some insight into what is needed for mtDisambiguation? What form of disambiguation information would this point to? Would this have different (lexical) properties from a regular term base, e.g. contextual conditions? I guess this would differ depending on whether this is RBMT or SMT, right? Similarly, is namedEntityRecognition any different from the general sense of terminology?

Tadej, I'd be cautious about considering textAnalyticsAnnotation as the superclass here, since it will still often be the case that the annotation results from human source text review/QA, rather than necessarily from an automated NLP component doing text analytics.

The common characteristic would seem to be the need to associate a term or phrase with some external information for use in later processing, as is broadly supported by the current terminology data category. The definition in http://www.w3.org/TR/2007/REC-its-20070403/#terminology is actually fairly loose in this regard - it doesn't specify that the data category be used for terminology management specifically, despite what the name would indicate.

Could the same approach of simply associating with external information be taken regardless of whether this is a link to a term base (including term translations and definitions), a link to a conceptual node in a semantic web ontology or lexical store, or some special MT disambiguation store? Or does the differing nature of these external resources require a hint in the data category name that it should be accessed in different ways?

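To make the "simply associate with external information" option concrete, here is a rough sketch in Python. It assumes the ITS 1.0 local attributes its:term and its:termInfoRef; the sample markup and the example.org URIs are made up for illustration, and the helper name collect_annotations is hypothetical. The point is only that a consumer can gather (text, reference) pairs in the same way whether the reference leads to a term base or to an ontology node.

```python
import xml.etree.ElementTree as ET

# ITS 1.0 namespace URI.
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document: both URIs are placeholders, one standing in
# for a term base entry and one for an ontology node.
SAMPLE = """<text xmlns:its="http://www.w3.org/2005/11/its">
  <p>The <span its:term="yes"
      its:termInfoRef="http://example.org/termbase#motherboard">motherboard</span>
  was designed in <span its:term="yes"
      its:termInfoRef="http://example.org/ontology/Dublin">Dublin</span>.</p>
</text>"""


def collect_annotations(xml_text):
    """Return (annotated text, external reference) pairs for its:term="yes" elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter():
        if elem.get(f"{{{ITS_NS}}}term") == "yes":
            ref = elem.get(f"{{{ITS_NS}}}termInfoRef")
            # The kind of resource behind the reference is not inspected here.
            pairs.append(("".join(elem.itertext()).strip(), ref))
    return pairs


if __name__ == "__main__":
    for text, ref in collect_annotations(SAMPLE):
        print(text, "->", ref)
```

Whether a consumer would need more than the bare reference to know how to access each kind of resource is exactly the open question above; the sketch does not settle it.
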
Perhaps you guys could list out some of the use cases in a bit more detail. It might then become clearer what the commonalities really are, and we can make a judgement on whether they can really be consolidated, or whether they represent very separate use cases that should be kept separate (which is fine; we should not consolidate for the sake of it, as forcing unrelated functionality together in a data category could harm its uptake by implementers).

Also, consider whether the need for additional attributes requires a separation. For instance, terminology associations arising from NLP may benefit from a confidence score, as we discussed previously. But perhaps that only needs an additional optional attribute to accompany the terminology attribute?

Note, if we are interested in recording properties of the process, e.g. which text analytics engine or terminology expert was involved in entering the attribute into a document, this may be better captured using the provenance data category.

Sorry for the long post, but please try to advance the discussion before the call next Friday.

cheers,
Dave

On 26/04/2012 14:29, Tadej Stajner wrote:
> On 4/26/2012 2:23 PM, David Lewis wrote:
>> Dear all,
>> I have four further suggestions for consolidating requirements that
>> I'd like to discuss briefly on the call with the relevant people:
>>
>> Pedro, Daniel, Declan, Tadej: I think there may be opportunity to
>> consolidate mtDisambiguationData, namedEntity, terminology and
>> textAnalyticsAnnotation. For instance is MT disambiguation really
>> terminology support for MT?
>>
>
> Yes, they all have a lot in common. The way I see it, textAnalytics
> annotation is the common superclass of the other three,
> mtDisambiguation seems to focus on difficult content, namedEntity on
> named entities and term on terms. They all allow referring to an
> ontology URI behind the fragment they are annotating - this property
> is equivalent across all three, but there are specifics in each category.
>
> - what would qualify as difficult content under mtDisambiguation?
> - is there anything MT-specific in mtDisambiguation? Or can we call it
> simply "disambiguation"?
> - mtDisambiguation-domainSelector is very similar in functionality to
> term-terminologyResource, could we consolidate those?
> - namedEntity-type can be seen as a special case of a
> mtDisambiguation-semanticSelector;
>
> My recommendation would be to gather some common properties and pull
> them in the textAnalyticsAnnotation superclass.
>
>> comments welcome,
>> Dave
>>
>>
>
Received on Saturday, 28 April 2012 00:04:58 UTC