- From: Tadej Stajner <tadej.stajner@ijs.si>
- Date: Thu, 10 May 2012 14:06:59 +0200
- To: Thomas Ruedesheim <thomas.ruedesheim@lucysoftware.com>
- CC: public-multilingualweb-lt@w3.org
Hi,

I didn't mention some details about textAnalysisAnnotation that became clearer at the last call (the results of which are not yet reflected in the Requirements page): although one could interpret it as a superclass (as I did until then), the other part of the interpretation is that it expresses *how* individual annotations were generated, having:
- the tool that was used for annotation (tool name, URI)
- confidence in the tool output (0.0 - 1.0)

The reason for separating this out is that people might just as well annotate entities or terms in their content manually, in which case "textAnalyticsAnnotation" makes no sense, since no text analytics tools are involved. This makes 'textAnalyticsAnnotation' ambiguous, so I suggest some changes that would avoid using that expression.

Following this logic, we are left with the 'tool' and 'confidence' properties. Looking at the requirements, we already have 'author' under the Provenance section and 'mtConfidence' under Translation. Could we expand the scope of 'author' to allow annotating individual fragments, and generalize 'mtConfidence' into a 'confidence' that would be applicable to any automatic annotation?

What I propose is:
- Provenance.author extended to represent automatic annotators, allowed to annotate fragments (if it doesn't already);
- Translation.mtConfidence generalized to 'confidence' so it can also cover the automatic annotation case;
- Terminology.conceptMention introduced as an abstract class that is the umbrella term (equivalent to what used to be textAnalysisAnnotation, but without the connotation that it was automatically generated);
- Terminology.mtDisambiguation generalized to Terminology.disambiguation, being a subclass of conceptMention and additionally having a set of 'labels' in alternative languages. It would be used to disambiguate arbitrary fragments of text, such as specific phrases, individual words, etc.;
- Terminology.namedEntity becomes a subclass of disambiguation, with the added 'type';
- Terminology.term becomes a subclass of disambiguation, with the added 'terminology lexicon'.

The open issue that remains is how the 'semantic selector' property differs from the 'concept reference'. Does it need to be its own property, or is it fine if we just allow the 'concept' property to accept various formats of selectors, not just URIs?

-- Tadej

On 5/10/2012 1:38 PM, Thomas Ruedesheim wrote:
> Hi Tadej, hi all,
>
> You are apparently right, these data categories are strongly
> interrelated. In our opinion, 'textAnalysisAnnotation' is the umbrella
> for the remaining categories in the Terminology section. We would
> suggest to drop it in favour of the others.
>
> I would rename 'mtDisambiguation' as 'disambiguation', because its usage
> might not be MT specific. As Pedro already said, this tag may add some
> info to the more general 'domain' category without proposing concrete
> target terms. Its only attribute could be:
> 'semantic selector': a URI pointing into a common ontology.
>
> Both 'namedEntity' and 'terminology' categories seem to be clear (see
> below).
>
> Best,
> Thomas
>
> -----Original Message-----
> From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
> Sent: Wednesday, 9 May 2012 19:50
> To: public-multilingualweb-lt@w3.org
> Subject: [ACTION-80] consider consolidation of mtDisambiguationData,
> namedEntity, terminology and textAnalyticsAnnotation
>
> Hi, all,
>
> this question is mostly directed to people working in MT with regard to
> disambiguation.
>
> Since we came to a conclusion that there is strong overlap between the
> following data categories, we're consolidating them:
> mtDisambiguationData
> namedEntity
> terminology
> textAnalyticsAnnotation
>
> First of all, there is an obvious common part to the first three. Let's
> call it the 'concept mention' recipe.
> It's meant to represent that some
> fragment of text is lexicalizing (mentioning) some concept with a URI.
>
> namedEntity has the following specifics:
> - type of entity (pointing to a URI describing that type)
> - alternative labels (names in different languages)
>
> terminology has the following specifics:
> - terminology lexicon
> - alternative labels
>
> mtDisambiguation also has the concept URI, but additionally defines
> - 'disambiguation data'
> - 'semantic selector'
>
> The open question is: do these two additional attributes bring any
> additional information if we already have the fragment disambiguated with
> the URI?
>
> If not, is there anything else in mtDisambiguation that could not be
> covered by the namedEntity and terminology categories?
>
> thanks for the input,
> -- Tadej
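
For illustration only, the class hierarchy proposed at the top of this message could be sketched roughly as below. This is a minimal Python sketch, not an agreed data model: the `Provenance` helper type, the field names (`fragment`, `concept`, `entity_type`, `lexicon`) and all example.org URIs are assumptions made up for the example; only the category names, the subclass relations, and the 0.0 - 1.0 confidence range come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Provenance:
    """Hypothetical helper: who or what produced an annotation."""
    author: str                         # a person, or a tool name/URI for automatic annotators
    confidence: Optional[float] = None  # 0.0 - 1.0 for tool output; None for manual annotation

    def __post_init__(self):
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie within 0.0 - 1.0")


@dataclass
class ConceptMention:
    """Umbrella category: a text fragment mentions some concept."""
    fragment: str                       # the annotated piece of text (assumed field)
    concept: str                        # concept reference, a URI in this sketch
    provenance: Optional[Provenance] = None

    def __post_init__(self):
        # The proposal calls conceptMention abstract, so refuse direct instances.
        if type(self) is ConceptMention:
            raise TypeError("ConceptMention is abstract; use a subclass")


@dataclass
class Disambiguation(ConceptMention):
    """conceptMention plus labels in alternative languages."""
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label


@dataclass
class NamedEntity(Disambiguation):
    """disambiguation plus the type of entity (a URI describing the type)."""
    entity_type: Optional[str] = None


@dataclass
class Term(Disambiguation):
    """disambiguation plus a reference to a terminology lexicon."""
    lexicon: Optional[str] = None


# An automatically generated annotation: the tool is the author, with a confidence.
auto = NamedEntity(
    fragment="Ljubljana",
    concept="http://example.org/concept/Ljubljana",       # hypothetical URI
    provenance=Provenance("http://example.org/tools/ner", confidence=0.9),
    labels={"de": "Laibach"},
    entity_type="http://example.org/types/City",          # hypothetical URI
)

# A manual annotation: a human author and no confidence value.
manual = Term(
    fragment="data category",
    concept="http://example.org/concept/data-category",   # hypothetical URI
    provenance=Provenance("Jane Doe"),
    lexicon="http://example.org/lexicons/its",            # hypothetical URI
)
```

Keeping author and confidence in a separate record is what lets the same annotation classes cover both the manual and the automatic case, which is the point of moving them out of the Terminology categories.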
Received on Thursday, 10 May 2012 12:07:57 UTC