- From: Thomas Ruedesheim <thomas.ruedesheim@lucysoftware.com>
- Date: Thu, 10 May 2012 16:21:35 +0200
- To: "Tadej Stajner" <tadej.stajner@ijs.si>
- Cc: <public-multilingualweb-lt@w3.org>
Hi Tadej,

Fine with me. And as a subclass of Terminology.conceptMention, named
entities may also have 'labels' besides the 'type'. Is that right?

Best,
Thomas

-----Original Message-----
From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
Sent: Thursday, 10 May 2012 15:47
To: Thomas Ruedesheim
Cc: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-80] consider consolidation of mtDisambiguationData, namedEntity, terminology and textAnalyticsAnnotation

Hi, Thomas,

It's hard to promise a strict closed set for this use case, since
describing concepts that are mentioned in text is as open domain as it
gets. What we can reasonably require is the following:

- the concept should be dereferenceable so that additional information
  about the concept is available, either via a URI or via an XPath
  expression (or via an XPath expression to the URI); here, we can at
  least have some idea of what is well-formed.
- in the case of terms, the users should point to the terminology
  lexicon that defines the list of terms; here, we can actually
  validate the values.
- in the case of named entities, there may be only one type;

-- Tadej

On 5/10/2012 3:37 PM, Thomas Ruedesheim wrote:
> Hi Tadej,
>
> I would generally agree to your points. Which range of values would
> you suggest for the 'concept' property? From the perspective of an MT
> tool provider, a closed set would be preferred.
>
> Thomas
>
> -----Original Message-----
> From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
> Sent: Thursday, 10 May 2012 14:07
> To: Thomas Ruedesheim
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: [ACTION-80] consider consolidation of
> mtDisambiguationData, namedEntity, terminology and
> textAnalyticsAnnotation
>
> Hi,
>
> I didn't mention some details about textAnalysisAnnotation that became
> clearer at the last call (the results of which are not reflected yet
> in the Requirements page): although one could interpret it as a
> superclass (which I did as well until then), the other part of the
> interpretation is to express *how* individual annotations were
> generated, having:
>
> - tool that was used for annotation (tool name, URI)
> - confidence in the tool output (0.0 - 1.0)
>
> The reason for separating this out is that people might as well
> manually annotate entities or terms in their content, in which case
> "textAnalyticsAnnotation" makes no sense, since it doesn't involve any
> text analytics tools. This makes 'textAnalyticsAnnotation' ambiguous,
> so I suggest some changes that would avoid using that expression.
>
> Following this logic, we are left with the 'tool' and 'confidence'
> properties. Looking at the requirements, we already have 'author'
> under the Provenance section and 'mtConfidence' under Translation.
> Could we expand the scope of author to allow annotating individual
> fragments and generalize 'mtConfidence' into 'confidence' that would
> be applicable to any auto annotation?
>
> What I propose is:
>
> - Provenance.author extended to represent automatic annotators,
> allowed to annotate fragments (if it doesn't already);
> - Translation.mtConfidence generalized to 'confidence' so it can also
> cover the auto annotation case;
> - Terminology.conceptMention introduced as an abstract class that is
> the umbrella term (equivalent to what used to be textAnalysisAnnotation,
> but without the connotation that it was automatically generated);
> - Terminology.mtDisambiguation generalized to
> Terminology.disambiguation, being a subclass of conceptMention,
> additionally having a set of 'labels' in alternative languages; it
> would be used to disambiguate arbitrary fragments of text, like
> specific phrases, individual words, etc.
> - Terminology.namedEntity becomes a subclass of disambiguation, with
> the added 'type';
> - Terminology.term becomes a subclass of disambiguation, with the
> added 'terminology lexicon'
>
> The open thing remaining is how the 'semantic selector' property
> differs from the 'concept reference'. Does it need to be its own
> property, or is it fine if we just allow the 'concept' property to
> accept various formats of selectors, not just URIs?
>
> -- Tadej
>
> On 5/10/2012 1:38 PM, Thomas Ruedesheim wrote:
>> Hi Tadej, hi all,
>>
>> You are apparently right, these data categories are strongly
>> interrelated. In our opinion, 'textAnalysisAnnotation' is the
>> umbrella for the remaining categories in the Terminology section. We
>> would suggest dropping it in favour of the others.
>>
>> I would rename 'mtDisambiguation' as 'disambiguation', because its
>> usage might not be MT specific. As Pedro already said, this tag may
>> add some info to the more general 'domain' category without proposing
>> concrete target terms. Its only attribute could be:
>> 'semantic selector': a URI pointing into a common ontology.
>>
>> Both 'namedEntity' and 'terminology' categories seem to be clear (see
>> below).
>>
>> Best,
>> Thomas
>>
>> -----Original Message-----
>> From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
>> Sent: Wednesday, 9 May 2012 19:50
>> To: public-multilingualweb-lt@w3.org
>> Subject: [ACTION-80] consider consolidation of mtDisambiguationData,
>> namedEntity, terminology and textAnalyticsAnnotation
>>
>> Hi, all,
>>
>> this question is mostly directed to people working in MT with regard
>> to disambiguation.
>>
>> Since we came to a conclusion that there is strong overlap between
>> the following data categories, we're consolidating them:
>> mtDisambiguationData
>> namedEntity
>> terminology
>> textAnalyticsAnnotation
>>
>> First of all, there is an obvious common part to the first three.
>> Let's call it the 'concept mention' recipe. It's meant to represent
>> that some fragment of text is lexicalizing (mentioning) some concept
>> with a URI.
>>
>> namedEntity has the following specifics:
>> - type of entity (pointing to a URI, describing that type)
>> - alternative labels (names in different languages)
>>
>> terminology has the following specifics:
>> - terminology lexicon
>> - alternative labels
>>
>> mtDisambiguation also has the concept URI, but additionally defines
>> - 'disambiguation data'
>> - 'semantic selector'
>>
>> The open question is: do these two additional attributes bring
>> any additional information if we already have the fragment
>> disambiguated with the URI?
>>
>> If not, is there anything else in mtDisambiguation that could not
>> be covered by the namedEntity and terminology categories?
>>
>> thanks for the input,
>> -- Tadej
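
For readers who prefer code, the class hierarchy proposed in the thread above could be sketched roughly as follows. This is only an illustration, not part of any agreed specification: the Python class and field names, the placement of 'author' and 'confidence' on the base class, and the example.org URIs are all assumptions made for the sketch. Only the subclass relationships, the 'labels', 'type' and 'terminology lexicon' properties, and the 0.0 - 1.0 confidence range come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConceptMention:
    """Umbrella class: a text fragment that mentions some concept."""
    fragment: str
    concept: str                        # a URI (or an XPath expression to one)
    author: Optional[str] = None        # assumed: person or tool that annotated
    confidence: Optional[float] = None  # assumed: set only for auto annotation

    def __post_init__(self):
        # The proposal calls conceptMention abstract, so block direct use.
        if type(self) is ConceptMention:
            raise TypeError("ConceptMention is abstract; use a subclass")
        # The thread gives the confidence range as 0.0 - 1.0.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0.0 and 1.0")


@dataclass
class Disambiguation(ConceptMention):
    """Adds 'labels' in alternative languages (language code -> label)."""
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class NamedEntity(Disambiguation):
    """Adds the single entity 'type' (a URI describing that type)."""
    entity_type: Optional[str] = None


@dataclass
class Term(Disambiguation):
    """Adds a pointer to the 'terminology lexicon' defining the term."""
    lexicon: Optional[str] = None


# Hypothetical annotation; the tool name and URIs are placeholders.
berlin = NamedEntity(
    fragment="Berlin",
    concept="http://example.org/concept/Berlin",
    author="example-ner-tool",
    confidence=0.9,
    labels={"en": "Berlin", "sl": "Berlin"},
    entity_type="http://example.org/type/City",
)
```

In this sketch a named entity inherits 'labels' through disambiguation, which matches the reading in the top reply that named entities may carry 'labels' in addition to the 'type'.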
Received on Thursday, 10 May 2012 17:40:31 UTC