RE: [ACTION-80] consider consolidation of mtDisambiguationData, namedEntity, terminology and textAnalyticsAnnotation

Hi Tadej,

Fine with me. And as a subclass of Terminology.conceptMention, named
entities may also have 'labels' besides the 'type'.
Is that right?
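
To make the question concrete, here is a rough sketch of the hierarchy
as I read it from the proposal quoted below. The Python class and field
names, the example.org URIs and the sample values are illustrative
assumptions only, not part of any requirement text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConceptMention:
    """Abstract umbrella: a text fragment that mentions a concept."""
    fragment: str
    concept: str  # concept reference, e.g. a URI


@dataclass
class Disambiguation(ConceptMention):
    """ConceptMention plus a set of labels in alternative languages."""
    labels: Dict[str, str] = field(default_factory=dict)  # language -> label


@dataclass
class NamedEntity(Disambiguation):
    """Disambiguation plus the added 'type'."""
    entity_type: Optional[str] = None  # URI describing the entity type


@dataclass
class Term(Disambiguation):
    """Disambiguation plus the added 'terminology lexicon'."""
    lexicon: Optional[str] = None  # pointer to the lexicon defining the term


# Hypothetical example values, for illustration only.
entity = NamedEntity(
    fragment="Dublin",
    concept="http://example.org/concept/Dublin",
    labels={"en": "Dublin", "ga": "Baile Átha Cliath"},
    entity_type="http://example.org/type/City",
)
```

Read this way, a named entity inherits 'labels' from disambiguation and
only adds 'type' on top.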

Best, 
Thomas

-----Original Message-----
From: Tadej Stajner [mailto:tadej.stajner@ijs.si] 
Sent: Thursday, 10 May 2012 15:47
To: Thomas Ruedesheim
Cc: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-80] consider consolidation of mtDisambiguationData,
namedEntity, terminology and textAnalyticsAnnotation

Hi, Thomas,
It's hard to promise a strict closed set for this use case, since
describing concepts that are mentioned in text is as open domain as it
gets. What we can reasonably require is the following:

- the concept should be dereferenceable so that additional information
about the concept is available, either via a URI or via an XPath
expression (or via an XPath expression to the URI). Here, we can at
least have some idea of what is well-formed.
- in the case of terms, the users should point to the terminology
lexicon that defines the list of terms. Here, we can actually validate
the values.
- in the case of named entities, there may be only one type.
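
As a minimal sketch of what such checks could look like (the function
names and messages are made up for illustration, the lexicon is assumed
to be available as a plain set of terms, and the XPath case is left
out):

```python
from urllib.parse import urlparse


def is_absolute_uri(ref):
    """Well-formedness only: has a scheme and a host. No network access."""
    parts = urlparse(ref)
    return bool(parts.scheme and parts.netloc)


def check_annotation(concept, term=None, lexicon=None, entity_types=()):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not is_absolute_uri(concept):
        problems.append("concept is not an absolute URI")
    # Terms can be validated against the lexicon that defines them.
    if term is not None and (lexicon is None or term not in lexicon):
        problems.append("term not found in lexicon")
    # Named entities may carry only one type.
    if len(entity_types) > 1:
        problems.append("more than one entity type")
    return problems
```

This only checks that the concept reference looks like a URI; whether
it actually dereferences is a separate question.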

-- Tadej

On 5/10/2012 3:37 PM, Thomas Ruedesheim wrote:
> Hi Tadej,
>
> I would generally agree to your points. Which range of values would 
> you suggest for the 'concept' property? From the perspective of an MT 
> tool provider, a closed set would be preferred.
>
> Thomas
>
> -----Original Message-----
> From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
> Sent: Thursday, 10 May 2012 14:07
> To: Thomas Ruedesheim
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: [ACTION-80] consider consolidation of 
> mtDisambiguationData, namedEntity, terminology and 
> textAnalyticsAnnotation
>
> Hi,
>
> I didn't mention some details about textAnalysisAnnotation that became
> clearer at the last call (the results of which are not reflected yet 
> in the Requirements page): although one could interpret it as a 
> superclass (which I did as well until then), the other part of the 
> interpretation is to express *how* individual annotations were
> generated, having:
>
> - tool that was used for annotation (tool name, URI)
> - confidence in the tool output (0.0 - 1.0)
>
> The reason for separating this out is that people might as well 
> manually annotate entities or terms in their content, in which case 
> "textAnalyticsAnnotation" makes no sense, since it doesn't involve any 
> text analytics tools. This makes 'textAnalyticsAnnotation' ambiguous,
> so I suggest some changes that would avoid using that expression.
>
> Following this logic, we are left with the 'tool' and 'confidence'
> properties. Looking at the requirements, we already have 'author' 
> under the Provenance section and 'mtConfidence' under Translation. 
> Could we expand the scope of 'author' to allow annotating individual 
> fragments and generalize 'mtConfidence' into 'confidence' that would 
> be applicable to any automatic annotation?
>
> What I propose is:
>
> - Provenance.author extended to represent automatic annotators, 
> allowed to annotate fragments (if it doesn't already);
> - Translation.mtConfidence generalized to 'confidence' so it can also 
> cover the auto annotation case;
> - Terminology.conceptMention introduced as an abstract class that is 
> the umbrella term (equivalent to what used to be 
> textAnalysisAnnotation, but without the connotation that it was 
> automatically generated);
> - Terminology.mtDisambiguation generalized to 
> Terminology.disambiguation, being a subclass of conceptMention, 
> additionally having a set of 'labels' in alternative languages. It 
> would be used to disambiguate arbitrary fragments of text, like 
> specific phrases, individual words, etc.;
> - Terminology.namedEntity becomes a subclass of disambiguation, with 
> the added 'type';
> - Terminology.term becomes a subclass of disambiguation, with the 
> added 'terminology lexicon'.
>
> The remaining open question is how the 'semantic selector' property 
> differs from the 'concept reference'. Does it need to be its own 
> property, or is it fine if we just allow the 'concept' property to 
> accept various formats of selectors, not just URIs?
>
> -- Tadej
>
> On 5/10/2012 1:38 PM, Thomas Ruedesheim wrote:
>> Hi Tadej, hi all,
>>
>> You are apparently right, these data categories are strongly 
>> interrelated. In our opinion, 'textAnalysisAnnotation' is the 
>> umbrella for the remaining categories in the Terminology section. We 
>> would suggest dropping it in favour of the others.
>>
>> I would rename 'mtDisambiguation' as 'disambiguation', because its 
>> usage might not be MT specific. As Pedro already said, this tag may 
>> add some info to the more general 'domain' category without proposing
>> concrete target terms. Its only attribute could be:
>>     'semantic selector': a URI pointing into a common ontology.
>>
>> Both 'namedEntity' and 'terminology' categories seem to be clear (see
>> below).
>>
>> Best,
>> Thomas
>>
>> -----Original Message-----
>> From: Tadej Stajner [mailto:tadej.stajner@ijs.si]
>> Sent: Wednesday, 9 May 2012 19:50
>> To: public-multilingualweb-lt@w3.org
>> Subject: [ACTION-80] consider consolidation of mtDisambiguationData, 
>> namedEntity, terminology and textAnalyticsAnnotation
>>
>> Hi, all,
>>
>> this question is mostly directed to people working in MT with regard 
>> to disambiguation.
>>
>> Since we came to a conclusion that there is strong overlap between 
>> the following data categories, we're consolidating them:
>> mtDisambiguationData
>> namedEntity
>> terminology
>> textAnalyticsAnnotation
>>
>> First of all, there is an obvious common part to the first three.
>> Let's call it the 'concept mention' recipe. It's meant to represent 
>> that some fragment of text is lexicalizing (mentioning) some concept
>> with a URI.
>> namedEntity has the following specifics:
>> - type of entity (pointing to a URI, describing that type)
>> - alternative labels (names in different languages)
>>
>> terminology has the following specifics:
>> - terminology lexicon
>> - alternative labels
>>
>> mtDisambiguation also has the concept URI, but additionally defines
>> - 'disambiguation data'
>> - 'semantic selector'
>>
>> The open question is: do these two additional attributes bring 
>> any additional information if we already have the fragment 
>> disambiguated with the URI?
>>
>>     If not, is there anything else in mtDisambiguation that could not
>> be covered by the namedEntity and terminology categories?
>>
>> thanks for the input,
>> -- Tadej
>>

Received on Thursday, 10 May 2012 17:40:31 UTC