RE: Issue-67 - Term + Disambiguation

From: Yves Savourel <ysavourel@enlaso.com>
Date: Wed, 30 Jan 2013 13:12:35 -0700
To: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.07427eac7c.assp.0742103643.002601cdff26$24a537a0$6defa6e0$@com>
Hi Tadej, Felix, all,

 

Oh, I see. Thanks for clearing that up for me: I hadn’t understood that.

Then, yes, I suppose it is a “less major” change.

 

The implementation modifications should be relatively easy too.

 

I think things are clear (and ok) for me now, as far as changing disambiguation.

 

Thanks,

-yves

 

 

From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Wednesday, January 30, 2013 12:31 PM
To: public-multilingualweb-lt@w3.org
Subject: Re: Issue-67 - Term + Disambiguation

 

Hi Yves, Tadej, all,


On 30.01.13 19:56, Tadej Stajner wrote:

Hi, Yves,
no, this doesn't mean that we're only supporting named entities from now on. We still allow people to annotate with whatever type of text analysis they choose to do (lexical, ontology, etc.), but we don't care about the level of the analysis. They're now all various values of its:tanIdent*. In a sense, we aren't reducing functionality, but we're "out-of-scoping" the feature of knowing the disambiguation level.

For example, before:
That was a 
<span its:disambigGranularity="lexicalConcept" its:disambigSource="Wordnet3.1" its:disambigIdent="good%3:00:01::">good</span> 
<span its:disambigGranularity="ontologyConcept" its:disambigIdentRef="http://sw.opencyc.org/2012/05/10/concept/en/Game">game</span> against 
<span its:disambigGranularity="entity" its:disambigIdentRef="http://dbpedia.org/resource/Real_Madrid">Madrid</span>!

is now proposed to be:
That was a 
<span its:tanSource="Wordnet3.1" its:tanIdent="good%3:00:01::">good</span>
<span its:tanIdentRef="http://sw.opencyc.org/2012/05/10/concept/en/Game">game</span> 
against
<span its:tanIdentRef="http://dbpedia.org/resource/Real_Madrid">Madrid</span>!
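
The before/after markup above amounts to dropping one attribute and renaming three. A minimal sketch of that mapping follows, assuming the attributes of each span are already available as a plain dictionary. The names RENAMES, DROPPED, and migrate_attributes are hypothetical and chosen only for illustration; the attribute names are the ones proposed in this thread, not a finalized specification.

```python
# Hypothetical mapping from the old "disambig" attribute names to the
# "tan" names proposed in this thread (an assumption, not a final spec).
RENAMES = {
    "its:disambigSource": "its:tanSource",
    "its:disambigIdent": "its:tanIdent",
    "its:disambigIdentRef": "its:tanIdentRef",
}

# The granularity (level) attribute is dropped, not renamed.
DROPPED = {"its:disambigGranularity"}


def migrate_attributes(attrs):
    """Return a new dict with the level dropped and the old names renamed."""
    migrated = {}
    for name, value in attrs.items():
        if name in DROPPED:
            continue  # the disambiguation level is out of scope
        # Attributes outside the mapping pass through unchanged.
        migrated[RENAMES.get(name, name)] = value
    return migrated


old = {
    "its:disambigGranularity": "lexicalConcept",
    "its:disambigSource": "Wordnet3.1",
    "its:disambigIdent": "good%3:00:01::",
}
print(migrate_attributes(old))
```

The identifier values themselves are untouched, which is why a consumer that only reads the source and the identifier should be unaffected by the change.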

I'm not entirely sure if this is 'major' or not. 
-- Tadej

On 1/30/2013 7:16 PM, Yves Savourel wrote:

Hi all,
 

- we moved issue-69 disambiguation vs. term forward. 
My understanding from the conclusion on the call was:
* people would agree with dropping "granularity" or "qualifier"
from the data category
* people would agree with re-naming attributes and the data 
category: to use "tan" instead of "disambig", e.g. 
"tan-ident-ref" instead of "disambig-ident-ref". E.g. instead of

I wasn't sure I understood correctly during the call and was waiting to see the summary.
 
So we would go back to the simple 'named entity' requirement we had originally?
Dropping completely lexical and ontology concepts.
 
I'm curious to see how we'll sell that as a non-substantive change: we're removing features. (I'm not against, just pointing that out).


We are removing an attribute and renaming others. Sure, this is more of a borderline case than the others we have (e.g. regex change). But it seems so far we don't have implementations "doing" anything with the attribute. That was basically the issue with the levels: nobody had a consumption scenario for it. I saw that Yves created a representation of the disambiguation output in XLIFF - but my guess is that dropping the level and renaming the attributes wouldn't change anything with respect to further consumption - no?

With that argumentation I think the removal can be argued as not needing another last call draft. But let's see what others think.




 
 
 

* Steps needed anyway for resolving issue-67 are: re-writing 
the now "tan" section (previously "disambig"), and potentially 
rewriting / merging "Terminology". Opinions on these topics or 
volunteers, please step up.

It seems the direction we are taking is to reduce to one the types of data the 'disambig/tan' data category can annotate. Merging Terminology would be equivalent to going back to having different types of data annotated by the same data category.


With regard to the types, see Tadej's explanation - we don't merge types, we drop them, since it is hard to foresee interop with them (and nobody consumed them anyway).

I am keen to see whether people still want to merge terminology then - my guess is that with dropping the levels and renaming to "tan", we might be done. But let's see.




Then how do we justify dropping lexical and ontology concepts? (especially since there was no comment requesting to drop them).


In the long threads on issue-67, several people brought up the topic of dropping - and Christian as the originator of issue-67 sees the dropping proposal as a step forward for resolving the issue. So I think we can argue that this goes in the right direction.

Hope that these explanations helped?

Best,

Felix
Received on Wednesday, 30 January 2013 20:13:06 UTC
