
RE: Issue-67 - Term + Disambiguation

From: Mārcis Pinnis <marcis.pinnis@Tilde.lv>
Date: Wed, 30 Jan 2013 22:32:25 +0200
To: Felix Sasaki <fsasaki@w3.org>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <AC6FD4BB9BB02540AC7322091A6C3B5472B0F01366@postal.Tilde.lv>
Hi Felix, all,

I think that by Issue-67 (and Issue-69 in the minutes) you mean Issue-68, right?!
There is some numbering confusion right now.

Issue-67 says: „ISSUE-67: Change definition of regular expression for allowed characters”
Issue-68 says: „ISSUE-68: Disambiguation (and term)”
Issue-69 says: „ISSUE-69: recursive nesting of external rules”

Best regards,
Mārcis ;o)

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, January 30, 2013 9:31 PM
To: public-multilingualweb-lt@w3.org
Subject: Re: Issue-67 - Term + Disambiguation

Hi Yves, Tadej, all,

On 30.01.13 19:56, Tadej Stajner wrote:
Hi, Yves,
No, this doesn't mean that we're only supporting named entities from now on. We still allow people to annotate with whatever type of text analysis they choose to do (lexical, ontology, etc.), but we don't care about the level of the analysis. They're now all various values of its:tanIdent*. In a sense, we aren't reducing functionality; we're "out-of-scoping" the feature of knowing the disambiguation level.

For example, before:
That was a
<span its:disambigGranularity="lexicalConcept" its:disambigSource="Wordnet3.1" its:disambigIdent="good%3:00:01::">good</span>
<span its:disambigGranularity="ontologyConcept" its:disambigIdentRef="http://sw.opencyc.org/2012/05/10/concept/en/Game">game</span> against
<span its:disambigGranularity="entity" its:disambigIdentRef="http://dbpedia.org/resource/Real_Madrid">Madrid</span>!

is now proposed to be:
That was a
<span its:tanSource="Wordnet3.1" its:tanIdent="good%3:00:01::">good</span>
<span its:tanIdentRef="http://sw.opencyc.org/2012/05/10/concept/en/Game">game</span> against
<span its:tanIdentRef="http://dbpedia.org/resource/Real_Madrid">Madrid</span>!
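To make the before/after mapping concrete, here is a minimal Python sketch (not part of any agreed tool) that rewrites the old attributes into the proposed ones: it drops disambigGranularity and renames the remaining disambig* attributes to their tan* counterparts. It assumes the markup is parsed as XML with the ITS namespace (http://www.w3.org/2005/11/its) declared, and the RENAMES/DROPPED tables and the migrate() helper are hypothetical, covering only the attributes that appear in the example above.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

def q(local: str) -> str:
    """Qualified attribute name in ElementTree's '{namespace}local' form."""
    return "{%s}%s" % (ITS_NS, local)

# Hypothetical mapping, limited to the attributes used in the example:
# granularity is dropped, the other disambig* attributes become tan*.
RENAMES = {
    "disambigSource": "tanSource",
    "disambigIdent": "tanIdent",
    "disambigIdentRef": "tanIdentRef",
}
DROPPED = {"disambigGranularity"}

def migrate(root: ET.Element) -> ET.Element:
    """Rewrite old disambig* ITS attributes in place to the proposed tan* names."""
    for el in root.iter():
        for old in DROPPED:
            el.attrib.pop(q(old), None)  # remove the level, if present
        for old, new in RENAMES.items():
            if q(old) in el.attrib:
                el.set(q(new), el.attrib.pop(q(old)))  # keep the value, change the name
    return root

sample = """<p xmlns:its="http://www.w3.org/2005/11/its">That was a
<span its:disambigGranularity="lexicalConcept" its:disambigSource="Wordnet3.1" its:disambigIdent="good%3:00:01::">good</span>
<span its:disambigGranularity="ontologyConcept" its:disambigIdentRef="http://sw.opencyc.org/2012/05/10/concept/en/Game">game</span> against
<span its:disambigGranularity="entity" its:disambigIdentRef="http://dbpedia.org/resource/Real_Madrid">Madrid</span>!</p>"""

root = migrate(ET.fromstring(sample))
for span in root.iter("span"):
    # Show each annotated word with its (local-name) ITS attributes.
    print(span.text, {k.split("}")[1]: v for k, v in span.attrib.items()})
```

Because the values are carried over unchanged and only the level attribute is discarded, the sketch illustrates why the change is a rename plus a removal rather than a change in what the identifiers point to.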

I'm not entirely sure if this is 'major' or not.
-- Tadej

On 1/30/2013 7:16 PM, Yves Savourel wrote:

Hi all,

- we moved issue-69 disambiguation vs. term forward.

My understanding from the conclusion on the call was:

* people would agree with dropping "granularity" or "qualifier" from the data category
* people would agree with re-naming attributes and the data category: to use "tan" instead of "disambig", e.g. "tan-ident-ref" instead of "disambig-ident-ref". E.g. instead of

I wasn't sure I understood correctly during the call and was waiting to see the summary.

So we would go back to the simple 'named entity' requirement we had originally?

Completely dropping lexical and ontology concepts?

I'm curious to see how we'll sell that as a non-substantive change: we're removing features. (I'm not against it, just pointing that out.)

We are removing an attribute and renaming others. Sure, this is more of a borderline case than the others we have (e.g. the regex change). But it seems that so far we don't have implementations "doing" anything with the attribute. That was basically the issue with the levels: nobody had a consumption scenario for them. I saw that Yves created a representation of the disambiguation output in XLIFF, but my guess is that dropping the level and renaming the attributes wouldn't change anything with respect to further consumption, no?

With that argument, I think we can say that the removal does not need another Last Call draft. But let's see what others think.

* Steps needed anyway for resolving issue-67 are: re-writing the now "tan" section (previously "disambig"), and potentially rewriting / merging "Terminology". Opinions on these topics or volunteers, please step up.

It seems the direction we are taking is to reduce the types of data the 'disambig/tan' data category can annotate to one. Merging Terminology would be equivalent to going back to having different types of data annotated by the same data category.

With regard to the types, see Tadej's explanation: we don't merge types, we drop them, since it is hard to foresee interop with them (and nobody consumed them anyway).

I am keen to see whether people still want to merge terminology. My guess is that with dropping the levels and renaming to "tan", we might be done. But let's see.

Then how do we justify dropping lexical and ontology concepts (especially since there was no comment requesting to drop them)?

In the long threads on issue-67, several people brought up the topic of dropping, and Christian, as the originator of issue-67, sees the dropping proposal as a step forward for resolving the issue. So I think we can argue that this goes in the right direction.

I hope these explanations help.


Received on Wednesday, 30 January 2013 20:32:56 UTC
