Re: [all] Call for consensus on disambiguation - feedback integrated [ACTION-181]

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Mon, 20 Aug 2012 17:01:17 +0200
Message-ID: <5032513D.8090302@informatik.uni-leipzig.de>
To: Tadej Štajner <tadej.stajner@ijs.si>
CC: "Pablo N. Mendes" <pablomendes@gmail.com>, Felix Sasaki <fsasaki@w3.org>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "raphael.troncy@eurecom.fr" <raphael.troncy@eurecom.fr>, "Giuseppe.Rizzo@eurecom.fr" <Giuseppe.Rizzo@eurecom.fr>
Hi all,
digging to the core of the problem:

How many layers of annotation do you need? entity, dictionaryEntry,
lexicalMeaning, pragmaticMeaning, some other layer ... The problem is
that the XML attribute data structure is not appropriate for handling this
kind of information, so we really need to decide how many layers we
need. If you were to leave this open, I would suggest:
its-disambig-type-ref-1, its-entity-type-ident-ref-1,
its-disambig-type-ref-2, its-entity-type-ident-ref-2,
its-disambig-type-ref-3, its-entity-type-ident-ref-3, ...
But that is not XML-like.

So the question is: for how many levels/layers do we require coexistence?
Otherwise its-disambig-type-ref would be sufficient to give the
level/layer (or even more fine-grained information, e.g. an entity of type
place).
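
To illustrate the single-attribute option, here is a rough sketch, assuming a consumer keeps its own lookup table from the class URI found in its-disambig-type-ref to an annotation layer. The table contents, the layer assignments and the function name layer_for are assumptions made up for illustration; nothing in the ITS drafts defines them.

```python
# Hypothetical lookup table: class URI from its-disambig-type-ref -> layer.
# The layer assignments below are illustrative assumptions only.
LAYER_BY_TYPE = {
    "http://nerd.eurecom.fr/ontology#Place": "entity",
    "http://dbpedia.org/ontology/Place": "entity",
    "http://www.monnet-project.eu/lemon#LexicalSense": "lexicalMeaning",
    "http://www.monnet-project.eu/lemon#LexicalEntry": "dictionaryEntry",
    "http://wordnet.princeton.edu/wndatamodel#NounWordSense": "lexicalMeaning",
    "http://wordnet.princeton.edu/wndatamodel#Synset": "lexicalMeaning",
}


def layer_for(type_ref):
    """Return the assumed layer for a type URI, or None if it is unknown."""
    return LAYER_BY_TYPE.get(type_ref)
```

A consumer that does not recognise the URI gets None back, which is exactly the case where an explicit layer attribute would still be needed.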

Regarding isDefinedBy: it is recommended to use it but, of course, you
won't go to prison if you forget it ;) Especially with # OWL classes
(hash URIs), isDefinedBy is not necessary, as the # part is cut away for
any retrieval request anyhow.
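
A minimal sketch of that last point, using Python's standard urllib.parse. The helper name retrieval_url is made up for illustration; it only shows that the fragment is dropped before a document is requested, so a hash URI already leads to its defining ontology.

```python
from urllib.parse import urldefrag


def retrieval_url(class_uri):
    """Return the URL a client would request for a class URI (fragment removed)."""
    url, _fragment = urldefrag(class_uri)
    return url
```

For http://nerd.eurecom.fr/ontology#Place this gives the ontology document itself, while a slash URI such as http://dbpedia.org/ontology/Place comes back unchanged.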

All the best,
Sebastian


On 20.08.2012 12:11, Tadej Štajner wrote:
> Hi, Pablo,
> correct. The feedback I got was that this distinction is very 
> important, but I can't think of an example with the scenario you 
> mention. Perhaps for spans where one is contained within the other, 
> such as assigning a lexical meaning to a word, while the whole phrase 
> is an entity, for example 'agriculture' in 'Ministry of agriculture'.
>
> I think it boils down to this: could this property be reliably 
> inferred from the target itself? For instance, if someone points to 
> http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3 - 
> can we expect that is definitely a case of lexical disambiguation?
>
> -- Tadej
>
>
> On 20. 08. 2012 11:42, Pablo N. Mendes wrote:
>> Hi all,
>>
>>     I would suggest  to merge "its-entity-type-ident-ref" into
>>     "its-disambig-type-ref".
>>
>>
>> If I understand correctly this is the same proposal I made at the call?
>>
>> "<pablomendes> wrt. its:disambigType = (word | entity) can't the 
>> distinction between word and entity be inferred from entityTypeRef? 
>> e.g. wiktionary:doc is a word, dbpedia:Dog is an entity" [1]
>>
>> If so, this is the answer that Tadej gave:
>>
>> "tadej: disambiguation use cases are often used in cases where text 
>> is short and lacks context
>> ... and computational linguistic community draw a clear distinction 
>> between lexical and conceptual meaning" [1]
>>
>> Perhaps one way to test how strong this requirement is would be to 
>> think of use cases where one could assign both lexical and conceptual 
>> meaning to the same span.
>>
>> Cheers,
>> Pablo
>>
>> [1] http://www.w3.org/2012/07/26-mlw-lt-minutes.html
>>
>>
>> On Mon, Aug 20, 2012 at 11:13 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>>
>>     Hi Sebastian,
>>
>>     2012/8/20 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>
>>         Hi Felix,
>>         your proposal is based on the assumption, that more data is
>>         available at these three URLs:
>>
>>         http://nerd.eurecom.fr/ontology#Place
>>         http://dbpedia.org/resource/Dublin
>>         http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3
>>
>>         While this assumption is ok for the Semantic Web, I am not
>>         sure about the ITS world.
>>
>>
>>
>>     You are right that in the "ITS world" one cannot be sure that more
>>     data is available. But I would argue that implementors who process
>>     links also in the ITS world very likely need to know (not
>>     automatically, but as a prerequisite for implementation ) what the
>>     URL is about. So I'd rather encourage implementors towards that
>>     "Semantic Web like" approach than defining so many attributes.
>>
>>     Feedback from the people who want to process "disambiguation"
>>     without Semantic Web processing is of course very important here.
>>
>>
>>         Furthermore, if you are attempting to minimize it, I would
>>         suggest merging
>>         "its-entity-type-ident-ref" into "its-disambig-type-ref". You
>>         wouldn't be limited to entity types and could use any of:
>>
>>
>>
>>     Makes sense to me, thanks for the proposal - let's see what Tadej
>>     and others say.
>>
>>     Best,
>>
>>     Felix
>>
>>
>>         - http://nerd.eurecom.fr/ontology#Place
>>         - http://dbpedia.org/ontology/Place
>>         - http://www.monnet-project.eu/lemon#LexicalSense
>>         - http://www.monnet-project.eu/lemon#LexicalEntry
>>         - http://wordnet.princeton.edu/wndatamodel#NounWordSense
>>         - http://wordnet.princeton.edu/wndatamodel#Synset
>>
>>         All the best,
>>         Sebastian
>>
>>         On 20.08.2012 09:44, Felix Sasaki wrote:
>>
>>             Hi Sebastian, all,
>>
>>             thanks, Sebastian. From what you say in the wiki and in
>>             the previous mail,
>>             I think one could simplify things a lot.
>>
>>             The HTML example from Tadej *could* look like this:
>>
>>             <html lang="en">
>>                 <head>
>>                    <meta charset="utf-8" />
>>                    <title>Entity: Local Test</title>
>>                 </head>
>>                 <body>
>>                     <p><span
>>                         its-entity-type-ident-ref="http://nerd.eurecom.fr/ontology#Place"
>>                         its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
>>                     is the <span
>>                         its-disambig-ident-ref="http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3">capital</span>
>>                     of Ireland.</p>
>>                 </body>
>>             </html>
>>
>>             That is, no explicit "resource" references for entity type and
>>             disambiguation source, and no disambig-type.
>>
>>             Also, I think one could get rid of adding this kind of
>>             information via
>>             global rules - I really don't see a use case for that.
>>
>>             Tadej, others, thoughts? Maybe Yves, as one of the
>>             implementors processing
>>             the output, and others have some thoughts too?
>>
>>             Best,
>>
>>             Felix
>>
>>             2012/8/17 Sebastian Hellmann
>>             <hellmann@informatik.uni-leipzig.de>
>>
>>                 Dear Felix,
>>                 to solve this issue I prepared a page:
>>                 http://wiki.nlp2rdf.org/wiki/DBpedia_Spotlight
>>
>>
>>                 It is a rough draft, so there are many mistakes,
>>                 still. Once it is mature,
>>                 I will send it to the DBpedia Spotlight and Apache
>>                 Stanbol lists to get
>>                 their feedback.
>>                 Note that I don't have a problem with these properties
>>                 as XML attributes, where they can naturally occur only
>>                 once and encoding an implicit dependency (attribute
>>                 referring to another attribute) is unproblematic. They
>>                 are, however, difficult to handle in RDF, even when
>>                 declaring them functional.
>>                 I will report back if there is any news,
>>
>>                 All the best,
>>                 Sebastian
>>
>>
>>
>>
>>                 On 14.08.2012 21:34, Felix Sasaki wrote:
>>
>>                     Hi Sebastian, all,
>>
>>                     August is taking its toll ... I am wondering if
>>                     there are any thoughts on Sebastian's mail below.
>>                     It seems that some of the proposed ITS attributes
>>                     are not needed, but I don't have the competence to
>>                     evaluate this. Thoughts from others? Sebastian,
>>                     could you confirm that the output mentioned in
>>                     this other thread
>>
>>                     http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0168.html
>>
>>                     is correct for NIF? I then would create a test
>>                     case for our test suite, see
>>
>>                     http://lists.w3.org/Archives/Public/public-multilingualweb-lt-tests/2012Aug/0003.html
>>
>>
>>
>>                     Thanks,
>>
>>                     Felix
>>
>>                     On Thursday, 9 August 2012, Sebastian
>>                     Hellmann wrote:
>>
>>                       Hi Felix,
>>
>>                         below is mostly my opinion on this. Nothing
>>                         wrong with including these properties, but
>>                         they might not make sense in RDF. If you think
>>                         that there are people who would really use
>>                         these properties in RDF, then go ahead and
>>                         include them. Personally, *I* wouldn't know
>>                         what *I* could use them for.
>>                         More comments inline.
>>
>>                         On 09.08.2012 15:20, Felix Sasaki wrote:
>>
>>                           its:entityTypeSourceRef
>>
>>                         I really do not find this property helpful.
>>
>>                         Do you see any sense in saying that
>>                         http://dbpedia.org/resource/Dublin is from
>>                         http://dbpedia.org ? In the linked data world,
>>                         http://dbpedia.org/resource/Dublin comes from
>>                         http://dbpedia.org/resource/Dublin.
>>                         So you might specify a way to convert that to
>>                         ITS, but we might not need an RDF property for
>>                         this.
>>
>>                            its:disambigType
>>
>>                         "(http://www.w3.org/2005/11/its/lexicalConcept|
>>                         http://www.w3.org/2005/11/its/ontologyConcept|
>>                         http://www.w3.org/2005/11/its/entity)"
>>
>>                         I am unsure about this one.
>>
>>                            its:entityTypeRef
>>                         is already rdf:type, so it would be a
>>                         duplicate to have its:entityTypeRef
>>                         in RDF. For http://dbpedia.org/resource/Dublin,
>>                         its:entityTypeRef would be one of:
>>
>>                         http://dbpedia.org/ontology/PopulatedPlace
>>                         http://dbpedia.org/ontology/Settlement
>>                         http://umbel.org/umbel/rc/PopulatedPlace
>>                         http://dbpedia.org/ontology/Place
>>                         http://umbel.org/umbel/rc/Village
>>                         http://umbel.org/umbel/rc/Location_Underspecified
>>                         http://schema.org/Place
>>                         http://www.w3.org/2002/07/owl#Thing
>>                         http://www.opengis.net/gml/_Feature
>>                         +
>>                         http://nerd.eurecom.fr/ontology#Place
>>
>>
>>
>>                         If you have a problem with this plurality,
>>                         then it might be good to include an annotation
>>                         property
>>                          its:preferedEntityTypeRef
>>                         So the data is there already in RDF; the
>>                         problem is rather to find a way
>>                         to convert it back to ITS.
>>
>>                         All the best,
>>                         Sebastian
>>
>>
>>
>>                         Thanks,
>>
>>
>>                         Felix
>>
>>                         2012/8/9 Felix Sasaki <fsasaki@w3.org>
>>
>>                         Thanks for this, Tadej, looks good. There
>>                         is just one comment I don't see reflected:
>>
>>                         7) A question on the data category in general
>>                         and the "rules" element: does it make sense to
>>                         make some attributes mandatory? Currently, this
>>                         would be valid:
>>                         <its:disambiguation
>>                         selector="/text/body/p[@id='dublin']"/>
>>
>>                         It seems that all metadata items / attributes
>>                         are still optional. Is there a way to be more
>>                         specific about what must or must not appear
>>                         together, what is optional etc?
>>
>>                         Best,
>>
>>                         Felix
>>
>>                         2012/8/9 Tadej Stajner <tadej.stajner@ijs.si>
>>
>>                              Hi,
>>                         thanks for the tips. I covered them, and I
>>                         agree with removing the local XPath, since it
>>                         has very limited use. Here is another version
>>                         incorporating all these comments.
>>                         -- Tadej
>>
>>                         On 8/3/2012 1:07 PM, Felix Sasaki wrote:
>>
>>                         Hi Tadej, all,
>>
>>                             thanks a lot for this. Just a few comments
>>                         / questions:
>>
>>                             1) About "The information applies to the
>>                         textual content of the element, including child
>>                         elements and attributes.": wouldn't it make more
>>                         sense to say that this applies only to the
>>                         content of the element? E.g. if you annotate the
>>                         "span" element in
>>
>>                             <p>I have seen <span id="timbl"><span
>>                         class="firstname">Tim</span> <span
>>                         class="lastname">Berners-Lee</span></span>
>>                         in the olympics opening ceremony</p>
>>
>>                             You want to express disambiguation
>>                         information about the "span" element with the
>>                         "id" attribute, but not about the "id" attribute
>>                         or the nested span elements. So inheritance
>>                         probably should be: "There is no inheritance".
>>                         What do you think?
>>
>>
>>                             2) About "The Entity data category can be
>>                         expressed with global rules, or locally on an
>>                         individual element.": This should probably be
>>                         "The Disambiguation data category can be
>>                         expressed with global rules, or locally on an
>>                         individual element."
>>
>>                             3) About local markup: for other data
>>                         categories, we don't have the "pointer"
>>                         attributes as local markup, since processing of
>>                         XPath in local markup can be very expensive. So
>>                         I would propose to drop the local pointer
>>                         attributes here too.
>>
>>                             4) In the table at the end, "Global
>>                         pointing to existing information"
>>                         should be "yes" I think.
>>
>>                             5) This selector
>>                         <its:disambiguation
>>                         selector="/text/body/p/#dublin" ...
>>                         In XPath should be
>>                         <its:disambiguation
>>                         selector="/text/body/p[@id='dublin']" ...
>>
>>
>>
>>                             6) To follow the conventions from other
>>                         data categories, the "its:disambiguation"
>>                         element should probably be called
>>                         "its:disambiguationRule".
>>
>>                             7) A question on the data category in
>>                         general and the "rules" element: does it make
>>                         sense to make some attributes mandatory?
>>                         Currently, this would be valid:
>>                         <its:disambiguation
>>                         selector="/text/body/p[@id='dublin']"/>
>>
>>
>>
>>                             8) A question to the others in this thread
>>                         (Giuseppe, Pablo, Raphael, Sebastian): is this a
>>                         representation that makes sense to you and that
>>                         your tools could produce?
>>
>>                             9) A question to the MT guys: is the way
>>                         "entity and disambiguation"
>>                         information is represented here useful for you?
>>
>>                             Best,
>>
>>                             Felix
>>
>>                         2012/8/3 Tadej Štajner <tadej.stajner@ijs.si>
>>
>>                            Hi,
>>                         I incorporated some comments that 'entity'
>>                         still conflated several distinct things in the
>>                         data category proposal. Now, we distinguish
>>                         between disambiguation of word sense, ontology
>>                         concept and entity instance.
>>                         Following that, it seems that 'Disambiguation'
>>                         was the better name for the data category.
>>
>>                         Thanks for everyone's input!
>>
>>                         -- Tadej
>>
>>                         On 02. 08. 2012 17:26, Tadej Štajner wrote:
>>
>>                         Apologies -- wrong link on the previous
>>                         mail. This is the relevant one:
>>                         http://www.w3.org/International/multilingualweb/lt/track/actions/181
>>                         -- Tadej
>>
>>                         On 02. 08. 2012 17:22, Tadej Štajner wrote:
>>
>>                         Dipl. Inf. Sebastian Hellmann
>>                         Department of Computer Science, University of Leipzig
>>                         Events:
>>                             * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>>                             * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>>                         Projects: http://nlp2rdf.org , http://dbpedia.org
>>                         Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>                         Research Group: http://aksw.org
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>     --     Felix Sasaki
>>     DFKI / W3C Fellow
>>
>>
>>
>>
>> -- 
>> ---
>> Pablo N. Mendes
>> http://pablomendes.com
>> Events: http://wole2012.eurecom.fr
>>
>
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Monday, 20 August 2012 15:01:52 UTC
