Re: [all] Call for consensus on disambiguation - feedback integrated [ACTION-181]

Hi Sebastian, all,

Thanks, Sebastian. From what you say in the wiki and in your previous mail,
I think we could simplify things a lot.

The HTML example from Tadej *could* look like this:

<html lang="en">
   <head>
      <meta charset="utf-8" />
      <title>Entity: Local Test</title>
   </head>
   <body>
      <p><span
            its-entity-type-ident-ref="http://nerd.eurecom.fr/ontology#Place"
            its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
         is the <span
            its-disambig-ident-ref="http://www.w3.org/2006/03/wn/wn20/instances/wordsense-capital-noun-3">capital</span>
         of Ireland.</p>
   </body>
</html>

That is, no explicit "resource" references for entity type and
disambiguation source, and no disambig-type.
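As a rough illustration that the simplified markup stays easy to consume, the following Python sketch (standard library only) collects the annotated spans from a fragment of the example. The class and function names and the output dictionary are invented for illustration; only the two attribute names come from the markup above, and the sketch assumes annotated spans contain no nested spans.

```python
from html.parser import HTMLParser

# Attribute names as used in the example markup (a proposal, not final names).
ENTITY_TYPE_ATTR = "its-entity-type-ident-ref"
DISAMBIG_ATTR = "its-disambig-ident-ref"


class DisambigExtractor(HTMLParser):
    """Collect text, entity type and disambiguation URI of annotated spans.

    Hypothetical helper; assumes annotated <span>s contain no nested <span>s.
    """

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None  # annotation being filled while inside a span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and DISAMBIG_ATTR in attrs:
            self._current = {
                "text": "",
                # The entity type is optional, as in the "capital" span.
                "entity_type": attrs.get(ENTITY_TYPE_ATTR),
                "ref": attrs[DISAMBIG_ATTR],
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.annotations.append(self._current)
            self._current = None


def extract_annotations(markup):
    parser = DisambigExtractor()
    parser.feed(markup)
    parser.close()
    return parser.annotations


SAMPLE = """<p><span
  its-entity-type-ident-ref="http://nerd.eurecom.fr/ontology#Place"
  its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
is the <span
  its-disambig-ident-ref="http://www.w3.org/2006/03/wn/wn20/instances/wordsense-capital-noun-3">capital</span>
of Ireland.</p>"""

if __name__ == "__main__":
    for a in extract_annotations(SAMPLE):
        print(a["text"], a["entity_type"], a["ref"])
```

The "capital" span simply carries no entity type, and with no disambig-type attribute a consumer would have to rely on the URI itself to tell a WordNet word sense from a DBpedia resource.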

Also, I think we could drop the option of adding this kind of information
via global rules; I really don't see a use case for that.

Tadej, others: thoughts? Maybe Yves, as one of the implementers processing
the output, and others have some thoughts too?

Best,

Felix

2012/8/17 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Dear Felix,
> To solve this issue, I prepared a page:
> http://wiki.nlp2rdf.org/wiki/DBpedia_Spotlight
> It is a rough draft, so it still contains many mistakes. Once it is
> mature, I will send it to the DBpedia Spotlight and Apache Stanbol lists
> to get their feedback.
> Note that I don't have a problem with these properties as XML attributes,
> where they can naturally occur only once and encoding an implicit
> dependency (one attribute referring to another) is unproblematic. They
> are, however, difficult to handle in RDF, even when declared functional.
> I will report back if there is any news.
>
> All the best,
> Sebastian
>
>
>
>
> On 14.08.2012 21:34, Felix Sasaki wrote:
>
>> Hi Sebastian, all,
>>
>> August is taking its toll... I am wondering if there are any thoughts on
>> Sebastian's mail below. It seems that some of the proposed ITS attributes
>> are not needed, but I don't have the competence to evaluate this. Thoughts
>> from others? Sebastian, could you confirm that the output mentioned in
>> this other thread
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0168.html
>>
>> is correct for NIF? I would then create a test case for our test suite,
>> see
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-tests/2012Aug/0003.html
>>
>> Thanks,
>>
>> Felix
>>
>> On Thursday, 9 August 2012, Sebastian Hellmann wrote:
>>
>>> Hi Felix,
>>> below is mostly my opinion on this. There is nothing wrong with including
>>> these properties, but they might not make sense in RDF. If you think that
>>> there are people who would really use these properties in RDF, then go
>>> ahead and include them. Personally, *I* wouldn't know what *I* could use
>>> them for. More comments inline.
>>>
>>> On 09.08.2012 15:20, Felix Sasaki wrote:
>>>
>>>  its:entityTypeSourceRef
>>>>
>>>>  I really do not find this property helpful.
>>> Do you see any sense in saying that http://dbpedia.org/resource/Dublin
>>> is from http://dbpedia.org? In the linked data world,
>>> http://dbpedia.org/resource/Dublin comes from
>>> http://dbpedia.org/resource/Dublin.
>>> So you might specify a way to convert that to ITS, but we might not need
>>> an RDF property for this.
>>>
>>>   its:disambigType
>>>
>>>> "(http://www.w3.org/2005/11/its/lexicalConcept|http://www.w3.org/2005/11/its/ontologyConcept|http://www.w3.org/2005/11/its/entity)"
>>>>
>>>>  I am unsure about this one.
>>>
>>>   its:entityTypeRef
>>> is already rdf:type, so it would be a duplicate to have its:entityTypeRef
>>> in RDF. For http://dbpedia.org/resource/Dublin, its:entityTypeRef would
>>> be one of:
>>> http://dbpedia.org/ontology/PopulatedPlace
>>> http://dbpedia.org/ontology/Settlement
>>> http://umbel.org/umbel/rc/PopulatedPlace
>>> http://dbpedia.org/ontology/Place
>>> http://umbel.org/umbel/rc/Village
>>> http://umbel.org/umbel/rc/Location_Underspecified
>>> http://schema.org/Place
>>> http://www.w3.org/2002/07/owl#Thing
>>> http://www.opengis.net/gml/_Feature
>>> +
>>> http://nerd.eurecom.fr/ontology#Place
>>>
>>>
>>> If you have a problem with this plurality, then it might be good to
>>> include an annotation property its:preferedEntityTypeRef. So the data is
>>> already there in RDF; the problem is rather to find a way to convert it
>>> back to ITS.
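To make the plurality point concrete, here is a small Python sketch of converting the rdf:type values listed above back into a single ITS entity type. It assumes the proposed its:preferedEntityTypeRef is available as a plain URI; the function name is hypothetical, and the fallback of taking the first type in a given namespace (NERD by default, only because the example markup uses it) is an illustrative policy, not something agreed in this thread.

```python
from typing import Iterable, Optional

# The rdf:type values listed above for http://dbpedia.org/resource/Dublin.
DUBLIN_TYPES = [
    "http://dbpedia.org/ontology/PopulatedPlace",
    "http://dbpedia.org/ontology/Settlement",
    "http://umbel.org/umbel/rc/PopulatedPlace",
    "http://dbpedia.org/ontology/Place",
    "http://umbel.org/umbel/rc/Village",
    "http://umbel.org/umbel/rc/Location_Underspecified",
    "http://schema.org/Place",
    "http://www.w3.org/2002/07/owl#Thing",
    "http://www.opengis.net/gml/_Feature",
    "http://nerd.eurecom.fr/ontology#Place",
]


def pick_entity_type(rdf_types: Iterable[str],
                     preferred: Optional[str] = None,
                     namespace: str = "http://nerd.eurecom.fr/ontology#"
                     ) -> Optional[str]:
    """Choose one rdf:type value to write back into ITS markup (hypothetical).

    1. If a preferred type (the proposed its:preferedEntityTypeRef) is given
       and is among the rdf:type values, use it.
    2. Otherwise fall back to the first type in `namespace` (assumed policy).
    3. If nothing matches, return None rather than guessing.
    """
    types = list(rdf_types)
    if preferred is not None and preferred in types:
        return preferred
    for t in types:
        if t.startswith(namespace):
            return t
    return None


if __name__ == "__main__":
    print(pick_entity_type(DUBLIN_TYPES))
    print(pick_entity_type(DUBLIN_TYPES, preferred="http://schema.org/Place"))
```

Returning None instead of an arbitrary type keeps the round trip honest: without a preference or a matching vocabulary, the ITS side simply gets no entity type.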
>>>
>>> All the best,
>>> Sebastian
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Felix
>>>
>>> 2012/8/9 Felix Sasaki <fsasaki@w3.org>
>>>
>>>   Thanks for this, Tadej, it looks good. There is just one comment I
>>> don't see reflected:
>>>
>>> 7) A question on the data category in general and the "rules" element:
>>> does it make sense to make some attributes mandatory? Currently, this
>>> would be valid:
>>> <its:disambiguation selector="/text/body/p[@id='dublin']"/>
>>>
>>> It seems that all metadata items / attributes are still optional. Is
>>> there a way to be more specific about what must or must not appear
>>> together, what is optional, etc.?
>>>
>>> Best,
>>>
>>> Felix
>>>
>>> 2012/8/9 Tadej Stajner <tadej.stajner@ijs.si>
>>>
>>>     Hi,
>>> thanks for the tips. I covered them, and I agree with removing the local
>>> XPath, since it has very limited use. Here is another version
>>> incorporating all these comments.
>>> -- Tadej
>>>
>>> On 8/3/2012 1:07 PM, Felix Sasaki wrote:
>>>
>>> Hi Tadej, all,
>>>
>>>    thanks a lot for this. Just a few comments / questions:
>>>
>>>    1) About "The information applies to the textual content of the
>>> element, including child elements and attributes.": wouldn't it make
>>> more sense to say that this applies only to the content of the element?
>>> E.g. if you annotate the "span" element in
>>>
>>>    <p>I have seen <span id="timbl"><span class="firstname">Tim</span>
>>> <span class="lastname">Berners-Lee</span></span> in the Olympics opening
>>> ceremony</p>
>>>
>>>    you want to express disambiguation information about the "span"
>>> element with the "id" attribute, but not about the "id" attribute or the
>>> nested span elements. So the inheritance statement should probably be:
>>> "There is no inheritance". What do you think?
>>>
>>>
>>>    2) About "The Entity data category can be expressed with global rules,
>>> or locally on an individual element.": This should probably be "The
>>> Disambiguation data category can be expressed with global rules, or
>>> locally
>>> on an individual element."
>>>
>>>    3) About local markup: for other data categories, we don't have the
>>> "pointer" attributes as local markup, since processing of XPath in local
>>> markup can be very expensive. So I would propose to drop the local
>>> pointer
>>> attributes here too.
>>>
>>>    4) In the table at the end, "Global pointing to existing information"
>>> should be "yes" I think.
>>>
>>>    5) This selector
>>> <its:disambiguation selector="/text/body/p/#dublin" ...
>>> should, in XPath, be
>>> <its:disambiguation selector="/text/body/p[@id='dublin']" ...
>>>
>>>
>>>    6) To follow the conventions from other data categories, the
>>> "its:disambiguation" element should probably be called
>>> "its:disambiguationRule".
>>>
>>>    7) A question on the data category in general and the "rules" element:
>>> does it make sense to make some attributes mandatory? Currently, this
>>> would be valid:
>>> <its:disambiguation selector="/text/body/p[@id='dublin']"/>
>>>
>>>
>>>    8) A question to the others in this thread (Giuseppe, Pablo, Raphael,
>>> Sebastian): is this a representation that makes sense to you and that
>>> your tools could produce?
>>>
>>>    9) A question to the MT guys: is the way "entity and disambiguation"
>>> information is represented here useful for you?
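Point 5 in runnable form: `#dublin` is CSS ID-selector syntax and has no meaning in XPath, where the id has to be matched with an attribute predicate. The sketch below uses Python's xml.etree.ElementTree on a made-up document shaped like the selector (the "intro" paragraph is invented). ElementTree supports only a subset of XPath and evaluates paths relative to an element, so the absolute path is written relative to the <text> root; a full XPath 1.0 engine (e.g. lxml's xpath() method) would accept /text/body/p[@id='dublin'] unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; only the /text/body/p structure and the
# id "dublin" come from the selector discussed in point 5.
DOC = """<text>
  <body>
    <p id="intro">Welcome.</p>
    <p id="dublin">Dublin is the capital of Ireland.</p>
  </body>
</text>"""

root = ET.fromstring(DOC)  # root is the <text> element

# Equivalent of /text/body/p[@id='dublin'], written relative to <text>
# because ElementTree does not evaluate absolute paths from an element.
match = root.find("./body/p[@id='dublin']")

if __name__ == "__main__":
    print(match.get("id"), match.text)
```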
>>>
>>>    Best,
>>>
>>>    Felix
>>>
>>> 2012/8/3 Tadej Štajner <tadej.stajner@ijs.si>
>>>
>>>   Hi,
>>> I incorporated some comments that 'entity' still conflated several
>>> distinct things in the data category proposal. Now we distinguish
>>> between disambiguation of word sense, ontology concept and entity
>>> instance. Following that, it seems that 'Disambiguation' is the better
>>> name for the data category.
>>>
>>> Thanks for everyone's input!
>>>
>>> -- Tadej
>>>
>>> On 02. 08. 2012 17:26, Tadej Štajner wrote:
>>>
>>>   Apologies, wrong link in the previous mail. This is the relevant one:
>>> http://www.w3.org/International/multilingualweb/lt/track/actions/181
>>>
>>> -- Tadej
>>>
>>> On 02. 08. 2012 17:22, Tadej Štajner wrote:
>>>
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Events:
>>>    * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>>>    * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>>>
>>>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Monday, 20 August 2012 07:44:44 UTC