- From: Tadej Stajner <tadej.stajner@ijs.si>
- Date: Thu, 30 Aug 2012 16:29:52 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- CC: "Pablo N. Mendes" <pablomendes@gmail.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "raphael.troncy@eurecom.fr" <raphael.troncy@eurecom.fr>, "Giuseppe.Rizzo@eurecom.fr" <Giuseppe.Rizzo@eurecom.fr>
- Message-ID: <503F78E0.9080401@ijs.si>
Hi, all,

Co-existence of disambiguation levels is not that important - I also
can't justify a real use case for it. The point is more about specifying
what level we're disambiguating on. I'm in favor of keeping the
disambigLevel solution and not introducing a new set of attributes,
trading off coexistence.

I also propose a different solution for the 'disambigSource' and
'entityTypeSource' scenarios, which are mostly redundant in RDF: the user
can either use only a disambigIdentRef to specify a URI for the target
entity, or use a pair of disambigSource and disambigIdent strings to
cover use cases where the meanings don't have addressable URIs.

Major differences:
* entityType -> generalize to targetType, cover all levels
* disambigType -> rename to disambigLevel, change constants from
  literals to URIs
* disambigSource* -> disambigSource, restrict usage to disambiguating
  with non-URI identifiers
* disambigIdentRef -> disambigIdentRef* for URI identifier +
  disambigIdent for local identifiers in the scope of a disambigSource
* entityTypeSource* -> dropped

-- Tadej

On 8/20/2012 5:01 PM, Sebastian Hellmann wrote:
> Hi all,
> digging to the core of the problem:
>
> How many layers of annotations do you need? entity, dictionaryEntry,
> lexicalMeaning, pragmaticMeaning, some other layer ... The problem is
> that the XML attribute data structure is not appropriate to handle
> this kind of information. So we really need to decide how many layers
> we need. If you were to leave this open, I would suggest:
> its-disambig-type-ref-1, its-entity-type-ident-ref-1 ,
> its-disambig-type-ref-2, its-entity-type-ident-ref-2,
> its-disambig-type-ref-3, its-entity-type-ident-ref-3, ....
> But that is not XML-like.
>
> So the question is: for how many levels/layers do we require
> coexistence? Otherwise its-disambig-type-ref would be sufficient to
> give the level/layer (or even more fine-grained information, e.g. an
> entity of type place).
>
> Regarding isDefinedBy: It is recommended to use it, but, of course,
> you don't go to prison if you forget it ;) Especially with # - OWL
> classes, isDefinedBy is not necessary, as the # part is cut away for
> any retrieval request, anyhow.
>
> All the best,
> Sebastian
>
> On 20.08.2012 12:11, Tadej Štajner wrote:
>> Hi, Pablo,
>> correct. The feedback I got was that this distinction is very
>> important, but I can't think of an example with the scenario you
>> mention. Perhaps for spans where one is contained within the other,
>> such as assigning a lexical meaning to a word, while the whole phrase
>> is an entity, for example 'agriculture' in 'Ministry of agriculture'.
>>
>> I think it boils down to this: could this property be reliably
>> inferred from the target itself? For instance, if someone points to
>> http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3 -
>> can we expect that is definitely a case of lexical disambiguation?
>>
>> -- Tadej
>>
>> On 20. 08. 2012 11:42, Pablo N. Mendes wrote:
>>> Hi all,
>>>
>>> I would suggest to merge "its-entity-type-ident-ref" into
>>> "its-disambig-type-ref".
>>>
>>> If I understand correctly this is the same proposal I made at the call?
>>>
>>> "<pablomendes> wrt. its:disambigType = (word | entity) can't the
>>> distinction between word and entity be inferred from entityTypeRef?
>>> e.g. wiktionary:doc is a word, dbpedia:Dog is an entity" [1]
>>>
>>> If so, this is the answer that Tadej gave:
>>>
>>> "tadej: disambiguation use cases are often used in cases where text
>>> is short and lacks context
>>> ... and computational linguistic community draw a clear distinction
>>> between lexical and conceptual meaning" [1]
>>>
>>> Perhaps one way to test how strong this requirement is would be to
>>> think of use cases where one could assign both lexical and
>>> conceptual meaning to the same span.
>>>
>>> Cheers,
>>> Pablo
>>>
>>> [1] http://www.w3.org/2012/07/26-mlw-lt-minutes.html
>>>
>>> On Mon, Aug 20, 2012 at 11:13 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>>>
>>> Hi Sebastian,
>>>
>>> 2012/8/20 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>> Hi Felix,
>>> your proposal is based on the assumption, that more data is
>>> available at these three URLs:
>>>
>>> http://nerd.eurecom.fr/ontology#Place
>>> http://dbpedia.org/resource/Dublin
>>> http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3
>>>
>>> While this assumption is ok for the Semantic Web, I am not
>>> sure about the ITS world.
>>>
>>> You are right that in the "ITS world" one cannot be sure that more
>>> data is available. But I would argue that implementors who process
>>> links also in the ITS world very likely need to know (not
>>> automatically, but as a prerequisite for implementation) what the
>>> URL is about. So I'd rather encourage implementors towards that
>>> "Semantic Web like" approach than defining so many attributes.
>>>
>>> Feedback from the people who want to process "disambiguation"
>>> without Semantic Web processing is of course very important here.
>>>
>>> Furthermore, if you are attempting to minimize it, I would
>>> suggest to merge
>>> "its-entity-type-ident-ref" into "its-disambig-type-ref". You
>>> wouldn't be limited to entity types and could use any of:
>>>
>>> Makes sense to me, thanks for the proposal - let's see what Tadej
>>> and others say.
>>>
>>> Best,
>>>
>>> Felix
>>>
>>> - http://nerd.eurecom.fr/ontology#Place
>>> - http://dbpedia.org/ontology/Place
>>> - http://www.monnet-project.eu/lemon#LexicalSense
>>> - http://www.monnet-project.eu/lemon#LexicalEntry
>>> - http://wordnet.princeton.edu/wndatamodel#NounWordSense
>>> - http://wordnet.princeton.edu/wndatamodel#Synset
>>>
>>> All the best,
>>> Sebastian
>>>
>>> On 20.08.2012 09:44, Felix Sasaki wrote:
>>>
>>> Hi Sebastian, all,
>>>
>>> thanks, Sebastian. From what you say in the wiki and in
>>> the previous mail,
>>> I think one could simplify things a lot.
>>>
>>> The HTML example from Tadej *could* look like this:
>>>
>>> <html lang="en">
>>>   <head>
>>>     <meta charset="utf-8" />
>>>     <title>Entity: Local Test</title>
>>>   </head>
>>>   <body>
>>>     <p><span
>>>       its-entity-type-ident-ref="http://nerd.eurecom.fr/ontology#Place"
>>>       its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
>>>     is the <span
>>>       its-disambig-ident-ref="http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3">capital</span>
>>>     of Ireland.</p>
>>>   </body>
>>> </html>
>>>
>>> That is, no explicit "resource" references for entity
>>> type and
>>> disambiguation source, and no disambig-type.
>>>
>>> Also, I think one could get rid of adding this kind of
>>> information via
>>> global rules - I really don't see a use case for that.
>>>
>>> Tadej, others, thoughts? Maybe Yves, as one of the
>>> implementors processing
>>> the output, and others have some thoughts too?
>>>
>>> Best,
>>>
>>> Felix
>>>
>>> 2012/8/17 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>> Dear Felix,
>>> to solve this issue I prepared a page:
>>> http://wiki.nlp2rdf.org/wiki/DBpedia_Spotlight
>>>
>>> It is a rough draft, so there are many mistakes,
>>> still. Once it is mature,
>>> I will send it to the DBpedia Spotlight and Apache
>>> Stanbol lists to get
>>> their feedback.
>>> Note that I don't have a problem with these properties
>>> as XML attributes,
>>> where they can naturally occur only once and encoding
>>> an implicit
>>> dependency (attribute referring to another attribute)
>>> is unproblematic. They
>>> are, however, difficult to handle in RDF, even when
>>> declaring them
>>> functional.
>>> I will report back if there is any news,
>>>
>>> All the best,
>>> Sebastian
>>>
>>> On 14.08.2012 21:34, Felix Sasaki wrote:
>>>
>>> Hi Sebastian, all,
>>>
>>> August is taking its toll ... I am wondering if
>>> there are any thoughts on
>>> Sebastian's mail below. It seems that some of the
>>> proposed ITS attributes
>>> are not needed, but I don't have the competence to
>>> evaluate this. Thoughts
>>> from others? Sebastian, could you confirm that
>>> the output mentioned in
>>> this other thread
>>>
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0168.html
>>>
>>> is correct for NIF? I then would create a test
>>> case for our test suite,
>>> see
>>>
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-tests/2012Aug/0003.html
>>>
>>> Thanks,
>>>
>>> Felix
>>>
>>> On Thursday, 9 August 2012, Sebastian Hellmann wrote:
>>>
>>> Hi Felix,
>>>
>>> below mostly my opinion on this. Nothing
>>> wrong with including these
>>> properties, but they might not make sense in
>>> RDF. If you think that
>>> there
>>> are people who would really use these
>>> properties in RDF, then go ahead
>>> and
>>> include them. Personally, *I* wouldn't know
>>> for what *I* could use them.
>>> More comments inline.
>>>
>>> On 09.08.2012 15:20, Felix Sasaki wrote:
>>>
>>> its:entityTypeSourceRef
>>>
>>> I really do not find this property
>>> helpful.
>>>
>>> Do you see any sense in saying that
>>> http://dbpedia.org/resource/Dublin is from
>>> http://dbpedia.org ? In the linked data world
>>> http://dbpedia.org/resource/Dublin
>>> comes from
>>> http://dbpedia.org/resource/Dublin.
>>>
>>> So you might specify a way to convert that to
>>> ITS, but we might not need
>>> an RDF property for this.
>>>
>>> its:disambigType
>>>
>>> "(http://www.w3.org/2005/11/its/lexicalConcept|
>>> http://www.w3.org/2005/11/its/ontologyConcept|
>>> http://www.w3.org/2005/11/its/entity)"
>>>
>>> I am unsure about this one.
>>>
>>> its:entityTypeRef
>>> is already rdf:type, so it would be a
>>> duplicate to have its:entityTypeRef
>>> in RDF.
>>> For
>>> http://dbpedia.org/resource/Dublin
>>> its:entityTypeRef would be one of:
>>>
>>> http://dbpedia.org/ontology/PopulatedPlace
>>> http://dbpedia.org/ontology/Settlement
>>> http://umbel.org/umbel/rc/PopulatedPlace
>>> http://dbpedia.org/ontology/Place
>>> http://umbel.org/umbel/rc/Village
>>> http://umbel.org/umbel/rc/Location_Underspecified
>>> http://schema.org/Place
>>> http://www.w3.org/2002/07/owl#Thing
>>> http://www.opengis.net/gml/_Feature
>>> +
>>> http://nerd.eurecom.fr/ontology#Place
>>>
>>> If you have a problem with this plurality,
>>> then it might be good to
>>> include an annotation property
>>> its:preferedEntityTypeRef.
>>> So the data is there already in RDF; the
>>> problem is rather to find a way
>>> to convert it back to ITS.
>>>
>>> All the best,
>>> Sebastian
>>>
>>> Thanks,
>>>
>>> Felix
>>>
>>> 2012/8/9 Felix Sasaki <fsasaki@w3.org>
>>>
>>> Thanks for this, Tadej, looks good. There
>>> is just one comment I don't
>>> see
>>> reflected:
>>>
>>> 7) A question on the data category in general
>>> and the "rules" element:
>>> does it make sense to make some attributes
>>> mandatory? Currently, this
>>> would
>>> be valid:
>>> <its:disambiguation
>>> selector="/text/body/p[@id='dublin']"/>
>>>
>>> It seems that still all metadata items /
>>> attributes are optional. Is
>>> there
>>> a way to be more specific about what must or
>>> must not appear together,
>>> what
>>> is optional etc?
>>>
>>> Best,
>>>
>>> Felix
>>>
>>> 2012/8/9 Tadej Stajner <tadej.stajner@ijs.si>
>>>
>>> Hi,
>>> thanks for the tips. I covered them, and I
>>> agree towards removing the
>>> local XPath, since it has very limited use.
>>> Here is another version incorporating
>>> all these comments.
>>> -- Tadej
>>>
>>> On 8/3/2012 1:07 PM, Felix Sasaki wrote:
>>>
>>> Hi Tadej, all,
>>>
>>> thanks a lot for this. Just a few comments
>>> / questions:
>>>
>>> 1) About "The information applies to the
>>> textual content of the
>>> element, including child elements and
>>> attributes.": wouldn't it make more
>>> sense to say that this applies to only the
>>> content of the element? E.g.
>>> if
>>> you annotate the "span" element in
>>>
>>> <p>I have seen <span id="timbl"><span
>>> class="firstname">Tim</span>
>>> <span
>>> class="lastname">Berners-Lee</span></span>
>>> in the olympics opening
>>> ceremony</p>
>>>
>>> You want to express disambiguation
>>> information about the "span"
>>> element
>>> with the "id" attribute, but not about the
>>> "id" attribute or the nested
>>> span elements. So inheritance probably should
>>> be: "There is no
>>> inheritance". What do you think?
>>>
>>> 2) About "The Entity data category can be
>>> expressed with global rules,
>>> or locally on an individual element.": This
>>> should probably be "The
>>> Disambiguation data category can be expressed
>>> with global rules, or
>>> locally
>>> on an individual element."
>>>
>>> 3) About local markup: for other data
>>> categories, we don't have the
>>> "pointer" attributes as local markup, since
>>> processing of XPath in local
>>> markup can be very expensive. So I would
>>> propose to drop the local
>>> pointer
>>> attributes here too.
>>>
>>> 4) In the table at the end, "Global
>>> pointing to existing information"
>>> should be "yes" I think.
>>>
>>> 5) This selector
>>> <its:disambiguation
>>> selector="/text/body/p/#dublin" ...
>>> In XPath should be
>>> <its:disambiguation
>>> selector="/text/body/p[@id='dublin']"
>>>
>>> 6) To follow the conventions from other
>>> data categories, the
>>> "its:disambiguation" element should probably
>>> be called
>>> "its:disambiguationRule".
>>>
>>> 7) A question on the data category in
>>> general and the "rules" element:
>>> does it make sense to make some attributes
>>> mandatory? Currently, this
>>> would
>>> be valid:
>>> <its:disambiguation
>>> selector="/text/body/p[@id='dublin']"/>
>>>
>>> 8) A question to the others in this thread
>>> (Giuseppe, Pablo, Raphael,
>>> Sebastian): is this a representation that
>>> makes sense to you and that
>>> your
>>> tools could produce?
>>>
>>> 9) A question to the MT guys: is the way
>>> "entity and disambiguation"
>>> information is represented here useful for you?
>>>
>>> Best,
>>>
>>> Felix
>>>
>>> 2012/8/3 Tadej Štajner <tadej.stajner@ijs.si>
>>>
>>> Hi,
>>> I incorporated some comments that 'entity' was
>>> still conflated from
>>> several distinct things in the data category
>>> proposal. Now, we
>>> distinguish
>>> between disambiguation of word sense, ontology
>>> concept and entity
>>> instance.
>>> Following that, it seems that 'Disambiguation'
>>> was the better name for
>>> the
>>> data category.
>>>
>>> Thanks for everyone's input!
>>>
>>> -- Tadej
>>>
>>> On 02. 08. 2012 17:26, Tadej Štajner wrote:
>>>
>>> Apologies -- wrong link on the previous
>>> mail. This is the relevant one:
>>> http://www.w3.org/International/multilingualweb/lt/track/actions/181
>>>
>>> -- Tadej
>>>
>>> On 02. 08. 2012 17:22, Tadej Štajner wrote:
>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Events:
>>> * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>>> * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
>>> Projects: http://nlp2rdf.org , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>>> --
>>> Felix Sasaki
>>> DFKI / W3C Fellow
>>>
>>> --
>>> ---
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>> Events: http://wole2012.eurecom.fr
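
The either/or rule proposed at the top of this message (a disambigIdentRef URI on its own, or a disambigSource plus disambigIdent pair) could be checked roughly as follows. This is only a sketch in Python: the function name `reference_kind` and the sample source and identifier values are made up for illustration, and rejecting a mix of both forms is one reading of the proposal, not something the thread has settled.

```python
from typing import Mapping


def reference_kind(attrs: Mapping[str, str]) -> str:
    """Classify how a disambiguation annotation identifies its target.

    Returns "uri" when only disambigIdentRef is present and "local" when
    the disambigSource/disambigIdent pair is present. Any other
    combination raises ValueError (an assumption of this sketch).
    """
    has_ref = bool(attrs.get("disambigIdentRef"))
    has_source = bool(attrs.get("disambigSource"))
    has_ident = bool(attrs.get("disambigIdent"))

    if has_ref and not (has_source or has_ident):
        return "uri"
    if has_source and has_ident and not has_ref:
        return "local"
    raise ValueError(
        "expected either disambigIdentRef alone, or both "
        "disambigSource and disambigIdent"
    )


examples = [
    # Target has an addressable URI.
    {"disambigIdentRef": "http://dbpedia.org/resource/Dublin"},
    # Target only has a local identifier (hypothetical values).
    {"disambigSource": "local-glossary", "disambigIdent": "term-42"},
]
for attrs in examples:
    print(reference_kind(attrs))
```

Keeping the two forms mutually exclusive means a consumer never has to decide which identifier wins when both are present.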
Attachments
- application/x-zip-compressed attachment: disambiguation_20120830.zip
Received on Thursday, 30 August 2012 14:30:37 UTC