Re: [bpmlod] Guidelines for converting BabelNet as Linguistic Linked Data from Sebastian Hellmann on 2014-05-26 (public-bpmlod@w3.org from May 2014)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Mon, 26 May 2014 20:51:29 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>, Jorge Gracia <jgracia@fi.upm.es>, Felix Sasaki <fsasaki@w3.org>, Ciro Baron <cirobaroneto@gmail.com>
CC: Roberto Navigli <navigli@di.uniroma1.it>, Tiziano Flati <tiziano.flati@gmail.com>, lider <lider@delicias.dia.fi.upm.es>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <8ccf28b0-116f-423b-8ffb-fa06be582549@email.android.com>
Hi all,
Both models have different use cases and granularity, so linking is better than merging. If we merge, we might lose some of the use cases and there might be a conceptual gap. ITS has a coarser granularity and can serve as an entry point. In my opinion, it needs less know-how to be understood.


I asked Ciro to start investigating the ITS Ontology and copy the W3C spec text into the RDF file to improve documentation.
(He has just started, so he will need some time to understand everything.) He will also try to build proper links between NIF, ITS, LEMON and MARL and document their usage.
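
A minimal sketch of what linking instead of merging could look like: it relates the translationConfidence property of the lemon translation module to mtConfidence in ITS with a single mapping triple, leaving both vocabularies untouched. The itsrdf namespace URI, the choice of skos:closeMatch as the linking property, and the helper names are assumptions made for illustration, not an agreed design.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples.
# Namespace URIs below are assumptions except TR, which is the URL given as [2]
# in the thread.
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"   # assumed ITS RDF namespace
TR = "http://purl.org/net/translation#"         # lemon translation module
SKOS = "http://www.w3.org/2004/02/skos/core#"   # SKOS core vocabulary


def link_vocabularies():
    """Return mapping triples that relate the two models without merging them."""
    return [
        # skos:closeMatch is an illustrative choice of linking property.
        (TR + "translationConfidence", SKOS + "closeMatch", ITSRDF + "mtConfidence"),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(link_vocabularies()))
```

Keeping the mapping in a separate set of triples means people can use ITS in RDF on its own and only load the links when they also need the translation module.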

All the best,
Sebastian


On 26 May 2014 4:21:01 PM GMT+02:00, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:
>Hi Felix, Jorge,
>I'd agree that ontolex could make good use of the ITS ontology.
>
>However, I'd be wary about merging them, as there are potential uses of
>the ITS ontology that don't immediately involve the features of ontolex,
>e.g. in localisation and internationalisation use cases without heavy
>terminology management.
>
>A linking rather than a merge means those people could make use of ITS 
>in RDF without having to get to grips with ontolex up front.
>
>I think also that the translation model in ontolex could be aligned to 
>ITS a bit better, more on that in a bit.
>
>regards,
>Dave
>
>On 23/05/2014 18:42, Jorge Gracia wrote:
>> Hi Felix,
>>
>> Yes, I think that exploring the commonalities of both models makes a 
>> lot of sense. Not sure if they have to be merged, but I have the
>> feeling that our lemon module could largely reuse ITS for some things.
>> At the ontolex group we will treat the variation/translation module 
>> again at some point, I think. That would be a good opportunity to 
>> explore the role of ITS. I will keep you updated!
>>
>> Regards,
>> Jorge
>>
>>
>> 2014-05-23 18:49 GMT+02:00 Felix Sasaki <fsasaki@w3.org>:
>>
>>     Hi Jorge and all,
>>
>>     would it make sense to ask the ontolex group and the ITS IG to
>>     merge their models? Otherwise there would be a confusing
>>     situation: two models for the same purpose.
>>
>>     The issues are probably details. I saw e.g. in the paper that
>>     there is a translationConfidence OWL property. It looks similar to
>>     mtConfidence in ITS, but some details would ideally be aligned in a
>>     merge, such as what data type to use and whether to require relating
>>     the confidence value to information about the translation tool
>>     (because automatically generated values cannot be interpreted
>>     without it).
>>
>>     Best,
>>
>>     Felix
>>
>>     On 23.05.2014 at 15:48, Jorge Gracia <jgracia@fi.upm.es> wrote:
>>
>>>     Dear Tiziano, Roberto
>>>
>>>     You could also consider using the lemon translation module to
>>>     represent explicit translations as linked data. This is currently
>>>     under development in the ONTOLEX group but there is a lemon-based
>>>     version already available, which I will present at LREC next week
>>>     [1]. The idea is to reify the translation relation so you can
>>>     attach additional information to it (source, target, confidence,
>>>     provenance, etc.) [2]
>>>
>>>     Regards,
>>>
>>>     Jorge
>>>
>>>     [1] http://ra.cps.unizar.es:8080/PUBLICATIONS/attachedFiles/document/LREC2014_translations_V11.pdf
>>>     [2] http://purl.org/net/translation#
>>>
>>>
>>>
>>>
>>>     2014-05-23 11:58 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>>
>>>         Roberto, Tiziano,
>>>         Thanks for that.
>>>
>>>         Have you considered already how you might allow third parties
>>>         to QA and perhaps correct those translations? That is, some
>>>         sort of process by which proposed MT translations between
>>>         senses can be promoted to more authoritative, human checked
>>>         translations, and marked as such?
>>>
>>>         The ITS text analytics and/or terminology data categories,
>>>         which also have confidence scores could be useful for
>>>         annotating such a process:
>>>         http://www.w3.org/TR/its20/#textanalysis
>>>         http://www.w3.org/TR/its20/#terminology
>>>
>>>         To enable such checking and progression in the
>>>         authoritativeness of senses in different languages, it is
>>>         important that you record what senses are a translation of
>>>         what other senses.
>>>
>>>         In relation to the senses that are extracted from Wikipedia
>>>         interlanguage links: do you consider those 'translations',
>>>         and in particular can you tell from those which is the source
>>>         and which is the target?
>>>
>>>         Interested to hear what you think.
>>>
>>>         cheers,
>>>         Dave
>>>
>>>
>>>
>>>         On 22/05/2014 17:41, Roberto Navigli wrote:
>>>>         Thanks Felix! To answer Dave's comment: translations come
>>>>         from the automatic translations of semantically annotated
>>>>         corpora, as Tiziano said, and we have a confidence for each
>>>>         of these translations together with the source of the
>>>>         original text.
>>>>
>>>>         Best,
>>>>         Roberto
>>>>
>>>>
>>>>         2014-05-22 18:35 GMT+02:00 Tiziano Flati
>>>>         <tiziano.flati@gmail.com>:
>>>>
>>>>             @Felix:
>>>>
>>>>                 I am wondering if ITS 2.0 properties could help
>>>>                 here, see
>>>>                 https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>                 There is mtConfidence which provides the confidence
>>>>                 value for machine translation and
>>>>                 mtConfidenceAnnotatorsRef to identify the tool used.
>>>>                 Also, there are provenance-related properties,
>>>>                 from :org to :revToolRef, that could
>>>>                 identify the provenance information you need. The
>>>>                 underlying definitions for the two ITS data
>>>>                 categories are at
>>>>                 http://www.w3.org/TR/its20/#provenance
>>>>                 http://www.w3.org/TR/its20/#mtconfidence
>>>>
>>>>             Yes, I think that ITS 2.0 can definitely be a very
>>>>             good point to explore. At the moment I don't think we
>>>>             need modelling properties more complex than those
>>>>             (such as mtConfidenceRule, etc.), so I think this fits
>>>>             our needs well.
>>>>
>>>>             @Lewis:
>>>>
>>>>                 Do you currently know the provenance of the
>>>>                 translations between senses in BabelNet? Have you
>>>>                 produced any of the translations yourself, or do you
>>>>                 just take the links where they are present in the
>>>>                 source resources, e.g. DBpedia?
>>>>                 What is the policy in BabelNet: is some translation
>>>>                 better than none, or is there a translation
>>>>                 confidence threshold, e.g. based on human checking,
>>>>                 MT confidence or logical inference etc., that you employ?
>>>>
>>>>             BabelNet translations can come from explicit resource
>>>>             information (e.g., Wikipedia interlanguage links) or as
>>>>             automatic translations supported by millions of
>>>>             sense-tagged sentences coming from Wikipedia and SemCor.
>>>>             In conclusion, AFAIK, BabelNet *does have* translation
>>>>             quality estimation, so I think that an indication of
>>>>             confidence could also be provided. (Roberto, correct me
>>>>             if I am wrong.)
>>>>
>>>>             Thank you all for your comments and suggestions :)
>>>>             Tiziano
>>>>
>>>>             2014-05-22 16:07 GMT+02:00 Dave Lewis
>>>>             <dave.lewis@cs.tcd.ie>:
>>>>
>>>>                 Hi Tiziano, Roberto,
>>>>                 Do you currently know the provenance of the
>>>>                 translations between senses in BabelNet? Have you
>>>>                 produced any of the translations yourself, or do you
>>>>                 just take the links where they are present in the
>>>>                 source resources, e.g. DBpedia?
>>>>
>>>>                 In the localization and MT applications we look at in
>>>>                 CNGL and FALCON, where we may use translations to
>>>>                 guide translators or help train MT engines, the
>>>>                 provenance is important so that policies can be
>>>>                 applied to reduce the propagation of inaccurate
>>>>                 translations, or translations that are not appropriate
>>>>                 to the context at hand, so those ITS attributes are
>>>>                 really important there. To this end, when
>>>>                 representing this as linked data, we define
>>>>                 'wasTranslatedFrom' as a sub-property of
>>>>                 'prov:wasDerivedFrom' to reify other provenance
>>>>                 meta-data - agents, tools, context etc.
>>>>
>>>>                 What is the policy in BabelNet: is some translation
>>>>                 better than none, or is there a translation
>>>>                 confidence threshold, e.g. based on human checking,
>>>>                 MT confidence or logical inference etc., that you employ?
>>>>
>>>>                 many thanks,
>>>>                 Dave
>>>>
>>>>
>>>>                 On 22/05/2014 10:42, Felix Sasaki wrote:
>>>>>                 Hi Tiziano,
>>>>>
>>>>>                 sorry that I could not make the call due to
>>>>>                 personal reasons.
>>>>>
>>>>>                 In the draft I saw under „translation“ this issue:
>>>>>
>>>>>                 „Issues: Information about translation confidence
>>>>>                 (was it humanly or automatically produced? if
>>>>>                 automatic, with what confidence score?) and
>>>>>                 translation provenance (what text(s) does the
>>>>>                 translation come from? who translated and with what
>>>>>                 tool?).
>>>>>                 Another issue concerns whether the
>>>>>                 relation lexinfo:translation is essential or not:
>>>>>                 every sense in a language within a BabelSynset is,
>>>>>                 in fact, a translation of any other sense
>>>>>                 in another language, so that this information could
>>>>>                 actually be derived (problem of redundancy).
>>>>>                 However, having data linked one to each other could
>>>>>                 also be a benefit, since the information is
>>>>>                 explicit in the resource.“
>>>>>
>>>>>                 I am wondering if ITS 2.0 properties could help
>>>>>                 here, see
>>>>>
>>>>>                 https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>>
>>>>>                 There is mtConfidence which provides the confidence
>>>>>                 value for machine translation and
>>>>>                 mtConfidenceAnnotatorsRef to identify the tool used.
>>>>>
>>>>>                 Also, there are provenance-related properties,
>>>>>                 from :org to :revToolRef, that could
>>>>>                 identify the provenance information you need. The
>>>>>                 underlying definitions for the two ITS data
>>>>>                 categories are at
>>>>>                 http://www.w3.org/TR/its20/#provenance
>>>>>                 http://www.w3.org/TR/its20/#mtconfidence
>>>>>
>>>>>                 Best,
>>>>>
>>>>>                 Felix
>>>>>
>>>>>                 On 22.05.2014 at 10:12, Tiziano Flati
>>>>>                 <tiziano.flati@gmail.com> wrote:
>>>>>
>>>>>>                 Dear all,
>>>>>>
>>>>>>                 we have compiled a first draft of guidelines for
>>>>>>                 the conversion of BabelNet as Linguistic Linked
>>>>>>                 Data. The initial draft is here:
>>>>>>                 <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>.
>>>>>>
>>>>>>                 We can probably integrate this into the BPMLOD
>>>>>>                 community report both as a separate document and
>>>>>>                 in the form of all our resource-dependent and
>>>>>>                 independent details/comments.
>>>>>>                 Any feedback and comments are much appreciated
>>>>>>                 and will help us improve the draft.
>>>>>>
>>>>>>                 Best regards,
>>>>>>                 Tiziano Flati and Roberto Navigli
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>         -- 
>>>>         =====================================
>>>>         Roberto Navigli
>>>>         Dipartimento di Informatica
>>>>         Sapienza University of Rome
>>>>         Viale Regina Elena 295 (second floor)
>>>>         00161 Roma Italy
>>>>         Phone: +39 0649255161 - Fax: +39 06 8541842
>>>>         Home Page: http://wwwusers.di.uniroma1.it/~navigli
>>>>         =====================================
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Jorge Gracia, PhD
>>>     Ontology Engineering Group
>>>     Artificial Intelligence Department
>>>     Universidad Politécnica de Madrid
>>>     http://delicias.dia.fi.upm.es/~jgracia/
>>
>>
>>
>>
>> -- 
>> Jorge Gracia, PhD
>> Ontology Engineering Group
>> Artificial Intelligence Department
>> Universidad Politécnica de Madrid
>> http://delicias.dia.fi.upm.es/~jgracia/ 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Received on Monday, 26 May 2014 18:52:06 UTC