Re: [bpmlod] Guidelines for converting BabelNet as Linguistic Linked Data from Philipp Cimiano on 2014-05-24 (public-bpmlod@w3.org from May 2014)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Sat, 24 May 2014 08:29:49 +0200
To: Jorge Gracia <jgracia@fi.upm.es>, Felix Sasaki <fsasaki@w3.org>
CC: Dave Lewis <dave.lewis@cs.tcd.ie>, Roberto Navigli <navigli@di.uniroma1.it>, Tiziano Flati <tiziano.flati@gmail.com>, lider <lider@delicias.dia.fi.upm.es>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <53803C5D.9050800@cit-ec.uni-bielefeld.de>
Hi Felix,

  do you have a pointer to the ITS spec?

Best regards,
Philipp.

On 23.05.14 19:42, Jorge Gracia wrote:
> Hi Felix,
>
> Yes, I think that exploring the commonalities of both models makes a
> lot of sense. I am not sure whether they have to be merged, but I have
> the feeling that our lemon module could largely reuse ITS for some things.
> At the ontolex group we will treat the variation/translation module 
> again at some point, I think. That would be a good opportunity to 
> explore the role of ITS. I will keep you updated!
>
> Regards,
> Jorge
>
>
> 2014-05-23 18:49 GMT+02:00 Felix Sasaki <fsasaki@w3.org>:
>
>     Hi Jorge and all,
>
>     would it make sense to ask the ontolex group and the ITS IG to
>     merge their models? Otherwise there would be a confusing
>     situation: two models for the same purpose.
>
>     The issues are probably details. I saw e.g. in the paper that
>     there is a translationConfidence OWL property. It looks similar to
>     mtConfidence in ITS, but there are details that ideally should be
>     aligned, such as what data type to use, whether to require relating
>     the confidence value to information about the translation tool
>     (because automatically generated values cannot be interpreted
>     without it), etc.
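
To make the point about tool information concrete, here is a minimal Python sketch, not taken from either model. It assumes a confidence score in the range 0 to 1 (as mtConfidence uses in ITS 2.0) and treats a score as interpretable only when it is tied to a tool reference. The class and field names (`ConfidenceAnnotation`, `tool_ref`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfidenceAnnotation:
    """A confidence score plus a reference to the tool that produced it."""
    value: float                    # score, assumed to lie in [0, 1]
    tool_ref: Optional[str] = None  # IRI identifying the MT tool (hypothetical field)


def is_interpretable(annotation: ConfidenceAnnotation) -> bool:
    """Return True only if the score is in range and tied to a tool."""
    in_range = 0.0 <= annotation.value <= 1.0
    has_tool = bool(annotation.tool_ref)
    return in_range and has_tool


# Illustrative values only; the IRI is a placeholder.
with_tool = ConfidenceAnnotation(0.8, "http://example.org/mt-engine")
without_tool = ConfidenceAnnotation(0.8)
```

Under this assumption `with_tool` would pass the check and `without_tool` would not, which is one way a merged model could enforce the link between a score and its producer.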
>
>     Best,
>
>     Felix
>
>     On 23.05.2014 at 15:48, Jorge Gracia <jgracia@fi.upm.es> wrote:
>
>>     Dear Tiziano, Roberto
>>
>>     You could also consider using the lemon translation module to
>>     represent explicit translations as linked data. This is currently
>>     under development in the ONTOLEX group but there is a lemon-based
>>     version already available, which I will present at LREC next week
>>     [1]. The idea is to reify the translation relation so you can
>>     attach additional information to it (source, target, confidence,
>>     provenance, etc.) [2]
>>
>>     Regards,
>>
>>     Jorge
>>
>>     [1]
>>     http://ra.cps.unizar.es:8080/PUBLICATIONS/attachedFiles/document/LREC2014_translations_V11.pdf
>>     [2] http://purl.org/net/translation#
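
A rough Python sketch of the reification idea, using only the standard library. The namespace is the one given in [2], and translationConfidence is the property name mentioned elsewhere in this thread. The other term names (`Translation`, `translationSource`, `translationTarget`) are assumptions about the module's vocabulary, and the example.org IRIs are placeholders. Triples are modelled as plain string tuples rather than with an RDF library.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

TR = "http://purl.org/net/translation#"  # namespace given in [2]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ReifiedTranslation:
    """A translation treated as a node of its own, so metadata can hang off it."""
    node: str                           # IRI of the translation node itself
    source_sense: str                   # IRI of the source lexical sense
    target_sense: str                   # IRI of the target lexical sense
    confidence: Optional[float] = None  # optional confidence score


def to_triples(t: ReifiedTranslation) -> Iterator[Triple]:
    """Yield (subject, predicate, object) tuples; term names are assumed."""
    yield (t.node, RDF_TYPE, TR + "Translation")
    yield (t.node, TR + "translationSource", t.source_sense)
    yield (t.node, TR + "translationTarget", t.target_sense)
    if t.confidence is not None:
        # Simplification: the literal is kept as a plain string.
        yield (t.node, TR + "translationConfidence", repr(t.confidence))


example = ReifiedTranslation(
    node="http://example.org/trans/1",
    source_sense="http://example.org/sense/en/bank_1",
    target_sense="http://example.org/sense/es/banco_1",
    confidence=0.9,
)
triples = list(to_triples(example))
```

Because the translation is a node rather than a direct link between two senses, further statements (provenance, tool, reviewer) can be attached to `node` without changing the senses themselves.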
>>
>>
>>
>>
>>     2014-05-23 11:58 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>
>>         Roberto, Tiziano,
>>         Thanks for that.
>>
>>         Have you considered already how you might allow third parties
>>         to QA and perhaps correct those translations? That is, some
>>         sort of process by which proposed MT translations between
>>         senses can be promoted to more authoritative, human checked
>>         translations, and marked as such?
>>
>>         The ITS text analytics and/or terminology data categories,
>>         which also have confidence scores, could be useful for
>>         annotating such a process:
>>         http://www.w3.org/TR/its20/#textanalysis
>>         http://www.w3.org/TR/its20/#terminology
>>
>>         To enable such checking and progression in the
>>         authoritativeness of senses in different languages, it is
>>         important that you record what senses are a translation of
>>         what other senses.
>>
>>         Regarding the senses that are extracted from Wikipedia
>>         interlanguage links: do you consider those 'translations',
>>         and in particular can you tell from those which is the source
>>         and which is the target?
>>
>>         Interested to hear what you think.
>>
>>         cheers,
>>         Dave
>>
>>
>>
>>         On 22/05/2014 17:41, Roberto Navigli wrote:
>>>         Thanks Felix! To answer Dave's comment: translations come
>>>         from the automatic translations of semantically annotated
>>>         corpora, as Tiziano said, and we have a confidence for each
>>>         of these translations together with the source of the
>>>         original text.
>>>
>>>         Best,
>>>         Roberto
>>>
>>>
>>>         2014-05-22 18:35 GMT+02:00 Tiziano Flati
>>>         <tiziano.flati@gmail.com>:
>>>
>>>             @Felix:
>>>
>>>                 I am wondering if ITS 2.0 properties could help
>>>                 here, see
>>>                 https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>                 There is mtConfidence which provides the confidence
>>>                 value for machine translation and
>>>                 mtConfidenceAnnotatorsRef  to identify the tool used.
>>>                 Also, there are provenance-related properties,
>>>                 from :org through :revToolRef, that could
>>>                 identify the provenance information you need. The
>>>                 underlying definitions for the two ITS data
>>>                 categories are at
>>>                 http://www.w3.org/TR/its20/#provenance
>>>                 http://www.w3.org/TR/its20/#mtconfidence
>>>
>>>             Yes, I think that ITS 2.0 is definitely a very
>>>             good option to explore. At the moment I don't think we
>>>             need modelling properties more complex than those
>>>             (such as mtConfidenceRule, etc.), so I think this fits
>>>             our needs well.
>>>
>>>             @Lewis:
>>>
>>>                 Do you currently know the provenance of the
>>>                 translations between senses in BabelNet? Have you
>>>                 produced any of the translations yourself, or do you
>>>                 just take the links where they are present in the
>>>                 source resources, e.g. DBpedia?
>>>                 What is the policy in BabelNet: is some translation
>>>                 better than none, or is there a translation
>>>                 confidence threshold, e.g. based on human checking,
>>>                 MT confidence or logical inference etc., that you employ?
>>>
>>>             BabelNet translations can come from explicit resource
>>>             information (e.g., Wikipedia interlanguage links) or as
>>>             automatic translations supported by millions of
>>>             sense-tagged sentences coming from Wikipedia and Semcor.
>>>             In conclusion, AFAIK, BabelNet *does have* translation
>>>             quality estimation, so I think that an indication of
>>>             confidence could also be provided. (Roberto, correct me
>>>             if I am wrong)
>>>
>>>             Thank you all for your comments and suggestions :)
>>>             Tiziano
>>>
>>>             2014-05-22 16:07 GMT+02:00 Dave Lewis
>>>             <dave.lewis@cs.tcd.ie>:
>>>
>>>                 Hi Tiziano, Roberto,
>>>                 Do you currently know the provenance of the
>>>                 translations between senses in BabelNet? Have you
>>>                 produced any of the translations yourself, or do you
>>>                 just take the links where they are present in the
>>>                 source resources, e.g. DBpedia?
>>>
>>>                 In the localization and MT applications we look at in
>>>                 CNGL and FALCON, where we may use translations to
>>>                 guide translators or help train MT engines, the
>>>                 provenance is important so that policies can be
>>>                 applied to reduce the propagation of inaccurate
>>>                 translations, or of translations that are not
>>>                 appropriate to the context at hand, so those ITS
>>>                 attributes are really important there. To this end,
>>>                 when representing this as linked data, we define
>>>                 'wasTranslatedFrom' as a property of
>>>                 'prov:wasDerivedFrom' to reify other provenance
>>>                 meta-data - agents, tools, context etc.
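
As an illustration of the kind of policy described above, here is a hedged Python sketch that accepts a translation only if it records where it was derived from, and is either human-checked or carries an MT confidence at or above a threshold. The record fields, the function names and the default threshold of 0.7 are all hypothetical; nothing here reflects an actual CNGL, FALCON or BabelNet implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TranslationRecord:
    """A translation with the minimal provenance the policy needs."""
    text: str
    human_checked: bool
    mt_confidence: Optional[float] = None  # score in [0, 1], if machine translated
    derived_from: Optional[str] = None     # IRI of the source, cf. prov:wasDerivedFrom


def accept(record: TranslationRecord, threshold: float = 0.7) -> bool:
    """Apply the illustrative policy to a single record."""
    if record.derived_from is None:
        return False  # no provenance: never propagate
    if record.human_checked:
        return True   # human-checked translations are trusted
    return record.mt_confidence is not None and record.mt_confidence >= threshold


def filter_translations(
    records: Iterable[TranslationRecord], threshold: float = 0.7
) -> List[TranslationRecord]:
    """Keep only the records the policy accepts, preserving order."""
    return [r for r in records if accept(r, threshold)]


candidates = [
    TranslationRecord("banco", True, None, "http://example.org/src/1"),
    TranslationRecord("orilla", False, 0.9, "http://example.org/src/2"),
    TranslationRecord("banca", False, 0.4, "http://example.org/src/3"),
    TranslationRecord("ribera", False, 0.95, None),
]
kept = filter_translations(candidates)
```

The threshold is a parameter so that a stricter policy (for example when training MT engines) and a looser one (when only suggesting candidates to translators) can share the same check.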
>>>
>>>                 What is the policy in BabelNet: is some translation
>>>                 better than none, or is there a translation
>>>                 confidence threshold, e.g. based on human checking,
>>>                 MT confidence or logical inference etc., that you employ?
>>>
>>>                 many thanks,
>>>                 Dave
>>>
>>>
>>>                 On 22/05/2014 10:42, Felix Sasaki wrote:
>>>>                 Hi Tiziano,
>>>>
>>>>                 sorry that I could not make the call due to
>>>>                 personal reasons.
>>>>
>>>>                 In the draft I saw under „translation“ this issue:
>>>>
>>>>                 „Issues: Information about translation confidence
>>>>                 (was it humanly or automatically produced? if
>>>>                 automatic, with what confidence score?) and
>>>>                 translation provenance (what text(s) does the
>>>>                 translation come from? who translated and with what
>>>>                 tool?).
>>>>                 Another issue concerns whether the
>>>>                 relation lexinfo:translation is essential or not:
>>>>                 every sense in a language within a BabelSynset is,
>>>>                 in fact, a translation of any other sense
>>>>                 in another language, so that this information could
>>>>                 actually be derived (problem of redundancy).
>>>>                 However, having data linked one to each other could
>>>>                 also be a benefit, since the information is
>>>>                 explicit in the resource.“
>>>>
>>>>                 I am wondering if ITS 2.0 properties could help
>>>>                 here, see
>>>>
>>>>                 https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>
>>>>                 There is mtConfidence which provides the confidence
>>>>                 value for machine translation and
>>>>                 mtConfidenceAnnotatorsRef  to identify the tool used.
>>>>
>>>>                 Also, there are provenance-related properties,
>>>>                 from :org through :revToolRef, that could
>>>>                 identify the provenance information you need. The
>>>>                 underlying definitions for the two ITS data
>>>>                 categories are at
>>>>                 http://www.w3.org/TR/its20/#provenance
>>>>                 http://www.w3.org/TR/its20/#mtconfidence
>>>>
>>>>                 Best,
>>>>
>>>>                 Felix
>>>>
>>>>                 On 22.05.2014 at 10:12, Tiziano Flati
>>>>                 <tiziano.flati@gmail.com> wrote:
>>>>
>>>>>                 Dear all,
>>>>>
>>>>>                 we have compiled a first draft of guidelines for
>>>>>                 the conversion of BabelNet as Linguistic Linked
>>>>>                 Data. The initial draft is here
>>>>>                 <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>.
>>>>>
>>>>>                 We can probably integrate this into the BPMLOD
>>>>>                 community report both as a separate document and
>>>>>                 in the form of all our resource-dependent and
>>>>>                 independent details/comments.
>>>>>                 Any feedback and comments are very much appreciated
>>>>>                 and will help us improve the draft.
>>>>>
>>>>>                 Best regards,
>>>>>                 Tiziano Flati and Roberto Navigli
>>>>
>>>
>>>
>>>
>>>
>>>
>>>         -- 
>>>         =====================================
>>>         Roberto Navigli
>>>         Dipartimento di Informatica
>>>         Sapienza University of Rome
>>>         Viale Regina Elena 295 (second floor)
>>>         00161 Roma Italy
>>>         Phone: +39 0649255161 - Fax: +39 06 8541842
>>>         Home Page: http://wwwusers.di.uniroma1.it/~navigli
>>>         =====================================
>>
>>
>>
>>
>>     -- 
>>     Jorge Gracia, PhD
>>     Ontology Engineering Group
>>     Artificial Intelligence Department
>>     Universidad Politécnica de Madrid
>>     http://delicias.dia.fi.upm.es/~jgracia/


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Saturday, 24 May 2014 06:30:27 UTC