Re: [bpmlod] Guidelines for converting BabelNet as Linguistic Linked Data from Jorge Gracia on 2014-05-29 (public-ontolex@w3.org from May 2014)

From: Jorge Gracia <jgracia@fi.upm.es>
Date: Thu, 29 May 2014 22:26:07 +0000
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <CANzuSaNBT0JjhhxsbfdtpErNQGg-pC-NxjranXJVONg1Qk0MxQ@mail.gmail.com>

+1 to Philipp's comment. Also in my view "translation confidence" is a
characteristic of the process in which the translation was obtained, not a
property of the (target) lexical entry itself.

Regards,
Jorge


2014-05-28 13:44 GMT+00:00 Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>:

>  Hi Dave,
>
> thanks a lot for your input. Most of your comments concern Translation as
> viewed from the perspective of a process.
>
> So far, in the ontolex group we have regarded "translation" as a special
> case of "cross-lingual variation", abstracting from the process by which
> the actual translation was produced.
>
> So the reified relation "Translation" rather means that two Lexical Senses
> stand to each other in a relation of translation, independently of how
> this translation was obtained.
>
> We might rename "Translation" as "TranslationVariant" to make this clearer.
>
> On your example:
>
>
> ex:34678es a lemon:LexicalEntry;
>  a prov:Entity;
>  lemon:form [ lemon:writtenRep "casa"@es ];
>  ontolexTrans:wasTranslatedFrom ex:34678en;
>  its:mtConfidence "0.5";
>  ontolexTrans:qualifiedTranslation [
>     a ontolexTrans:Translation;
>     prov:hadActivity ex:ExMachineTranslation;
>  ].
>
> I am not fully convinced here as this example attaches the confidence and
> other properties to the lexical entry. The confidence however should be
> attached to the relation of being a translation of each other IMHO rather
> than to the lexical entries / lexical senses.
>
> So we could certainly attach provenance information to the
> "TranslationVariant" object, but I would not add the prov. information to
> the lexical entries standing in the relation of being a translation of each
> other.
>
> In fact, the confidence is not a property of any lexical entry, it is the
> confidence in the fact that X is the (correct) translation of Y, so it
> should be attached to an object reifying this relation rather than to one
> of the lexical entries or lexical senses involved.
>
> So yes, we could recommend using the Prov-O vocabulary to make the
> provenance information of a "TranslationVariant" explicit.
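>
> A minimal Turtle sketch of this alternative, assuming lemon senses and
> the placeholder terms ex:TranslationVariant, ex:source, ex:target and
> ex:confidence (hypothetical names, not defined ontolex terms); only the
> lemon: and prov: terms are existing vocabulary:
>
> ```turtle
> @prefix lemon: <http://lemon-model.net/lemon#> .
> @prefix prov:  <http://www.w3.org/ns/prov#> .
> @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
> @prefix ex:    <http://example.org/> .
>
> # The lexical entries carry only lexical information: no confidence, no PROV.
> ex:34678en a lemon:LexicalEntry ;
>     lemon:form [ lemon:writtenRep "house"@en ] ;
>     lemon:sense ex:34678en_sense .
>
> ex:34678es a lemon:LexicalEntry ;
>     lemon:form [ lemon:writtenRep "casa"@es ] ;
>     lemon:sense ex:34678es_sense .
>
> # The reified relation between the two senses carries the confidence and
> # the provenance (all ex: terms here are hypothetical placeholders).
> ex:translation_34678 a ex:TranslationVariant , prov:Entity ;
>     ex:source ex:34678en_sense ;
>     ex:target ex:34678es_sense ;
>     ex:confidence "0.5"^^xsd:double ;
>     prov:wasGeneratedBy ex:ExMachineTranslation .
>
> ex:ExMachineTranslation a prov:Activity .
> ```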
>
> Does that make sense?
>
> Regards,
>
> Philipp.
>
> On 27.05.14 03:23, Dave Lewis wrote:
>
> Hi Jorge, guys,
> Thanks for these pointers. I had not been following this as closely as I
> should have, so I have some comments below that are relevant both to the
> META-SHARE RDF model and to your translation model in ontolex; I've
> copied the ontolex list as well.
>
> You are quite correct to reify the translation relationship. Deriving an
> authoritative translation is rarely straightforward and may involve
> different inputs at different times from different sources, e.g. BabelNet
> has professionally curated translations, translations from Wikipedia and MT
> outputs.
>
> So in many cases you are dealing with the current status of provisional
> translations rather than 'final', authoritative ones.
>
> Also, there is some potential confusion in naming the reifying class
> 'Translation', since in many situations this term refers to the string in
> the target language rather than to the entity linking a target language
> string to a source language string.
>
> In [1] we proposed an approach to handle this by specialising from the W3C
> Provenance vocabulary [2].
>
> This means treating the sources and targets of translation (LexicalEntry,
> LexicalSense) as prov:Entity classes so that their provenance can be
> tracked using other classes and properties from that model.
>
> Specifically we propose specialising the provenance property:
> http://www.w3.org/TR/prov-o/#wasDerivedFrom
>
> i.e.
> ontolexTrans:wasTranslatedFrom  rdfs:subPropertyOf
>        prov:wasDerivedFrom.
>
> PROV-O also enables reification by defining a class:
> http://www.w3.org/TR/prov-o/#Derivation
>
> which is in the range of:
> http://www.w3.org/TR/prov-o/#qualifiedDerivation
>
> So similarly we can define
> ontolexTrans:Translation rdfs:subClassOf prov:Derivation.
>
> and
>
> ontolexTrans:qualifiedTranslation rdfs:subPropertyOf
>       prov:qualifiedDerivation.
>
> To flesh this out with an example:
>
> ex:34678en a lemon:LexicalEntry;
>  a prov:Entity;
>  lemon:form [ lemon:writtenRep "house"@en ] .
>
> ex:34678es a lemon:LexicalEntry;
>  a prov:Entity;
>  lemon:form [ lemon:writtenRep "casa"@es ];
>  ontolexTrans:wasTranslatedFrom ex:34678en;
>  its:mtConfidence "0.5";
>  ontolexTrans:qualifiedTranslation [
>     a ontolexTrans:Translation;
>     prov:hadActivity ex:ExMachineTranslation;
>  ].
>
> Note that in the above, its:mtConfidence is more accurately used to
> annotate the LexicalEntry than the Translation, as it is a property of the
> text resulting from the translation rather than of the reified
> translation.
>
> Thoughts welcome.
>
> cheers,
> Dave
>
> [1] http://www.lrec-conf.org/proceedings/lrec2012/pdf/636_Paper.pdf
> [2] http://www.w3.org/TR/prov-o/
> On 23/05/2014 14:48, Jorge Gracia wrote:
>
> Dear Tiziano, Roberto
>
>  You could also consider using the lemon translation module to represent
> explicit translations as linked data. This is currently under development
> in the ONTOLEX group but there is a lemon-based version already available,
> that I will present at LREC next week [1]. The idea is reifying the
> translation relation so you can attach additional information to it
> (source, target, confidence, provenance, etc.) [2]
>
>  Regards,
>
>  Jorge
>
>  [1]
> http://ra.cps.unizar.es:8080/PUBLICATIONS/attachedFiles/document/LREC2014_translations_V11.pdf
> [2] http://purl.org/net/translation#
>
>
>
>
> 2014-05-23 11:58 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>
>>  Roberto, Tiziano,
>> Thanks for that.
>>
>> Have you considered already how you might allow third parties to QA and
>> perhaps correct those translations? That is, some sort of process by which
>> proposed MT translations between senses can be promoted to more
>> authoritative, human checked translations, and marked as such?
>>
>> The ITS Text Analysis and/or Terminology data categories, which also
>> have confidence scores, could be useful for annotating such a process:
>> http://www.w3.org/TR/its20/#textanalysis
>> http://www.w3.org/TR/its20/#terminology
>>
>> To enable such checking and progression in the authoritativeness of
>> senses in different languages, it is important that you record what senses
>> are a translation of what other senses.
>>
>> Regarding the senses that are extracted from Wikipedia interlanguage
>> links: do you consider those 'translations', and in particular, can you
>> tell from them which is the source and which is the target?
>>
>> Interested to hear what you think.
>>
>> cheers,
>> Dave
>>
>>
>>
>> On 22/05/2014 17:41, Roberto Navigli wrote:
>>
>> Thanks Felix! To answer Dave's comment: translations come from the
>> automatic translations of semantically annotated corpora, as Tiziano said,
>> and we have a confidence for each of these translations together with the
>> source of the original text.
>>
>> Best,
>> Roberto
>>
>>
>> 2014-05-22 18:35 GMT+02:00 Tiziano Flati <tiziano.flati@gmail.com>:
>>
>>> @Felix:
>>>
>>>> I am wondering if ITS 2.0 properties could help here, see
>>>> https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>> There is mtConfidence which provides the confidence value for machine
>>>> translation and mtConfidenceAnnotatorsRef to identify the tool used.
>>>> Also, there are provenance-related properties, ranging from :org to
>>>> :revToolRef, that could capture the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
>>>
>>> Yes, I think that ITS 2.0 can definitely be a very good point to
>>> explore. At the moment I don't think we need modelling properties more
>>> complex than those (such as mtConfidenceRule, etc.), so I think this
>>> fits our needs well.
>>>
>>>  @Lewis:
>>>
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do
>>>> you just take the links where they are present in the source resources,
>>>> e.g. DBpedia?
>>>> What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold, e.g. based on human
>>>> checking, MT confidence or logical inference, that you employ?
>>>>
>>> BabelNet translations can come from explicit resource information (e.g.,
>>> Wikipedia interlanguage links) or from automatic translations supported by
>>> millions of sense-tagged sentences coming from Wikipedia and SemCor.
>>> In conclusion, AFAIK, BabelNet *does have* translation quality
>>> estimation, so I think an indication of confidence could also be
>>> provided. (Roberto, correct me if I am wrong.)
>>>
>>>  Thank you all for your comments and suggestions :)
>>> Tiziano
>>>
>>> 2014-05-22 16:07 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>>
>>>> Hi Tiziano, Roberto,
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do
>>>> you just take the links where they are present in the source resources,
>>>> e.g. DBpedia?
>>>>
>>>> In the localization and MT applications we look at in CNGL and FALCON,
>>>> where we may use translations to guide translators or help train MT
>>>> engines, provenance is important so that policies can be applied to
>>>> reduce the propagation of inaccurate translations, or of translations
>>>> that are not appropriate to the context at hand, so those ITS attributes
>>>> are really important there. To this end, when representing this as
>>>> linked data, we define 'wasTranslatedFrom' as a subproperty of
>>>> 'prov:wasDerivedFrom' to reify other provenance metadata (agents,
>>>> tools, context, etc.).
>>>>
>>>> What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold, e.g. based on human
>>>> checking, MT confidence or logical inference, that you employ?
>>>>
>>>> many thanks,
>>>> Dave
>>>>
>>>>
>>>> On 22/05/2014 10:42, Felix Sasaki wrote:
>>>>
>>>> Hi Tiziano,
>>>>
>>>>  sorry that I could not make the call due to personal reasons.
>>>>
>>>>  In the draft I saw under „translation“ this issue:
>>>>
>>>>  „Issues: Information about translation confidence (was it humanly or
>>>> automatically produced? if automatic, with what confidence score?) and
>>>> translation provenance (what text(s) does the translation come from? who
>>>> translated and with what tool?).
>>>> Another issue concerns whether the relation lexinfo:translation is
>>>> essential or not: every sense in a language within a BabelSynset is, in
>>>> fact, a translation of any other sense in another language, so this
>>>> information could actually be derived (a redundancy problem). However,
>>>> linking the senses to each other directly could also be a benefit,
>>>> since the information would then be explicit in the resource.“
>>>>
>>>>  I am wondering if ITS 2.0 properties could help here, see
>>>>
>>>>  https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>
>>>>  There is mtConfidence which provides the confidence value for machine
>>>> translation and mtConfidenceAnnotatorsRef  to identify the tool used.
>>>>
>>>>  Also, there are provenance-related properties, ranging from :org to
>>>> :revToolRef, that could capture the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
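>>>>
>>>> A minimal sketch of how the two data categories might be combined in
>>>> Turtle. It assumes the ITS-RDF namespace
>>>> http://www.w3.org/2005/11/its/rdf#, a hypothetical node ex:trans_34678
>>>> standing for one BabelNet translation, and hypothetical tool and
>>>> reviewer IRIs. Attaching ITS-RDF properties to such a node, rather than
>>>> to annotated text, and using IRI objects for the *Ref properties are
>>>> assumptions to check against the mapping:
>>>>
>>>> ```turtle
>>>> @prefix its: <http://www.w3.org/2005/11/its/rdf#> .
>>>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>>>> @prefix ex:  <http://example.org/> .
>>>>
>>>> # MT Confidence data category: score reported for the automatic translation.
>>>> ex:trans_34678 its:mtConfidence "0.5"^^xsd:double ;
>>>>     # Provenance data category: the (hypothetical) MT tool that produced it ...
>>>>     its:toolRef <http://example.org/tools/mt-system> ;
>>>>     # ... and the (hypothetical) person who later revised it.
>>>>     its:revPersonRef <http://example.org/people/reviewer-1> .
>>>> ```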
>>>>
>>>>  Best,
>>>>
>>>>  Felix
>>>>
>>>>  On 22.05.2014 at 10:12, Tiziano Flati <tiziano.flati@gmail.com> wrote:
>>>>
>>>>  Dear all,
>>>>
>>>>  we have compiled a first draft of guidelines for the conversion of
>>>> BabelNet as Linguistic Linked Data. The initial draft is here
>>>> <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>
>>>> .
>>>>
>>>>  We can probably integrate this into the BPMLOD community report, both
>>>> as a separate document and in the form of our resource-dependent and
>>>> resource-independent details/comments.
>>>> Any feedback and comments are much appreciated and will help us
>>>> improve the draft.
>>>>
>>>>  Best regards,
>>>> Tiziano Flati and Roberto Navigli
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> =====================================
>> Roberto Navigli
>> Dipartimento di Informatica
>> Sapienza University of Rome
>> Viale Regina Elena 295 (second floor)
>> 00161 Roma Italy
>> Phone: +39 0649255161 - Fax: +39 06 8541842
>> Home Page: http://wwwusers.di.uniroma1.it/~navigli
>> =====================================
>>
>>
>>
>
>
>  --
> Jorge Gracia, PhD
> Ontology Engineering Group
> Artificial Intelligence Department
> Universidad Politécnica de Madrid
> http://delicias.dia.fi.upm.es/~jgracia/
>
>
>
>
> --
>
> Prof. Dr. Philipp Cimiano
>
> Phone: +49 521 106 12249
> Fax: +49 521 106 12412
> Mail: cimiano@cit-ec.uni-bielefeld.de
>
> Forschungsbau Intelligente Systeme (FBIIS)
> Raum 2.307
> Universität Bielefeld
> Inspiration 1
> 33619 Bielefeld
>
>


-- 
Jorge Gracia, PhD
Ontology Engineering Group
Artificial Intelligence Department
Universidad Politécnica de Madrid
http://delicias.dia.fi.upm.es/~jgracia/
Received on Thursday, 29 May 2014 22:26:55 UTC