Re: [bpmlod] Guidelines for converting BabelNet as Linguistic Linked Data from Jorge Gracia on 2014-05-23 (public-bpmlod@w3.org from May 2014)

From: Jorge Gracia <jgracia@fi.upm.es>
Date: Fri, 23 May 2014 19:42:13 +0200
To: Felix Sasaki <fsasaki@w3.org>
Cc: Dave Lewis <dave.lewis@cs.tcd.ie>, Roberto Navigli <navigli@di.uniroma1.it>, Tiziano Flati <tiziano.flati@gmail.com>, lider <lider@delicias.dia.fi.upm.es>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <CANzuSaMUD9aRnFqZEMoSVchfahBjR_i_VV280LBjE7-Jqx0Khg@mail.gmail.com>
Hi Felix,

Yes, I think that exploring the commonalities of both models makes a lot of
sense. Not sure if they have to be merged, but I have the feeling that our
lemon module could largely reuse ITS for some things. At the ontolex group
we will treat the variation/translation module again at some point, I
think. That would be a good opportunity to explore the role of ITS. I will
keep you updated!

Regards,
Jorge
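
P.S. To make the overlap discussed in this thread a bit more concrete, here is a minimal Python sketch of a reified translation that carries a confidence score together with the tool that produced it. The class name, field names and identifiers are illustrative assumptions, not terms taken from the lemon translation module or from ITS 2.0. Requiring a tool whenever a score is given is just one possible answer to the open question raised in the quoted message below, not an agreed decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReifiedTranslation:
    """A translation treated as an object, so metadata can be attached to it.

    All field names are hypothetical; they are not lemon or ITS 2.0 terms.
    """
    source_sense: str
    target_sense: str
    confidence: Optional[float] = None  # assumed to lie in [0, 1]
    tool: Optional[str] = None          # identifier of the tool behind the score

    def __post_init__(self) -> None:
        if self.confidence is None:
            return
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if self.tool is None:
            # Assumed policy: a score without its tool cannot be interpreted.
            raise ValueError("a confidence score requires a tool identifier")


# Hypothetical sense and tool identifiers, for illustration only.
checked = ReifiedTranslation("ex:house_en", "ex:casa_es", 0.87, "ex:some-mt-engine")
unscored = ReifiedTranslation("ex:house_en", "ex:casa_it")
```

A translation without any score is still valid here; only a score that lacks its tool is rejected.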


2014-05-23 18:49 GMT+02:00 Felix Sasaki <fsasaki@w3.org>:

> Hi Jorge and all,
>
> would it make sense to ask the ontolex group and the ITS IG to merge their
> models? Otherwise there would be a confusing situation: two models for the
> same purpose.
>
> The issues are probably details. I saw e.g. in the paper that there is a
> translationConfidence OWL property. It looks similar to mtConfidence in ITS,
> but there are details that would ideally be aligned, such as which data type
> to use and whether to require relating the confidence value to information
> about the translation tool (because automatically generated values cannot be
> interpreted without it).
>
> Best,
>
> Felix
>
> On 23.05.2014 at 15:48, Jorge Gracia <jgracia@fi.upm.es> wrote:
>
> Dear Tiziano, Roberto
>
> You could also consider using the lemon translation module to represent
> explicit translations as linked data. This is currently under development
> in the ONTOLEX group, but there is a lemon-based version already available,
> which I will present at LREC next week [1]. The idea is to reify the
> translation relation so that you can attach additional information to it
> (source, target, confidence, provenance, etc.) [2].
>
> Regards,
>
> Jorge
>
> [1]
> http://ra.cps.unizar.es:8080/PUBLICATIONS/attachedFiles/document/LREC2014_translations_V11.pdf
> [2] http://purl.org/net/translation#
>
>
>
>
> 2014-05-23 11:58 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>
>>  Roberto, Tiziano,
>> Thanks for that.
>>
>> Have you already considered how you might allow third parties to QA and
>> perhaps correct those translations? That is, some sort of process by which
>> proposed MT translations between senses can be promoted to more
>> authoritative, human-checked translations, and marked as such?
>>
>> The ITS text analytics and/or terminology data categories, which also
>> have confidence scores, could be useful for annotating such a process:
>> http://www.w3.org/TR/its20/#textanalysis
>> http://www.w3.org/TR/its20/#terminology
>>
>> To enable such checking and progression in the authoritativeness of
>> senses in different languages, it is important that you record which senses
>> are translations of which other senses.
>>
>> In relation to the senses that are extracted from Wikipedia interlanguage
>> links: do you consider those 'translations', and in particular can you tell
>> from those which is the source and which is the target?
>>
>> Interested to hear what you think.
>>
>> cheers,
>> Dave
>>
>>
>>
>> On 22/05/2014 17:41, Roberto Navigli wrote:
>>
>> Thanks Felix! To answer Dave's comment: translations come from the
>> automatic translations of semantically annotated corpora, as Tiziano said,
>> and we have a confidence for each of these translations together with the
>> source of the original text.
>>
>> Best,
>> Roberto
>>
>>
>> 2014-05-22 18:35 GMT+02:00 Tiziano Flati <tiziano.flati@gmail.com>:
>>
>>> @Felix:
>>>
>>>> I am wondering if ITS 2.0 properties could help here, see
>>>> https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>> There is mtConfidence, which provides the confidence value for machine
>>>> translation, and mtConfidenceAnnotatorsRef to identify the tool used.
>>>> Also, there are provenance-related properties, from :org
>>>> through :revToolRef, that could identify the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
>>>
>>>  Yes, I think that ITS 2.0 is definitely worth exploring. At the moment
>>> I don't think we need modelling properties more complex than those
>>> (such as mtConfidenceRule, etc.), so I think this fits our needs well.
>>>
>>>  @Lewis:
>>>
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>>> just take the links where they are present in the source resources, e.g.
>>>> DBpedia?
>>>>  What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold that you employ, e.g. based
>>>> on human checking, MT confidence or logical inference?
>>>>
>>> BabelNet translations can come from explicit resource information (e.g.,
>>> Wikipedia interlanguage links) or from automatic translations supported by
>>> millions of sense-tagged sentences coming from Wikipedia and SemCor.
>>> In conclusion, AFAIK, BabelNet *does have* translation quality
>>> estimation, so I think that an indication of confidence could also be
>>> provided. (Roberto, correct me if I am wrong)
>>>
>>>  Thank you all for your comments and suggestions :)
>>> Tiziano
>>>
>>> 2014-05-22 16:07 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>>
>>>  Hi Tiziano, Roberto,
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>>> just take the links where they are present in the source resources, e.g.
>>>> DBpedia?
>>>>
>>>> In the localization and MT applications we look at in CNGL and FALCON,
>>>> where we may use translations to guide translators or help train MT
>>>> engines, the provenance is important so that policies can be applied to
>>>> reduce the propagation of inaccurate translations, or of translations that
>>>> are not appropriate to the context at hand - so those ITS attributes are
>>>> really important there. To this end, when representing this as linked data,
>>>> we define 'wasTranslatedFrom' as a property of 'prov:wasDerivedFrom' to reify
>>>> other provenance meta-data - agents, tools, context etc.
>>>>
>>>> What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold that you employ, e.g. based
>>>> on human checking, MT confidence or logical inference?
>>>>
>>>> many thanks,
>>>> Dave
>>>>
>>>>
>>>> On 22/05/2014 10:42, Felix Sasaki wrote:
>>>>
>>>> Hi Tiziano,
>>>>
>>>>  sorry that I could not make the call due to personal reasons.
>>>>
>>>>  In the draft I saw under "translation" this issue:
>>>>
>>>>  "Issues: Information about translation confidence (was it humanly or
>>>> automatically produced? if automatic, with what confidence score?) and
>>>> translation provenance (what text(s) does the translation come from? who
>>>> translated and with what tool?).
>>>> Another issue concerns whether the relation lexinfo:translation is
>>>> essential or not: every sense in a language within a BabelSynset is, in
>>>> fact, a translation of any other sense in another language, so that this
>>>> information could actually be derived (problem of redundancy). However,
>>>> having data linked one to each other could also be a benefit, since
>>>> the information is explicit in the resource."
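
The redundancy point in the quoted issue can be illustrated with a short Python sketch that derives translation pairs from the members of a single synset instead of storing them. The input shape (a list of language and sense-identifier pairs) and the identifiers are assumptions made for illustration; they are not BabelNet's actual data model.

```python
from itertools import combinations


def derive_translation_pairs(synset):
    """Return unordered pairs of senses in the synset that differ in language.

    `synset` is a sequence of (language, sense_id) tuples.
    """
    pairs = []
    for (lang_a, sense_a), (lang_b, sense_b) in combinations(synset, 2):
        if lang_a != lang_b:
            pairs.append((sense_a, sense_b))
    return pairs


# Hypothetical synset members, for illustration only.
example_synset = [
    ("en", "house_en"),
    ("it", "casa_it"),
    ("es", "casa_es"),
    ("en", "home_en"),
]
pairs = derive_translation_pairs(example_synset)
```

Derived pairs cost nothing to store, but their number grows quadratically with synset size and they carry no per-link confidence or provenance, which is what explicit translation links would add.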
>>>>
>>>>  I am wondering if ITS 2.0 properties could help here, see
>>>>
>>>>  https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>
>>>>  There is mtConfidence, which provides the confidence value for machine
>>>> translation, and mtConfidenceAnnotatorsRef to identify the tool used.
>>>>
>>>>  Also, there are provenance-related properties, from :org
>>>> through :revToolRef, that could identify the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
>>>>
>>>>  Best,
>>>>
>>>>  Felix
>>>>
>>>>  On 22.05.2014 at 10:12, Tiziano Flati <tiziano.flati@gmail.com> wrote:
>>>>
>>>>  Dear all,
>>>>
>>>>  we have compiled a first draft of guidelines for the conversion of
>>>> BabelNet as Linguistic Linked Data. The initial draft is here:
>>>> <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>
>>>>
>>>>  We can probably integrate this into the BPMLOD community report both
>>>> as a separate document and in the form of all our resource-dependent and
>>>> resource-independent details/comments.
>>>> Any feedback and comments are also much appreciated and will help us
>>>> improve the draft.
>>>>
>>>>  Best regards,
>>>> Tiziano Flati and Roberto Navigli
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> =====================================
>> Roberto Navigli
>> Dipartimento di Informatica
>> Sapienza University of Rome
>> Viale Regina Elena 295 (second floor)
>> 00161 Roma Italy
>> Phone: +39 0649255161 - Fax: +39 06 8541842
>> Home Page: http://wwwusers.di.uniroma1.it/~navigli
>> =====================================
>>
>>
>>
>
>
> --
> Jorge Gracia, PhD
> Ontology Engineering Group
> Artificial Intelligence Department
> Universidad Politécnica de Madrid
> http://delicias.dia.fi.upm.es/~jgracia/
>
>
>


-- 
Jorge Gracia, PhD
Ontology Engineering Group
Artificial Intelligence Department
Universidad Politécnica de Madrid
http://delicias.dia.fi.upm.es/~jgracia/
Received on Friday, 23 May 2014 17:43:01 UTC