Re: [bpmlod] Guidelines for converting BabelNet as Linguistic Linked Data from Tiziano Flati on 2014-05-27 (public-bpmlod@w3.org from May 2014)

From: Tiziano Flati <tiziano.flati@gmail.com>
Date: Tue, 27 May 2014 16:16:06 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
Cc: Roberto Navigli <navigli@di.uniroma1.it>, lider <lider@delicias.dia.fi.upm.es>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
Message-ID: <CAGNQ8qZOmgkyEmhsx5myJVzkQfHgKjq9bfJhHMhWxdeFKwy+4g@mail.gmail.com>

Hi all,

@Dave:

> Have you considered already how you might allow third parties to QA and
> perhaps correct those translations? That is, some sort of process by which
> proposed MT translations between senses can be promoted to more
> authoritative, human checked translations, and marked as such?
>
No, I don't think we have anything like that at the moment (nothing ready,
AFAIK). However, given the coverage involved (we are talking about
millions of translations), we would probably need an automatic validation
mechanism for the bulk of the data, and perhaps a user-oriented interface
for manually checking a smaller subset. In any case, it is a very good idea!
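
A rough illustration of that split: a confidence-based triage, assuming each automatic translation carries a score in [0, 1] (Roberto mentions below that BabelNet keeps a confidence for each automatic translation). The `Translation` record, the sense labels, the scores and the 0.9/0.5 thresholds are all hypothetical, not BabelNet's actual policy or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # Hypothetical record: a directed sense pair plus an automatic confidence score.
    source: str
    target: str
    confidence: float


def triage(translations, accept_at=0.9, review_at=0.5):
    """Split translations into three buckets by confidence.

    - auto: kept by the automatic validation step (the bulk of the data)
    - review: routed to a human-oriented interface (a much smaller subset)
    - low: flagged as low confidence
    The thresholds are illustrative assumptions only.
    """
    auto, review, low = [], [], []
    for t in translations:
        if t.confidence >= accept_at:
            auto.append(t)
        elif t.confidence >= review_at:
            review.append(t)
        else:
            low.append(t)
    return auto, review, low


# Made-up scores, for illustration only.
sample = [
    Translation("dog@en", "cane@it", 0.97),
    Translation("dog@en", "Hund@de", 0.62),
    Translation("dog@en", "chien@fr", 0.21),
]
auto, review, low = triage(sample)
print(len(auto), len(review), len(low))  # 1 1 1
```

Only the two thresholds would need tuning to control how much data ends up in front of human checkers.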


>
> The ITS text analytics and/or terminology data categories, which also have
> confidence scores could be useful for annotating such a process:
> http://www.w3.org/TR/its20/#textanalysis
> http://www.w3.org/TR/its20/#terminology
>
> To enable such checking and progression in the authoritativeness of senses
> in different languages, it is important that you record what senses are a
> translation of what other senses.
>
Yes, that makes a lot of sense. Currently all the translations within a
synset are in an "all-to-all" relation...

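Since every sense in a BabelSynset counts as a translation of every sense in another language (the redundancy point raised in the draft, quoted below), the all-to-all relation can be derived rather than stored. A minimal sketch, assuming a synset is represented as a plain mapping from language code to lemmas (a hypothetical structure, not the BabelNet API). The derived pairs come in both directions and say nothing about which sense was the source, which is exactly the information Dave suggests recording.

```python
from itertools import permutations


def derived_translations(synset):
    """Yield every ordered (source, target) pair of senses in different languages.

    `synset` maps a language code to the lemmas lexicalizing the concept.
    """
    senses = [(lang, lemma) for lang, lemmas in synset.items() for lemma in lemmas]
    for src, tgt in permutations(senses, 2):
        if src[0] != tgt[0]:  # same-language pairs are synonyms, not translations
            yield src, tgt


# Hypothetical synset for the concept "dog".
dog = {"en": ["dog", "domestic dog"], "it": ["cane"], "de": ["Hund"]}
pairs = list(derived_translations(dog))
print(len(pairs))  # 10
```

With 4 senses there are 12 ordered pairs, of which the 2 English-English ones are dropped, so the number of implicit "translations" grows quadratically with the size of the synset.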

> In relation to the senses that are extracted from Wikipedia interlanguage
> links. Do you consider those 'translations', and in particular can you tell
> from those which is the source and which is the target?
>
Regarding the Wikipedia interlanguage links: we do consider those
translations, because I believe we apply some structural refinement that
reinforces the links across languages (so they are more likely to be true
translations). As for the second question: yes, as far as I know, we do know
which is the source and which is the target : )
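
When the direction is known, it can be written down explicitly. Below is a minimal sketch that emits N-Triples using Dave's suggestion (quoted below) of a `wasTranslatedFrom` property declared as a subproperty of `prov:wasDerivedFrom`, plus an optional ITS `itsrdf:mtConfidence` score for automatic translations (left out for human-made links such as Wikipedia interlanguage links). The `http://example.org/...` namespace and sense IRIs are hypothetical placeholders. Attaching the score directly to the target sense is a simplification; a reified translation, as in the lemon translation module, would let the score sit on the translation itself.

```python
PROV = "http://www.w3.org/ns/prov#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/bn-lld#"  # hypothetical namespace


def iri(value):
    return f"<{value}>"


def translation_triples(source, target, confidence=None):
    """Return N-Triples lines stating that `target` was translated from `source`."""
    was_translated_from = iri(EX + "wasTranslatedFrom")
    lines = [
        # Declare the hypothetical property as a specialization of PROV derivation.
        f"{was_translated_from} {iri(RDFS + 'subPropertyOf')} {iri(PROV + 'wasDerivedFrom')} .",
        f"{iri(target)} {was_translated_from} {iri(source)} .",
    ]
    if confidence is not None:
        # ITS 2.0 defines MT confidence as a value between 0 and 1.
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("mtConfidence must be in [0, 1]")
        lines.append(
            f'{iri(target)} {iri(ITSRDF + "mtConfidence")} "{confidence}"^^{iri(XSD + "double")} .'
        )
    return lines


for line in translation_triples(
    "http://example.org/sense/dog_n_en", "http://example.org/sense/cane_n_it", 0.87
):
    print(line)
```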


@Jorge:

> You could also consider using the lemon translation module to represent
> explicit translations as linked data.

Good to know! I think we will integrate all of these shortly.
I had a look at your paper and, indeed, the idea of reifying the
translation relation is very nice :) Even if not at LREC, I am looking
forward to seeing your presentation!


@Felix:

> "Issues: Information about translation confidence"
>
> So maybe having a wiki or gdocs page about this aspect, in which everybody
> can enter „his technology“ info, would help?

Yes, this makes sense. Do you mean a general-purpose gdoc for the
"translation" issue, or a gdoc specific to BabelNet's translation
representation? I have the second one ready and am just waiting to share it!

Thanks to everybody for their comments so far : )

> Interested to hear what you think.
>
> cheers,
> Dave
>
>
>
> On 22/05/2014 17:41, Roberto Navigli wrote:
>
> Thanks Felix! To answer Dave's comment: translations come from the
> automatic translations of semantically annotated corpora, as Tiziano said,
> and we have a confidence for each of these translations together with the
> source of the original text.
>
> Best,
> Roberto
>
>
> 2014-05-22 18:35 GMT+02:00 Tiziano Flati <tiziano.flati@gmail.com>:
>
>> @Felix:
>>
>>> I am wondering if ITS 2.0 properties could help here, see
>>> https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>> There is mtConfidence which provides the confidence value for machine
>>> translation and mtConfidenceAnnotatorsRef  to identify the tool used.
>>> Also, there are provenance-related properties, starting at :org,
>>> until :revToolRef, that could identify the provenance information you need.
>>> The underlying definitions for the two ITS data categories are at
>>> http://www.w3.org/TR/its20/#provenance
>>> http://www.w3.org/TR/its20/#mtconfidence
>>
>>  Yes, I think ITS 2.0 is definitely a very good direction to
>> explore. At the moment I don't think we need modelling properties more
>> complex than those (such as mtConfidenceRule, etc.), so I think it
>> fits our needs well.
>>
>>  @Lewis:
>>
>>> Do you currently know the provenance of the translations between senses
>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>> just take the links where they are present in the source resources, e.g.
>>> DBpedia?
>>>  What is the policy in BabelNet: is some translation better than none,
>>> or is there a translation confidence threshold, e.g. based on human
>>> checking, MT confidence or logical inference, that you employ?
>>>
>> BabelNet translations come either from explicit resource information
>> (e.g., Wikipedia interlanguage links) or from automatic translations
>> supported by millions of sense-tagged sentences from Wikipedia and SemCor.
>> So, AFAIK, BabelNet *does have* translation quality estimation, and I
>> think confidence information could also be provided.
>> (Roberto, correct me if I am wrong)
>>
>>  Thank you all for your comments and suggestions :)
>> Tiziano
>>
>> 2014-05-22 16:07 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>
>>  Hi Tiziano, Roberto,
>>> Do you currently know the provenance of the translations between senses
>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>> just take the links where they are present in the source resources, e.g.
>>> DBpedia?
>>>
>>> In the localization and MT applications we look at in CNGL and FALCON,
>>> where we may use translations to guide translators or help train MT
>>> engines, provenance is important so that policies can be applied to reduce
>>> the propagation of inaccurate translations, or of translations that are not
>>> appropriate to the context at hand, so those ITS attributes are really
>>> important there. To this end, when representing this as linked data, we
>>> define 'wasTranslatedFrom' as a subproperty of 'prov:wasDerivedFrom' to
>>> reify other provenance metadata: agents, tools, context, etc.
>>>
>>> What is the policy in BabelNet: is some translation better than none, or
>>> is there a translation confidence threshold, e.g. based on human checking,
>>> MT confidence or logical inference, that you employ?
>>>
>>> many thanks,
>>> Dave
>>>
>>>
>>> On 22/05/2014 10:42, Felix Sasaki wrote:
>>>
>>> Hi Tiziano,
>>>
>>>  sorry that I could not make the call due to personal reasons.
>>>
>>>  In the draft I saw under „translation“ this issue:
>>>
>>>  „Issues: Information about translation confidence (was it humanly or
>>> automatically produced? if automatic, with what confidence score?) and
>>> translation provenance (what text(s) does the translation come from? who
>>> translated and with what tool?).
>>> Another issue concerns whether the relation lexinfo:translation is
>>> essential or not: every sense in a language within a BabelSynset is, in
>>> fact, a translation of any other sense in another language, so that this
>>> information could actually be derived (problem of redundancy). However,
>>> having the data linked to each other could also be a benefit, since
>>> the information is explicit in the resource.“
>>>
>>>  I am wondering if ITS 2.0 properties could help here, see
>>>
>>>  https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>
>>>  There is mtConfidence which provides the confidence value for machine
>>> translation and mtConfidenceAnnotatorsRef  to identify the tool used.
>>>
>>>  Also, there are provenance-related properties, starting at :org,
>>> until :revToolRef, that could identify the provenance information you need.
>>> The underlying definitions for the two ITS data categories are at
>>> http://www.w3.org/TR/its20/#provenance
>>> http://www.w3.org/TR/its20/#mtconfidence
>>>
>>>  Best,
>>>
>>>  Felix
>>>
>>>  On 22.05.2014 at 10:12, Tiziano Flati <tiziano.flati@gmail.com> wrote:
>>>
>>>  Dear all,
>>>
>>>  we have compiled a first draft of guidelines for the conversion of
>>> BabelNet as Linguistic Linked Data. The initial draft is here:
>>> <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>
>>>
>>>  We can probably integrate this into the BPMLOD community report both
>>> as a separate document and in the form of all our resource-dependent and
>>> independent details/comments.
>>> Any feedback and comments are very much appreciated and will help us
>>> improve the draft.
>>>
>>>  Best regards,
>>> Tiziano Flati and Roberto Navigli
>>>
>>>
>>>
>>>
>>
>
>
> --
> =====================================
> Roberto Navigli
> Dipartimento di Informatica
> Sapienza University of Rome
> Viale Regina Elena 295 (second floor)
> 00161 Roma Italy
> Phone: +39 0649255161 - Fax: +39 06 8541842
> Home Page: http://wwwusers.di.uniroma1.it/~navigli
> =====================================
>
>
>
Received on Tuesday, 27 May 2014 14:16:35 UTC