- From: Jorge Gracia <jgracia@fi.upm.es>
- Date: Thu, 29 May 2014 22:26:07 +0000
- To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Cc: "public-ontolex@w3.org" <public-ontolex@w3.org>, "public-bpmlod@w3.org" <public-bpmlod@w3.org>
- Message-ID: <CANzuSaNBT0JjhhxsbfdtpErNQGg-pC-NxjranXJVONg1Qk0MxQ@mail.gmail.com>
+1 to Philipp's comment. Also, in my view "translation confidence" is a
characteristic of the process by which the translation was obtained, not a
property of the (target) lexical entry itself.

Regards,
Jorge

2014-05-28 13:44 GMT+00:00 Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>:

> Hi Dave,
>
> thanks a lot for your input. Most of your comments concern Translation as
> viewed from the perspective of a process.
>
> So far, in the ontolex group we have regarded "translation" as a special
> case of "cross-lingual variation", abstracting from the process by which
> the actual translation was produced.
>
> So the reified relation "Translation" means rather that two Lexical Senses
> stand to each other in a relation of translation, independently of how
> this translation was obtained.
>
> We might rename "Translation" as "TranslationVariant" to make this clearer.
>
> On your example:
>
>   ex:34678es a lemon:LexicalEntry;
>       a prov:Entity;
>       lemon:form [ lemon:writtenRep "casa"@es ];
>       ontolexTrans:wasTranslatedFrom ex:34678en;
>       its:mtConfidence "0.5";
>       ontolexTrans:qualifiedTranslation [
>           a ontolex:Translation;
>           prov:hadActivity ex:ExMachineTranslation;
>       ].
>
> I am not fully convinced here, as this example attaches the confidence and
> other properties to the lexical entry. IMHO the confidence should instead
> be attached to the relation of being a translation of each other, rather
> than to the lexical entries / lexical senses.
>
> So we could certainly attach provenance information to the
> "TranslationVariant" object, but I would not add the prov. information to
> the lexical entries standing in the relation of being a translation of each
> other.
>
> In fact, the confidence is not a property of any lexical entry; it is the
> confidence in the fact that X is the (correct) translation of Y, so it
> should be attached to an object reifying this relation rather than to one
> of the lexical entries or lexical senses involved.
>
> So yes, we could recommend using the Prov-O vocabulary to make the
> provenance information of a "TranslationVariant" explicit.
>
> Does that make sense?
>
> Regards,
>
> Philipp.
>
> On 27.05.14 03:23, Dave Lewis wrote:
>
> Hi Jorge, guys,
> Thanks for these pointers. I had not been following this as closely as I
> should, so I have some comments below that are relevant to both the
> meta-share RDF model and your translation model in ontolex, so I've copied
> them also.
>
> You are quite correct to reify the translation relationship. Deriving an
> authoritative translation is rarely straightforward and may involve
> different inputs at different times from different sources, e.g. BabelNet
> has professionally curated translations, translations from Wikipedia and MT
> outputs.
>
> So in many cases you are dealing with the current status of a provisional
> translation rather than a 'final' authoritative one.
>
> Also, there is some potential confusion in naming the reifying class
> 'Translation', since in many situations this refers to the string in the
> target language rather than the entity linking a target language string to a
> source language string.
>
> In [1] we proposed an approach to handle this by specialising from the W3C
> Provenance vocabulary [2].
>
> This means treating the source and targets of translation (LexicalEntry,
> LexicalSense) as prov:Entity classes so that their provenance can be
> tracked using other classes and properties from that model.
>
> Specifically we propose specialising the provenance property:
> http://www.w3.org/TR/prov-o/#wasDerivedFrom
>
> i.e.
>   ontolexTrans:wasTranslatedFrom rdfs:subPropertyOf
>       prov:wasDerivedFrom.
>
> PROV-O also enables reification by defining a class:
> http://www.w3.org/TR/prov-o/#Derivation
>
> which is in the range of:
> http://www.w3.org/TR/prov-o/#qualifiedDerivation
>
> So similarly we can define
>   ontolexTrans:Translation rdfs:subClassOf prov:Derivation.
>
> and
>
>   ontolexTrans:qualifiedTranslation rdfs:subPropertyOf
>       prov:qualifiedDerivation.
>
> To flesh this out with an example:
>
>   ex:34678en a lemon:LexicalEntry;
>       a prov:Entity;
>       lemon:form [ lemon:writtenRep "house"@en ] .
>
>   ex:34678es a lemon:LexicalEntry;
>       a prov:Entity;
>       lemon:form [ lemon:writtenRep "casa"@es ];
>       ontolexTrans:wasTranslatedFrom ex:34678en;
>       its:mtConfidence "0.5";
>       ontolexTrans:qualifiedTranslation [
>           a ontolex:Translation;
>           prov:hadActivity ex:ExMachineTranslation;
>       ].
>
> Note that in the above, its:mtConfidence is more accurately used to annotate
> the LexicalEntry rather than the Translation, as it is a property of the
> text resulting from the translation, rather than a reification of the
> translation.
>
> Thoughts welcome.
>
> cheers,
> Dave
>
> [1] http://www.lrec-conf.org/proceedings/lrec2012/pdf/636_Paper.pdf
> [2] http://www.w3.org/TR/prov-o/
>
> On 23/05/2014 14:48, Jorge Gracia wrote:
>
> Dear Tiziano, Roberto
>
> You could also consider using the lemon translation module to represent
> explicit translations as linked data. This is currently under development
> in the ONTOLEX group, but there is a lemon-based version already available
> that I will present at LREC next week [1]. The idea is to reify the
> translation relation so you can attach additional information to it
> (source, target, confidence, provenance, etc.) [2]
>
> Regards,
>
> Jorge
>
> [1]
> http://ra.cps.unizar.es:8080/PUBLICATIONS/attachedFiles/document/LREC2014_translations_V11.pdf
> [2] http://purl.org/net/translation#
>
> 2014-05-23 11:58 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>
>> Roberto, Tiziano,
>> Thanks for that.
>>
>> Have you already considered how you might allow third parties to QA and
>> perhaps correct those translations? That is, some sort of process by which
>> proposed MT translations between senses can be promoted to more
>> authoritative, human-checked translations, and marked as such?
>>
>> The ITS text analytics and/or terminology data categories, which also
>> have confidence scores, could be useful for annotating such a process:
>> http://www.w3.org/TR/its20/#textanalysis
>> http://www.w3.org/TR/its20/#terminology
>>
>> To enable such checking and progression in the authoritativeness of
>> senses in different languages, it is important that you record which senses
>> are a translation of which other senses.
>>
>> In relation to the senses that are extracted from Wikipedia interlanguage
>> links: do you consider those 'translations', and in particular can you tell
>> from those which is the source and which is the target?
>>
>> Interested to hear what you think.
>>
>> cheers,
>> Dave
>>
>> On 22/05/2014 17:41, Roberto Navigli wrote:
>>
>> Thanks Felix! To answer Dave's comment: translations come from the
>> automatic translations of semantically annotated corpora, as Tiziano said,
>> and we have a confidence for each of these translations together with the
>> source of the original text.
>>
>> Best,
>> Roberto
>>
>> 2014-05-22 18:35 GMT+02:00 Tiziano Flati <tiziano.flati@gmail.com>:
>>
>>> @Felix:
>>>
>>>> I am wondering if ITS 2.0 properties could help here, see
>>>> https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>> There is mtConfidence, which provides the confidence value for machine
>>>> translation, and mtConfidenceAnnotatorsRef to identify the tool used.
>>>> Also, there are provenance-related properties, starting at :org,
>>>> until :revToolRef, that could identify the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
>>>
>>> Yes, I think that ITS 2.0 can definitely be a very good point to
>>> explore. At the moment I don't think we need modelling properties more
>>> complex than those (such as mtConfidenceRule, etc.), so I think this
>>> fits our needs well.
>>>
>>> @Lewis:
>>>
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>>> just take the links where they are present in the source resources, e.g.
>>>> DBpedia?
>>>> What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold, e.g. based on human
>>>> checking, MT confidence or logical inference etc., that you employ?
>>>>
>>> BabelNet translations can come from explicit resource information (e.g.,
>>> Wikipedia interlanguage links) or as automatic translations supported by
>>> millions of sense-tagged sentences coming from Wikipedia and SemCor.
>>> In conclusion, AFAIK, BabelNet *does have* translation quality
>>> estimation, so I think that an indication of confidence could also be
>>> provided. (Roberto, correct me if I am wrong)
>>>
>>> Thank you all for your comments and suggestions :)
>>> Tiziano
>>>
>>> 2014-05-22 16:07 GMT+02:00 Dave Lewis <dave.lewis@cs.tcd.ie>:
>>>
>>>> Hi Tiziano, Roberto,
>>>> Do you currently know the provenance of the translations between senses
>>>> in BabelNet? Have you produced any of the translations yourself, or do you
>>>> just take the links where they are present in the source resources, e.g.
>>>> DBpedia?
>>>>
>>>> In the localization and MT applications we look at in CNGL and FALCON,
>>>> where we may use translations to guide translators or help train MT
>>>> engines, the provenance is important so that policies can be applied to
>>>> reduce the propagation of inaccurate translations, or translations that are
>>>> not appropriate to the context at hand - so those ITS attributes are really
>>>> important there. To this end, when representing this as linked data, we
>>>> define 'wasTranslatedFrom' as a subproperty of 'prov:wasDerivedFrom' to reify
>>>> other provenance meta-data - agents, tools, context etc.
>>>>
>>>> What is the policy in BabelNet: is some translation better than none,
>>>> or is there a translation confidence threshold, e.g. based on human
>>>> checking, MT confidence or logical inference etc., that you employ?
>>>>
>>>> many thanks,
>>>> Dave
>>>>
>>>> On 22/05/2014 10:42, Felix Sasaki wrote:
>>>>
>>>> Hi Tiziano,
>>>>
>>>> sorry that I could not make the call due to personal reasons.
>>>>
>>>> In the draft I saw under "translation" this issue:
>>>>
>>>> "Issues: Information about translation confidence (was it humanly or
>>>> automatically produced? if automatic, with what confidence score?) and
>>>> translation provenance (what text(s) does the translation come from? who
>>>> translated and with what tool?).
>>>> Another issue concerns whether the relation lexinfo:translation is
>>>> essential or not: every sense in a language within a BabelSynset is, in
>>>> fact, a translation of any other sense in another language, so that this
>>>> information could actually be derived (problem of redundancy). However,
>>>> having data linked one to each other could also be a benefit, since
>>>> the information is explicit in the resource."
>>>>
>>>> I am wondering if ITS 2.0 properties could help here, see
>>>>
>>>> https://www.w3.org/International/its/wiki/ITS-RDF_mapping
>>>>
>>>> There is mtConfidence, which provides the confidence value for machine
>>>> translation, and mtConfidenceAnnotatorsRef to identify the tool used.
>>>>
>>>> Also, there are provenance-related properties, starting at :org,
>>>> until :revToolRef, that could identify the provenance information you need.
>>>> The underlying definitions for the two ITS data categories are at
>>>> http://www.w3.org/TR/its20/#provenance
>>>> http://www.w3.org/TR/its20/#mtconfidence
>>>>
>>>> Best,
>>>>
>>>> Felix
>>>>
>>>> On 22.05.2014 at 10:12, Tiziano Flati <tiziano.flati@gmail.com> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> we have compiled a first draft of guidelines for the conversion of
>>>> BabelNet to Linguistic Linked Data. The initial draft is here:
>>>> <https://docs.google.com/document/d/184C_AjY7_PYBSc8SnAFghGLyTo1v312N34dsP9QZokI/edit#>
>>>>
>>>> We can probably integrate this into the BPMLOD community report both
>>>> as a separate document and in the form of all our resource-dependent and
>>>> independent details/comments.
>>>> Any feedback or comments are much appreciated and will help us
>>>> improve the draft.
>>>>
>>>> Best regards,
>>>> Tiziano Flati and Roberto Navigli
>>
>> --
>> =====================================
>> Roberto Navigli
>> Dipartimento di Informatica
>> Sapienza University of Rome
>> Viale Regina Elena 295 (second floor)
>> 00161 Roma Italy
>> Phone: +39 0649255161 - Fax: +39 06 8541842
>> Home Page: http://wwwusers.di.uniroma1.it/~navigli
>> =====================================
>
> --
> Jorge Gracia, PhD
> Ontology Engineering Group
> Artificial Intelligence Department
> Universidad Politécnica de Madrid
> http://delicias.dia.fi.upm.es/~jgracia/
>
> --
>
> Prof. Dr. Philipp Cimiano
>
> Phone: +49 521 106 12249
> Fax: +49 521 106 12412
> Mail: cimiano@cit-ec.uni-bielefeld.de
>
> Forschungsbau Intelligente Systeme (FBIIS)
> Raum 2.307
> Universität Bielefeld
> Inspiration 1
> 33619 Bielefeld

--
Jorge Gracia, PhD
Ontology Engineering Group
Artificial Intelligence Department
Universidad Politécnica de Madrid
http://delicias.dia.fi.upm.es/~jgracia/
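
For illustration, the modelling argued for in this thread (confidence and
provenance attached to the object reifying the translation, not to the
lexical entries) could be sketched roughly as follows. This is only a sketch
under assumptions: the prefix IRIs for ex:, lemon:, its: and ontolexTrans:
are not given anywhere in the thread and are guesses or placeholders, the
identifier ex:trans34678 is made up, and ontolexTrans:translationSource and
ontolexTrans:translationTarget are hypothetical property names rather than
terms of any published vocabulary.

```turtle
# Sketch only. Prefix IRIs other than prov: and xsd: are assumptions.
@prefix ex:           <http://example.org/> .
@prefix lemon:        <http://lemon-model.net/lemon#> .
@prefix its:          <http://www.w3.org/2005/11/its/rdf#> .
@prefix ontolexTrans: <http://example.org/ontolexTrans#> .   # placeholder namespace
@prefix prov:         <http://www.w3.org/ns/prov#> .
@prefix xsd:          <http://www.w3.org/2001/XMLSchema#> .

# The lexical entries carry no confidence or provenance of their own.
ex:34678en a lemon:LexicalEntry ;
    lemon:form [ lemon:writtenRep "house"@en ] .

ex:34678es a lemon:LexicalEntry ;
    lemon:form [ lemon:writtenRep "casa"@es ] .

# The reified translation relation carries both.
ex:trans34678 a ontolexTrans:Translation ;            # hypothetical identifier
    ontolexTrans:translationSource ex:34678en ;       # hypothetical property
    ontolexTrans:translationTarget ex:34678es ;       # hypothetical property
    its:mtConfidence "0.5"^^xsd:decimal ;
    prov:hadActivity ex:ExMachineTranslation .
```

With this shape, a second translation of the same entry obtained by another
process (e.g. human curation) would simply be another reified node with its
own confidence and activity, and the entries themselves stay unchanged.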
Received on Thursday, 29 May 2014 22:26:56 UTC