Re: Workflows for localizing RDF (Fwd: Fwd: "Organization Ontology" Japanese translation available) from Elena Montiel Ponsoda on 2014-02-07 (public-bpmlod@w3.org from February 2014)

From: Elena Montiel Ponsoda <elemontiel@gmail.com>
Date: Fri, 07 Feb 2014 17:01:22 +0100
To: Felix Sasaki <fsasaki@w3.org>, "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>
CC: Dave Lewis <dave.lewis@cs.tcd.ie>, public-bpmlod@w3.org
Message-ID: <52F50352.3000902@gmail.com>
Dear Felix, all,

As John points out, in Monnet (and previously in the LabelTranslator 
work) we proposed a similar workflow, though I am not so sure about step 4):

1) create linked data in one language
2) extract to XLIFF
3) translate
4) merge back into 1)
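
To make the four steps concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the labels are held as plain tuples rather than in a real RDF store, the XLIFF is a bare XLIFF 1.2 skeleton built with the standard library, and the `glossary` dictionary stands in for a human translator or MT system. The function names are hypothetical and are not taken from Monnet or LabelTranslator.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)

# Step 1 (assumed input): (subject, predicate, literal text, language tag).
LABELS = [
    ("org:Organization", "rdfs:label", "Organization", "en"),
    ("org:Post", "rdfs:label", "Post", "en"),
]

def q(name):
    """Qualify an element name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{name}"

def extract_to_xliff(labels, src_lang, tgt_lang):
    """Step 2: one trans-unit per source-language literal."""
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": "labels", "datatype": "plaintext",
        "source-language": src_lang, "target-language": tgt_lang,
    })
    body = ET.SubElement(file_el, q("body"))
    for index, (_subj, _pred, text, lang) in enumerate(labels):
        if lang == src_lang:
            # The unit id is the position in `labels`, so merging can find the triple.
            unit = ET.SubElement(body, q("trans-unit"), {"id": str(index)})
            ET.SubElement(unit, q("source")).text = text
    return root

def translate(root, glossary):
    """Step 3: placeholder for the actual translation work."""
    for unit in root.iter(q("trans-unit")):
        source = unit.find(q("source")).text
        if source in glossary:
            ET.SubElement(unit, q("target")).text = glossary[source]

def merge_back(root, labels, tgt_lang):
    """Step 4: add each translated literal next to its original."""
    merged = list(labels)
    for unit in root.iter(q("trans-unit")):
        target = unit.find(q("target"))
        if target is not None:
            subj, pred, _text, _lang = labels[int(unit.get("id"))]
            merged.append((subj, pred, target.text, tgt_lang))
    return merged

doc = extract_to_xliff(LABELS, "en", "ja")
translate(doc, {"Organization": "組織"})
result = merge_back(doc, LABELS, "ja")
xml_text = ET.tostring(doc, encoding="unicode")
```

Note that in this sketch the merge step adds a new literal with the target language tag and leaves the source literal untouched. Units without a translation are simply skipped.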

As I understand it, when localizing ontologies, you need to extract the 
natural language descriptions in the ontology (maybe they are only in 
the schema, or also explicitly stated as rdfs:labels, skos:prefLabels, 
lemon:LexicalEntry, etc.), but also consider the "ontological context" 
or a sub-graph of the ontology in which a certain label is immersed, in 
order to provide the most appropriate translation. As has been said, 
sometimes labels are very short and you do not have the necessary 
context (for sense disambiguation, for example).
This is what we suggested in the LabelTranslator work, see [1].
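
As a rough illustration of what I mean by "ontological context", the following Python sketch collects the labels of the resources directly connected to a given one. It is a simplification under my own assumptions (a hypothetical `ex:` mini-ontology, only one hop in the graph) and not the algorithm described in [1].

```python
# Hypothetical mini-ontology; the ex: names are invented for illustration.
LABELS = {
    "ex:Post": "Post",
    "ex:Role": "Role",
    "ex:Agent": "Agent",
    "ex:heldBy": "held by",
}
EDGES = [
    ("ex:Post", "rdfs:subClassOf", "ex:Role"),
    ("ex:heldBy", "rdfs:domain", "ex:Post"),
    ("ex:heldBy", "rdfs:range", "ex:Agent"),
]

def context_labels(resource, edges, labels):
    """Labels of the resources one edge away from `resource`, in either direction."""
    neighbours = set()
    for subj, _pred, obj in edges:
        if subj == resource:
            neighbours.add(obj)
        elif obj == resource:
            neighbours.add(subj)
    return sorted(labels[n] for n in neighbours if n in labels)

context = context_labels("ex:Post", EDGES, LABELS)
print(context)
```

A short label such as "Post" is ambiguous on its own (a job position, a mail service, a blog entry). Handing the neighbouring labels to the translator or the MT system alongside it is one way to supply the missing context.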

Once you have obtained translations for all NL descriptions in the 
ontology, I would say you include them in the same form as the 
original labels, i.e., as rdfs:labels or skos:prefLabels with the 
corresponding language tag (@es, @fr, ...).
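
For instance, a small helper along these lines could write the translated labels out as Turtle statements with language tags. This is a sketch only: the function name is hypothetical, the Spanish and French strings are illustrative rather than copied from any published schema, and prefix declarations are assumed to exist elsewhere in the file.

```python
def turtle_label(subject, predicate, text, lang):
    """Format one language-tagged literal as a Turtle statement."""
    # Escape the characters that may not appear raw in a short Turtle string.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'{subject} {predicate} "{escaped}"@{lang} .'

# Illustrative labels keyed by language tag.
LABELS = {"en": "Organization", "es": "Organización", "fr": "Organisation"}

lines = [
    turtle_label("org:Organization", "rdfs:label", text, lang)
    for lang, text in LABELS.items()
]
print("\n".join(lines))
```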

However, if the number of linguistic descriptions you want to relate to 
the ontology is considerable, then maybe the best option is to extract 
the original linguistic descriptions and import them into an external 
lexicon (according to the lemon model, or, soon, the model proposed in 
the W3C Ontology-Lexica (OntoLex) Community Group [2]). The translations 
of those descriptions would then be stored in a different lexicon in the 
target language (let us say Japanese), and would point to the same 
ontological references (so you would end up with one lexicon per language).
But here again, despite having the linguistic descriptions in a 
lexicon independent from the ontology, you should consider the 
"ontological context", as it gives you the appropriate semantics of 
those linguistic descriptions.
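
The "one lexicon per language" idea can be sketched as follows. This is a deliberately reduced data model of my own, not the lemon vocabulary itself: each entry keeps only a written form (compare lemon:writtenRep) and the ontology entity it refers to (compare lemon:reference), and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    written_rep: str  # surface form, cf. lemon:writtenRep
    reference: str    # ontology entity it denotes, cf. lemon:reference

@dataclass
class Lexicon:
    language: str
    entries: list = field(default_factory=list)

def translations(entry, target_lexicon):
    """Entries in the target lexicon that share the same ontology reference."""
    return [e.written_rep for e in target_lexicon.entries
            if e.reference == entry.reference]

english = Lexicon("en", [LexicalEntry("Organization", "org:Organization")])
japanese = Lexicon("ja", [LexicalEntry("組織", "org:Organization")])

found = translations(english.entries[0], japanese)
print(found)
```

The two lexicons never point at each other. The translation relation is recovered through the shared ontology reference, which is why the ontology remains the place that fixes the meaning.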

Not sure if this answers your questions.

Best,
Elena.


[1] http://oa.upm.es/6251/1/LabelTranslator_Tool__Auto_Loc_Onto.pdf
[2] http://www.w3.org/community/ontolex/

On 07/02/2014 14:01, Felix Sasaki wrote:
> On 07.02.14 13:58, John P. McCrae wrote:
>> Hi,
>>
>>
>> On Fri, Feb 7, 2014 at 12:39 PM, Felix Sasaki <fsasaki@w3.org> wrote:
>>
>>     On 07.02.14 12:30, Dave Lewis wrote:
>>>     Felix,
>>>
>>>     On 07/02/2014 10:43, Felix Sasaki wrote:
>>>>     Hi all,
>>>>
>>>>     sorry that I could not make today's call. I am wondering if
>>>>     the mail below, taken from
>>>>     http://lists.w3.org/Archives/Public/w3c-translators/2014JanMar/0024.html
>>>>     could lead to two best practices:
>>>>
>>>>     1) When you prepare RDF content for translation (ontologies
>>>>     and/or pure statements), consider extracting the text to be
>>>>     translated. That will ensure that all translators do the same.
>>>
>>>     So would this need some extraction and remerging rules for RDF
>>>     in XML and turtle?
>>
>>     The input format could be RDF in XML, Turtle, or something else,
>>     just as you can generate XLIFF out of Java, JavaScript, HTML, etc.
>>
>>
>>>     And should we specify this generically or perhaps directly into
>>>     XLIFF?
>>
>>     I was mostly wondering about recommending a workflow
>>     1) create linked data in one language
>>     2) extract to XLIFF
>>     3) translate
>>     4) merge back into 1)
>>     which may make sense for any serialization of RDF. The ITS2
>>     metadata that I had used in those slides is carried in an
>>     RDF 1.1 HTML literal. That data type can be used in RDF 1.1
>>     independent of the RDF serialization - it works like an XML literal.
>>
>>
>> So this is more-or-less what we did as a baseline in Monnet (although 
>> plain text instead of XLIFF). The key issue with this approach is 
>> that you lose the context of the ontology when you are translating, 
>> which can be a problem when you have very short labels for your 
>> concepts that are highly ambiguous. I am not sure how much of this 
>> context can be captured with XLIFF.
>
> See the example file I have sent around. It has a skeleton file that 
> provides at least some context.
>
> Best,
>
> Felix
>
>>
>>
>>>
>>>     Also, in general  should we treat translation of RDF
>>>     type/class/relationship names differently from translation of
>>>     literals? 
>>
>>     Actually I was just thinking of literals, nothing else. So the BP
>>     I had in mind is related to literals. Good point, one has to make
>>     clear that this is not about type / class etc. localization.
>>
>> +1: the BP is to translate only literals
>>
>>
>>
>>>     The MONNET guys might have a good handle on this.
>>>
>>>     Is there also a best practice we should consider or reference for
>>>     non-text data types (xsd)?
>>>
>>>>
>>>>     2) Consider adding metadata to the RDF content to guide that
>>>>     extraction, e.g. to identify fixed terms. An example of how that
>>>>     could work is on slides 31-32 of
>>>>     http://download.yandex.ru/company/experience/WSD/wsd_sasaki.pdf
>>>>
>>>
>>>     that makes sense - but do we need to have a special literal type
>>>     to indicate that it should be parsed for 'inline' tags? 
>>
>>     See above - the HTML literal
>>     http://www.w3.org/TR/rdf11-concepts/#section-html
>>     should do the job.
>>
>>
>>
>>>     Also in some cases, for example if the span had
>>>     its-term-info-ref pointing to a term definition elsewhere in
>>>     the linked data cloud, best practice might be to reform (i.e.
>>>     extract) the literal into a NIF subgraph, with the annotated
>>>     sub-string as separate nif:string objects.
>>
>>     Not sure if for generating an XLIFF file (see above) you would need
>>     a NIF subgraph. The main motivation for my BP proposal was: allow
>>     people working with localization tools (= processing XLIFF files)
>>     to translate labels in linked data.
>>
>>     So all the below makes sense IMO for textual content, extracted
>>     from HTML / XML etc. But processing the labels in linked data
>>     with NIF? Not sure if that is needed, and it might even hinder
>>     the use of XLIFF-based localization workflows.
>>
>>     Disclaimer: really nothing against NIF ;) My point is only about
>>     the right approach for label translation.
>>
>>     Best,
>>
>>     Felix
>>
>>
>>>
>>>     A common re-merge process would also then be needed so the
>>>     translated literal is available without inline mark-up for
>>>     processes (indexing, presentation) that don't care about the
>>>     translation process.
>>>
>>>     The ITS<->NIF mapping in the ITS 2.0 spec would provide a
>>>     starting point for this:
>>>     http://www.w3.org/TR/2013/REC-its20-20131029/#conversion-to-nif
>>>
>>>     I'd also add:
>>>
>>>     3) can we advise on use of some form of isTranslationOf or
>>>     isTranslatedFrom (not necessarily the same?) RDF relationship to
>>>     use in linked data? In CNGL we use something that is a
>>>     specialisation of prov:wasDerivedFrom, but that's because we are
>>>     interested in recording the details of the translation processes
>>>     (and hence the other provenance classes and relationships).  I
>>>     could imagine there are use cases where we are interested in a
>>>     'translated from' link but not the provenance?
>>>
>>>     cheers,
>>>     Dave
>>>
>>>
>>>
>>>>     Thoughts?
>>>>
>>>>     - Felix
>>>>
>>>>
>>>>     -------- Original Message --------
>>>>     Subject:  Fwd: "Organization Ontology" Japanese translation
>>>>     available
>>>>     Resent-Date:  Wed, 05 Feb 2014 21:47:43 +0000
>>>>     Resent-From:  w3c-translators@w3.org
>>>>     Date:  Wed, 05 Feb 2014 21:46:48 +0000
>>>>     From:  Phil Archer <phila@w3.org>
>>>>     To:  Shuji Kamitsuna <ax2s-kmtn@asahi-net.or.jp>
>>>>     CC:  w3c-translators@w3.org, Naomi Yoshizawa <naomi@w3.org>
>>>>
>>>>
>>>>
>>>>     Hi again Shuji,
>>>>
>>>>     I've been through your translation of ORG and... this is very
>>>>     interesting. The person behind ORG is not the same as the people behind
>>>>     DCAT and the styles are quite different. One way in which this becomes
>>>>     obvious is that Dave Reynolds (ORG) does not give the labels for his
>>>>     terms in the specification, but only in the schema. Therefore, very
>>>>     reasonably, you have not translated the labels. When I come to transfer
>>>>     your work in the schema, I can only copy the comments.
>>>>
>>>>     And, I even found a whole class in the schema that's not in the spec!
>>>>
>>>>     Ah well, I have copied the comments into the schema as you can now see
>>>>     at http://www.w3.org/ns/org.ttl. The labels are available in the other
>>>>     languages for Org (FR and IT) but that's because we were supplied with
>>>>     translations of the schema, not the spec - which is the much bigger task
>>>>     that you have taken on.
>>>>
>>>>     If you or Naomi wants to send me the Japanese labels, I'll certainly add
>>>>     them, but the definitions are all in the schema now.
>>>>
>>>>     Again, thank you for all your work on this.
>>>>
>>>>     Phil.
>>>>
>>>>     >> ------- Forwarded message -------
>>>>     >> From: "Shuji Kamitsuna" <ax2s-kmtn@asahi-net.or.jp>
>>>>     >> To: w3c-translators@w3.org
>>>>     >> Subject: "Organization Ontology" Japanese translation available
>>>>     >> Date: Sat, 01 Feb 2014 12:14:58 +0100
>>>>     >>
>>>>     >> Dear Sir and Madam
>>>>     >>
>>>>     >> This is Shuji Kamitsuna@Japan.
>>>>     >>
>>>>     >> "Organization Ontology"
>>>>     >> http://www.w3.org/TR/2014/REC-vocab-org-20140116/
>>>>     >>
>>>>     >> in Japanese is available now
>>>>     >>
>>>>     >> 組織オントロジー
>>>>     >> http://www.asahi-net.or.jp/~ax2s-kmtn/internet/rdf/REC-vocab-org-20140116.html
>>>>     >>
>>>>     >> cf. <http://www.w3.org/2005/11/Translations/Query?rec=vocab-org&lang=any&translator=any&date=any&sorting=byTechnology&output=FullHTML&submit=Submit>
>>>>     >>
>>>>     >> Regards,
>>>>     >>
>>>>     >>
>>>>     >> --
>>>>     >> Coralie Mercier  -  W3C Communications Team  - http://www.w3.org
>>>>     >> mailto:coralie@w3.org  +336 4322 0001  http://www.w3.org/People/CMercier/
>>>>     >
>>>>     >
>>>>     > ----
>>>>     > Ivan Herman, W3C
>>>>     > Digital Publishing Activity Lead
>>>>     > Home: http://www.w3.org/People/Ivan/
>>>>     > mobile: +31-641044153
>>>>     > GPG: 0x343F1A3D
>>>>     > FOAF: http://www.ivan-herman.net/foaf
>>>>     >
>>>>     >
>>>>     >
>>>>     >
>>>>     >
>>>>
>>>>     -- 
>>>>
>>>>
>>>>     Phil Archer
>>>>     W3C Data Activity Lead
>>>>     http://www.w3.org/2013/data/
>>>>
>>>>     http://philarcher.org
>>>>     +44 (0)7887 767755
>>>>     @philarcher1
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
Elena Montiel-Ponsoda
Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
Escuela Técnica Superior de Ingenieros Informáticos
Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, España
www.oeg-upm.net
Tel. (+34) 91 336 36 70
Fax  (+34) 91 352 48 19
Received on Friday, 7 February 2014 16:01:51 UTC