- From: Shaun McCance <shaunm@gnome.org>
- Date: Fri, 11 May 2012 15:50:56 -0400
- To: Yves Savourel <ysavourel@enlaso.com>
- Cc: 'MultilingualWeb-LT Working Group' <public-multilingualweb-lt@w3.org>
On Fri, 2012-05-11 at 05:15 -0600, Yves Savourel wrote:
> Hi,
>
> To follow up on the discussion about the two cases for targetPointer,
> here is some proposed text for the wiki.

Thanks Yves. As promised, here are my experiences with multilingual XML formats.

It's not uncommon to find formats that provide strings in multiple languages by just repeating an element and putting some sort of language-identifying attribute on each copy. One common place you see this is in RDF/XML files:

http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-languages

(I realize RDF/XML can't generally be processed using XML-only tools, but I don't think doing so is unreasonable for a content creator who controls the source and can make sure it's in a canonical form.)

The Best Practices for XML Internationalization explicitly recommends not doing this:

http://www.w3.org/TR/xml-i18n-bp/#DevMLDoc

I agree with all the reasons, but disagree with the conclusion. As far as I'm concerned, the only sane way to deal with these kinds of formats is to write the source XML as if it were monolingual, use per-language XLIFF or PO files to manage translations, and automatically merge the translations into the published file.

I really only work on the extraction and merging tools, so that's the only thing I can comment on. Extracting the source strings is simple. Joining translations into a single file isn't. I have code in itstool that does this, but it has limitations in what kinds of formats it can deal with. For reference, here is a source file and a generated file from my regression tests:

http://gitorious.org/itstool/itstool/blobs/master/tests/IT-join-1.xml
http://gitorious.org/itstool/itstool/blobs/master/tests/IT-join-1.ll.xml

In itstool, an element may be a translation unit. A translation unit generates a message in a PO file. An element is a unit if it is not within text, and if it contains anything other than other units and whitespace-only text nodes.
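
To make the unit rule a bit more concrete, here is a rough Python sketch. It is not itstool's actual code. It is a simplified reading of the rule: an element counts as a unit if it is not within text and directly holds non-whitespace text or a within-text child. The function names, the inline_tags parameter (a stand-in for whatever marks an element as within text) and the sample document are all made up for illustration.

```python
import xml.etree.ElementTree as ET

def has_text(s):
    """True if s is a text node with something other than whitespace."""
    return s is not None and s.strip() != ""

def is_unit(elem, inline_tags=frozenset()):
    """Simplified unit test (an assumption, not itstool's real logic).

    inline_tags is a hypothetical set of tag names treated as within text.
    """
    if elem.tag in inline_tags:
        return False  # within-text elements are never units themselves
    if has_text(elem.text):
        return True
    for child in elem:
        # A within-text child, or text following any child, is content
        # that is neither a unit nor whitespace.
        if child.tag in inline_tags or has_text(child.tail):
            return True
    return False

def outermost_units(elem, inline_tags=frozenset()):
    """Collect units in document order, not descending into a unit."""
    if is_unit(elem, inline_tags):
        return [elem]
    found = []
    for child in elem:
        found.extend(outermost_units(child, inline_tags))
    return found

# Illustrative monolingual source; the element names are arbitrary.
SOURCE = """<application>
  <license>
    <p>This is <em>the</em> license.</p>
    <p>It has multiple paragraphs.</p>
  </license>
</application>"""

root = ET.fromstring(SOURCE)
units = outermost_units(root, inline_tags=frozenset({"em"}))
unit_tags = [u.tag for u in units]
```

With this reading, the two paragraphs come out as the outermost units, and neither the container elements nor the inline em element do.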

To join translations into a multilingual file, you have to know which elements to repeat. In itstool, this is any outermost translation unit. This basically works for the kinds of documents I've had to deal with, but it makes two big assumptions.

First, it assumes that the repeatable elements are the ones I described here. Consider this example:

<application>
  <license xml:lang="en">
    <p>This is the license.</p>
    <p>It has multiple paragraphs.</p>
  </license>
  <license xml:lang="de">
    <p>Dies ist die Lizenz.</p>
    <p>Es verfügt über mehrere Absätze.</p>
  </license>
</application>

Unless I treat license as a unit (and thus, paragraphs as within text), this doesn't work.

Second, itstool just assumes that the way to create a language-specific element is by using xml:lang. If the attribute is supposed to be lang or language or somens:locale_identifier, it won't work.

targetPointer only addresses identifying multiple-language versions of an element, given an existing multilingual document. What I'm describing here is about being able to construct such documents given translations in a normal translation format.

--
Shaun
Received on Friday, 11 May 2012 19:51:23 UTC