Re: [ISSUE-15] target-pointer from Shaun McCance on 2012-05-11 (public-multilingualweb-lt@w3.org from May 2012)

From: Shaun McCance <shaunm@gnome.org>
Date: Fri, 11 May 2012 15:50:56 -0400
To: Yves Savourel <ysavourel@enlaso.com>
Cc: 'MultilingualWeb-LT Working Group' <public-multilingualweb-lt@w3.org>
Message-ID: <1336765856.20969.243.camel@recto>

On Fri, 2012-05-11 at 05:15 -0600, Yves Savourel wrote:
> Hi,
> 
> To follow up on the discussion about the two cases for targetPointer,
> here is some proposed text for the wiki.

Thanks Yves. As promised, here are my experiences with multilingual XML
formats. It's not uncommon to find formats that provide strings in
multiple languages by just repeating an element and putting some sort
of language-identifying attribute on each element.

One common place you see this is in RDF/XML files:

http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-languages

(I realize RDF/XML can't generally be processed using XML-only tools,
but I don't think doing so is unreasonable for a content creator who
controls the source and can make sure it's in a canonical form.)

The Best Practices for XML Internationalization explicitly recommends
not doing this:

http://www.w3.org/TR/xml-i18n-bp/#DevMLDoc

I agree with all the reasons, but disagree with the conclusion. As far
as I'm concerned, the only sane way to deal with these kinds of formats
is to write the source XML as if it were monolingual, use per-language
XLIFF or PO files to manage translations, and automatically merge the
translations into the published file.

I really only work on the extraction and merging tools, so that's the
only thing I can comment on. Extracting the source strings is simple.
Joining translations into a single file isn't.

I have code in itstool that does this, but it has limitations in what
kinds of formats it can deal with. For reference, here is a source and
a generated file that's part of my regression tests:

http://gitorious.org/itstool/itstool/blobs/master/tests/IT-join-1.xml
http://gitorious.org/itstool/itstool/blobs/master/tests/IT-join-1.ll.xml

In itstool, an element may be a translation unit. A translation unit
generates a message in a PO file. An element is a unit if it is not
within text, and if it contains anything other than other units and
whitespace-only text nodes.
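
To make the unit rule concrete, here is a rough sketch in Python. It is
not itstool's actual code. It assumes two simplifications: an element
counts as "within text" when some ancestor has non-whitespace text as a
direct child, and an element counts as a unit when it has such text
itself. The names has_direct_text and find_units are made up for this
illustration.

```python
import xml.etree.ElementTree as ET


def has_direct_text(elem):
    """True if elem has a non-whitespace text node as a direct child."""
    if elem.text and elem.text.strip():
        return True
    # In ElementTree, text that follows a child is stored on child.tail.
    return any(child.tail and child.tail.strip() for child in elem)


def find_units(root):
    """Collect translation units in document order (simplified rule)."""
    units = []

    def visit(elem):
        if has_direct_text(elem):
            # Anything below this element counts as within text,
            # so it cannot be a unit of its own.
            units.append(elem)
            return
        for child in elem:
            visit(child)

    visit(root)
    return units


doc = ET.fromstring(
    "<page><title>Hello <em>world</em></title>"
    "<section><p>One</p><p>Two</p></section></page>"
)
print([unit.tag for unit in find_units(doc)])
```

With this input, title and the two p elements come out as units. The em
element does not, because it sits within the text of title, and page and
section do not, because they hold nothing but other units.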

To join translations into a multilingual file, you have to know which
elements to repeat. In itstool, this is any outermost translation unit.
This basically works for the kinds of documents I've had to deal with.
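
As a hedged illustration of the join step (again, not itstool's real
implementation), the sketch below repeats each outermost unit once per
language and marks the copy with xml:lang. It assumes translations
arrive as a plain dict of {language: {source text: translated text}}
standing in for per-language PO files, and it only handles units that
contain text and no inline markup. The function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree spells xml:lang with the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def has_direct_text(elem):
    """True if elem has a non-whitespace text node as a direct child."""
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)


def join_translations(root, translations):
    """Insert one translated copy per language after each outermost unit.

    The root element itself is never repeated, since it has no parent
    to hold the copies.
    """

    def visit(parent):
        for child in list(parent):
            if not has_direct_text(child):
                visit(child)  # not a unit: look for units further down
                continue
            position = list(parent).index(child) + 1
            for lang, messages in translations.items():
                target = messages.get(child.text)
                if target is None:
                    continue  # no translation for this unit
                clone = copy.deepcopy(child)
                clone.text = target
                clone.set(XML_LANG, lang)
                parent.insert(position, clone)
                position += 1

    visit(root)
    return root


source = ET.fromstring(
    "<app><name>Files</name><summary>Browse files</summary></app>"
)
join_translations(source, {"de": {"Files": "Dateien"}})
print(ET.tostring(source, encoding="unicode"))
```

Here name is repeated with xml:lang="de" right after the original, and
summary is left alone because the dict has no German string for it.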

This makes two big assumptions. First, that the repeatable elements
are the ones I described here. Consider this example:

<application>
  <license xml:lang="en">
    <p>This is the license.</p>
    <p>It has multiple paragraphs.</p>
  </license>
  <license xml:lang="de">
    <p>Dies ist die Lizenz.</p>
    <p>Es verfügt über mehrere Absätze.</p>
  </license>
</application>

Unless I treat license as a unit (and thus, paragraphs as within text),
this doesn't work.

Second, itstool just assumes that the way to create a language-specific
element is by using xml:lang. If the attribute is supposed to be lang
or language or somens:locale_identifier, it won't work.

The targetPointer only addresses identifying multiple-language versions
of an element, given an existing multilingual document. What I'm
describing is about being able to construct such documents given
translations in a normal translation format.

--
Shaun

Received on Friday, 11 May 2012 19:51:23 UTC