Transliteration-only content from Shaun McCance on 2011-04-26 (public-i18n-its-ig@w3.org from April 2011)

From: Shaun McCance <shaunm@gnome.org>
Date: Tue, 26 Apr 2011 13:14:14 -0400
To: public-i18n-its-ig@w3.org
Message-ID: <1303838054.2137.122.camel@recto>

Hi all,

I was having a private email conversation with Christian Lieske about
some of the work I'm doing, and about ways that transliteration-only
content could be specified in ITS. He suggested I bring the issue to
the list.

Background: I do a lot of writing and documentation tool development
for GNOME (http://www.gnome.org). We have over 100 documents in XML
formats like Mallard and DocBook, and we translate them into over
50 languages.

For the last six years or so, we've been using a custom tool called
xml2po to translate XML documents with PO files. Because GNOME is
built with GNU tools, our translators are used to working with PO
files for translations.

Recently, I've been working on a replacement called ITS Tool:

http://itstool.org/

Like xml2po, it extracts PO messages from XML files, then merges
translations with the source XML to produce localized XML files.
Unlike xml2po, everything it knows about how to handle an XML file
it gets from ITS rules, plus some extension rules.

(By the way, ITS Tool has an extension rule for marking elements
as space-preserving. I notice there's a proposed standard extension
for that. Count me as a +1 for that.)

As I was looking through the PO output, I noticed names of people
being available for translation. Structured document formats often
have a way to provide credits. We could mark them untranslatable,
but I think it's useful to allow transliteration for languages that
use non-Latin scripts. So what I'd like to do is mark these messages
as transliteration-only to make translators' lives easier.

How to put that information in PO files is an open question, and I
need to talk to the GNU developers about it. But on the ITS end,
Christian proposed a syntax for this that I'd like to see people's
comments on.

Below is Christian's proposal. His words exactly, copied and pasted.

==

Proposal: data category for automated language processing

This data category records that it is acceptable to create target
language content purely through automated language processing (such
as automated transliteration or machine translation).

Rationale

Some content types or content consumption scenarios lend themselves to
fully automated language processing. Currently, the corresponding
information cannot be captured.

Proposed Text 

GLOBAL: The autoLanguageProcessingRule element contains the following: 

A required "selector" attribute. It contains an XPath expression which
selects the nodes to which this rule applies. 
A required "process" attribute with the values "transliteration" or
"machineTranslation". 
 
For example: 

<file>
 <its:rules xmlns:its="http://www.w3.org/200x/yy/its" version="2.0">
  <its:autoLanguageProcessingRule process="transliteration"
                                  selector="//name"/>
 </its:rules>
        <credit type="author">
           <name>Shaun</name>
           <email>shaun@example.org</email>
        </credit>
</file>
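
As an illustration only (a sketch, not part of the proposed text), a tool
might resolve the global rule roughly as follows. The Python below uses
the standard library's xml.etree.ElementTree. The function name
global_processing is made up for this sketch, and the namespace URI is
the placeholder from the example above. ElementTree supports only a
subset of XPath and wants relative paths, so the sketch assumes simple
"//name"-style selectors and evaluates them from the document root.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace copied from the example; not a final URI.
ITS_NS = "http://www.w3.org/200x/yy/its"
ALLOWED = {"transliteration", "machineTranslation"}

DOC = """<file>
 <its:rules xmlns:its="http://www.w3.org/200x/yy/its" version="2.0">
  <its:autoLanguageProcessingRule process="transliteration"
                                  selector="//name"/>
 </its:rules>
 <credit type="author">
  <name>Shaun</name>
  <email>shaun@example.org</email>
 </credit>
</file>"""


def global_processing(root):
    """Return (tag, text, process) for every element a global rule selects."""
    selected = []
    for rule in root.iter("{%s}autoLanguageProcessingRule" % ITS_NS):
        process = rule.get("process")
        selector = rule.get("selector")
        # Both attributes are required by the proposed text.
        if selector is None or process not in ALLOWED:
            raise ValueError("rule needs a selector and a valid process")
        # Assumption: turn an absolute selector such as "//name" into the
        # relative form ".//name" that ElementTree accepts on an element.
        path = "." + selector if selector.startswith("/") else selector
        for node in root.findall(path):
            selected.append((node.tag, node.text, process))
    return selected


for tag, text, process in global_processing(ET.fromstring(DOC)):
    print(tag, text, process)  # prints: name Shaun transliteration
```

Only the name element is reported; the email element is not selected by
the rule, so this sketch leaves it to whatever default handling applies.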

LOCAL: The following local markup is available: 

An "autoLanguageProcessing" attribute with the values "transliteration"
or "machineTranslation". 

For example: 

<file xmlns:its="http://www.w3.org/200x/yy/its">
        <credit type="author">
           <name
its:autoLanguageProcessing="transliteration">Shaun</name>
           <email>shaun@example.org</email>
        </credit>
</file>
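
Again as an illustration only (a sketch, not part of the proposed text),
reading the local markup could look like the Python below, using the
standard library's xml.etree.ElementTree. The function name
local_processing is made up for this sketch. It assumes the its prefix
is bound on the root element to the same placeholder namespace URI used
in the global example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, assumed to match the global example.
ITS_NS = "http://www.w3.org/200x/yy/its"
LOCAL_ATTR = "{%s}autoLanguageProcessing" % ITS_NS
ALLOWED = {"transliteration", "machineTranslation"}

DOC = """<file xmlns:its="http://www.w3.org/200x/yy/its">
 <credit type="author">
  <name its:autoLanguageProcessing="transliteration">Shaun</name>
  <email>shaun@example.org</email>
 </credit>
</file>"""


def local_processing(root):
    """Return (tag, text, process) for elements carrying the local attribute."""
    marked = []
    for element in root.iter():
        process = element.get(LOCAL_ATTR)
        if process is None:
            continue  # element has no local markup
        if process not in ALLOWED:
            raise ValueError("unknown process value: %r" % process)
        marked.append((element.tag, element.text, process))
    return marked


for tag, text, process in local_processing(ET.fromstring(DOC)):
    print(tag, text, process)  # prints: name Shaun transliteration
```

How local markup should interact with a global rule that selects the
same node is not stated in the proposed text, so the sketch does not
try to combine the two.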

==

Thanks,

Shaun McCance   (twitter @shaunm)
Community Help Expert   |   Open Help Conference
http://syllogist.net/   |   http://openhelpconference.com/
Received on Friday, 29 April 2011 08:52:26 UTC