RE: Transliteration-only content from Shaun McCance on 2011-05-01 (public-i18n-its-ig@w3.org from May 2011)

From: Shaun McCance <shaunm@gnome.org>
Date: Sun, 01 May 2011 11:53:47 -0400
To: Yves Savourel <ysavourel@translate.com>
Cc: public-i18n-its-ig@w3.org
Message-ID: <1304265227.2137.209.camel@recto>

On Sun, 2011-05-01 at 07:44 -0600, Yves Savourel wrote:
> Hello Shaun,
> 
> It's good to see the work you are doing with Mallard and ITS.
> 
> I did look at Christian's proposal for autoLanguageProcessingRule and
> it looks like a sensible way to specify some of the actions that need
> to be performed on the source document. I can see how this would be
> used in your context. But there are two aspects I'm not sure about:
> 
> === a) What to do with the output?
> 
> I know it's not really ITS' problem: ITS' job is to identify the nodes
> that need to be transliterated and stop there. But from a practical
> viewpoint, how can this information be carried on to the next step?
> Like you say, there is no way currently to represent it in PO. There
> is none in XLIFF either, or any translation format that I know of.

PO files have fields for flags:

http://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

I think flags are the best way to carry this. Unfortunately, gettext
and its various tools don't play nicely with flags outside their
predefined set. If you run msgmerge, for example, all non-gettext
flags disappear. This is something I'll have to talk to the gettext
developers about.

> This shouldn't stop us from having the ITS data category. I'm just
> wondering how used it'll be since 
> 
> === b) What processing expectation is attached to this?
> 
> I wonder about the semantics attached to the values 'transliteration'
> and 'machineTranslation'. Maybe you or Christian already have a
> precise idea.
> 
> For example: does that mean such content must be *only*
> transliterated? Then should it also be marked as translate='no'? If
> the next step provides MT capability, should any content marked as
> 'transliteration' be kept only that, even if there is a way to do an
> actual translation for it?
> 
> In other words, I'm wondering what processing people will attach to
> those labels.

Well, if you ask each of GNOME's 80+ translation teams how they
manage the process, I'm sure you'll get a dozen different answers.
I don't usually get involved in the actual translation steps. I
just listen to what our translators say.

I don't really know that it's feasible to machine-transliterate
names of people. In fact, human-transliteration of human names
is hard. Names are pronounced in all sorts of weird ways, and
that can affect transliteration. Translation memory can help,
I'm sure.

What can reliably be done by machine, though, is nothing. That
is, translators for the many languages that don't transliterate
could just run a script that marks all messages flagged for
transliteration as translated by copying the msgid to the msgstr.
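
A rough sketch of what such a script might look like, not a tested
tool. It assumes a hypothetical "transliterate" flag on the "#," line
(not a flag gettext defines), and a simplified PO layout where msgid
and msgstr each fit on one line. The function name is made up for
illustration.

```python
FLAG = "transliterate"  # hypothetical flag name, not defined by gettext


def copy_flagged_msgids(po_text, flag=FLAG):
    """Copy msgid into an empty msgstr for entries carrying `flag`.

    Simplified on purpose: only single-line msgid/msgstr entries are
    handled; plural forms, msgctxt and multi-line strings are left alone.
    """
    out = []
    # PO entries are separated by blank lines.
    for entry in po_text.split("\n\n"):
        lines = entry.split("\n")
        flags = set()
        msgid = None
        for line in lines:
            if line.startswith("#,"):
                # Flags are a comma-separated list after "#,".
                flags.update(f.strip() for f in line[2:].split(","))
            elif line.startswith("msgid "):
                msgid = line[len("msgid "):]
        # Skip the header and multi-line entries, whose msgid line is "".
        if flag in flags and msgid not in (None, '""'):
            # Only fill untranslated entries; never overwrite existing work.
            lines = ["msgstr " + msgid if l == 'msgstr ""' else l
                     for l in lines]
        out.append("\n".join(lines))
    return "\n\n".join(out)


if __name__ == "__main__":
    sample = (
        '#, transliterate\n'
        'msgid "Shaun McCance"\n'
        'msgstr ""\n'
        '\n'
        'msgid "Open Help"\n'
        'msgstr ""\n'
    )
    print(copy_flagged_msgids(sample))
```

Entries that already have a msgstr are left untouched, so a team that
does transliterate keeps whatever its translators entered.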

> Maybe more importantly, it seems also that both values would apply to
> 'translatable' text. They really look like additional qualifiers to
> translatable content. I would imagine they could also be just a new
> optional attribute in the translateRule data category.

That makes sense to me.

Thanks,

Shaun McCance
Community Help Expert   |   Open Help Conference
http://syllogist.net/   |   http://openhelpconference.com/
Received on Sunday, 1 May 2011 15:46:37 UTC