Re: [all] suggestions for consolidating requirements

On Sat, 2012-04-28 at 01:36 +0100, David Lewis wrote:
> Concerning PO files, these haven't really been in the frame for the WG 
> to date, as we've focussed more on XLIFF as the localisation information 
> exchange format. However, PO files seem a significant use case, so it 
> might be helpful if you could provide some pointers to the group on the 
> structure and use of POT and PO files. Any mapping between ITS and PO, 
> as with ITS and XLIFF would probably not be normative, and as with 
> everything, the effort expended will reflect the commitment to provide 
> implementations. But at the same time if we can avoid inadvertently 
> making such a mapping intractable, by a proper understanding of what's 
> involved, we should do so.

I understand that PO files aren't the primary focus. I'm committed
to implementing the new recommendations in itstool, at least those
that make sense for the limited part of the translation process that
it handles. Note that every major Linux distribution ships hundreds
of XML documents translated through PO files, increasingly using an
ITS-based workflow. It's pretty significant, I think.

Here's the basic format of a PO file. PO is a largely line-oriented
format. A PO file is a flat list of messages (or strings or "things
to translate"; the usual word in PO land is "message"). Messages are
not grouped by source as they are in XLIFF. A message usually looks
like this:

#  translator comments
#. extracted comments
#: references
#, flags
msgid "The original message"
msgstr "The translated message"

Any of the # lines can appear any number of times. Translator
comments are comments from translators to themselves. They should
never be modified by tools.

Extracted comments are written by the message-extraction tool, in
this case itstool. Tools used during the translation process,
like pofilter, increasingly append to them as well. This is where
localization notes go, and also where automatic messages from
tools end up. It's a mess, but we do our best to keep some
consistency.

References is a list of files and line numbers where the message
came from. Flags is a list of simple tags that can affect the way
tools behave on a message. For example, the "c-format" flag tells
you the message is a C format string, and localization tools will
validate translations as such. (I'm trying to get "xml-fragment"
into the accepted list of flags.)

msgid is like source in XLIFF. msgstr is like target. There's also
an alternate syntax to handle plural forms ("Selected %i items"),
which we don't touch in itstool.
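
To make the layout above concrete, here is a rough Python sketch that
splits one message into its parts. It is only an illustration, not how
itstool or gettext actually parse PO: the Message class, parse_message,
unquote and the sample entry are all made up for this email, and it
ignores multi-line strings, escapes, plural forms and msgctxt.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical container for one PO message; not an itstool type.
    translator_comments: list = field(default_factory=list)
    extracted_comments: list = field(default_factory=list)
    references: list = field(default_factory=list)
    flags: list = field(default_factory=list)
    msgid: str = ""
    msgstr: str = ""


def unquote(text):
    # Strip the surrounding double quotes; escape handling is omitted.
    return text.strip()[1:-1]


def parse_message(block):
    """Parse a single, single-line-string PO message (simplified)."""
    msg = Message()
    for line in block.splitlines():
        if line.startswith("#."):
            msg.extracted_comments.append(line[2:].strip())
        elif line.startswith("#:"):
            # References are whitespace-separated file:line pairs.
            msg.references.extend(line[2:].split())
        elif line.startswith("#,"):
            # Flags are comma-separated.
            msg.flags.extend(f.strip() for f in line[2:].split(",") if f.strip())
        elif line.startswith("# ") or line == "#":
            msg.translator_comments.append(line[1:].strip())
        elif line.startswith("msgid "):
            msg.msgid = unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            msg.msgstr = unquote(line[len("msgstr "):])
    return msg


# Made-up sample entry, for illustration only.
sample = """\
#  Check this wording with the docs team
#. Shown while a document is loading
#: index.page:12 index.page:40
#, c-format
msgid "Opening %s"
msgstr "Ouverture de %s"
"""

message = parse_message(sample)
print(message)
```

Keeping each kind of comment in its own list mirrors the rule that
tools leave translator comments alone while they may rewrite the
extracted ones.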

A message can also have a message context to disambiguate it from
other messages that have the same msgid. (Without context, msgid
is a unique identifier in PO, and messages with the same msgid
are merged.) For example, say your program has "Read" on a button
as a verb for "read this thing" and it has "Read" on a check box
as an adjective for "this thing was already read". These aren't
the same word in most languages, so you have to use msgctxt to
disambiguate:

#. Read this thing
msgctxt "verb"
msgid "Read"

#. This thing has been read
msgctxt "adjective"
msgid "Read"
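
One way to picture this is a catalog keyed on the (msgctxt, msgid)
pair. The Python below is a toy sketch under that assumption, not
gettext's real data structure; build_catalog and the Spanish strings
are just made up for illustration.

```python
def build_catalog(messages):
    """Build a toy catalog keyed by (msgctxt, msgid).

    Each item in `messages` is a (msgctxt, msgid, msgstr) tuple.
    A msgctxt of None stands for "no context".
    """
    catalog = {}
    for msgctxt, msgid, msgstr in messages:
        # Entries with the same key collapse into a single entry.
        catalog[(msgctxt, msgid)] = msgstr
    return catalog


# With contexts, the two "Read" messages stay separate.
with_context = build_catalog([
    ("verb", "Read", "Leer"),
    ("adjective", "Read", "Leído"),
])

# Without contexts, both share the key (None, "Read") and only one
# translation can survive.
without_context = build_catalog([
    (None, "Read", "Leer"),
    (None, "Read", "Leído"),
])

print(len(with_context), len(without_context))
```

In the second catalog there is only room for one translation of
"Read", which is exactly the problem msgctxt is there to solve.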

We allow msgctxt to be set in itstool using itst:contextRule.
I'd have to write another email explaining Mallard to tell you
why it's useful. Also, itstool basically owns the msgctxt "_".
It uses msgctxt "_" for any messages it creates automatically,
such as those from itst:externalRefRule and itst:credits. That
helps avoid conflicts.

It's not a terrible format, though extensibility is hard. There
are things I don't like about PO. There are things I don't like
about XLIFF. At the end of the day, I'm not doing XML->PO because I
love PO. I'm doing it because PO is the reality in the projects
I work with.

For more details on PO:
http://www.gnu.org/software/gettext/manual/html_node/PO-Files.html#PO-Files

--
Shaun

Received on Sunday, 29 April 2012 18:40:23 UTC