- From: Shaun McCance <shaunm@gnome.org>
- Date: Sun, 29 Apr 2012 14:39:58 -0400
- To: public-multilingualweb-lt@w3.org
On Sat, 2012-04-28 at 01:36 +0100, David Lewis wrote:
> Concerning PO files, these haven't really been in the frame for the WG
> to date, as we've focussed more on XLIFF as the localisation information
> exchange format. However, PO files seem a significant use case, so it
> might be helpful if you could provide some pointers to the group on the
> structure and use of POT and PO files. Any mapping between ITS and PO,
> as with ITS and XLIFF would probably not be normative, and as with
> everything, the effort expended will reflect the commitment to provide
> implementations. But at the same time if we can avoid inadvertently
> making such a mapping intractable, by a proper understanding of what's
> involved, we should do so.

I understand that PO files aren't the primary focus. I'm committed to
implementing the new recommendations in itstool, at least those that
make sense for the limited part of the translation process that it
handles. Note that every major Linux distribution ships hundreds of
XML documents translated through PO files, increasingly using an
ITS-based workflow. It's pretty significant, I think.

Here's the basic format of a PO file.

PO is a largely line-oriented format. A PO file is a flat list of
messages (or strings or "things to translate"; the usual word in
PO land is "message"). Messages are not grouped by source as they
are in XLIFF. A message usually looks like this:

  # translator comments
  #. extracted comments
  #: references
  #, flags
  msgid "The original message"
  msgstr "The translated message"

There can be any number of each kind of # line.

Translator comments are comments from translators to themselves.
They should never be modified by tools.

Extracted comments are written by the message-extraction tool, in
this case itstool. They're also increasingly appended to by tools
used during the translation process, like pofilter. This is where
localization notes are put. It's also where automatic messages from
tools are put.
It's a mess, but we do our best to keep some consistency.

References is a list of files and line numbers where the message
came from.

Flags is a list of simple tags that can affect the way tools behave
on a message. For example, the "c-format" flag tells you the message
is a C format string, and localization tools will validate translations
as such. (I'm trying to get "xml-fragment" into the accepted list of
flags.)

msgid is like source in XLIFF. msgstr is like target. There's also an
alternate syntax to handle plural forms ("Selected %i items"), which
we don't touch in itstool.

A message can also have a message context to disambiguate it from
other messages that have the same msgid. (Without context, msgid is
a unique identifier in PO, and messages with the same msgid are
merged.) For example, say your program has "Read" on a button as a
verb for "read this thing" and it has "Read" on a check box as an
adjective for "this thing was already read". These aren't the same
word in most languages, so you have to use msgctxt to disambiguate:

  #. Read this thing
  msgctxt "verb"
  msgid "Read"

  #. This thing has been read
  msgctxt "adjective"
  msgid "Read"

We allow msgctxt to be set in itstool using itst:contextRule. I'd
have to write another email explaining Mallard to tell you why it's
useful. Also, itstool basically owns the msgctxt "_". It uses
msgctxt "_" for any messages it creates automatically, such as
those from itst:externalRefRule and itst:credits. That helps avoid
conflicts.

It's not a terrible format, though extensibility is hard. There are
things I don't like about PO. There are things I don't like about
XLIFF. End of the day, I'm not doing XML->PO because I love PO. I'm
doing it because PO is the reality in the projects I work with.

For more details on PO:

http://www.gnu.org/software/gettext/manual/html_node/PO-Files.html#PO-Files

--
Shaun
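
As a rough illustration of the message layout described above, here is
a minimal Python sketch that reads simple PO text into a dictionary
keyed by (msgctxt, msgid). It is not itstool's or gettext's code. The
names parse_po, new_message, store, COMMENT_PREFIXES and SAMPLE, and
the reference "page.xml:12", are hypothetical and exist only for this
example. It assumes single-line, unescaped strings, does not handle
plural forms, and skips "#~" (obsolete) and "#|" (previous msgid)
lines.

```python
# Comment prefixes from the message above, mapped to hypothetical field names.
# The bare "#" must come last, since every other prefix also starts with "#".
COMMENT_PREFIXES = [
    ("#.", "extracted"),
    ("#:", "references"),
    ("#,", "flags"),
    ("#", "translator"),
]
LIST_FIELDS = ("translator", "extracted", "references", "flags")


def new_message():
    """Return an empty message record."""
    msg = {name: [] for name in LIST_FIELDS}
    msg.update(msgctxt=None, msgid=None, msgstr="")
    return msg


def store(messages, msg):
    """Add msg, merging comments into an existing entry with the same key."""
    key = (msg["msgctxt"], msg["msgid"])
    if key in messages:
        for name in LIST_FIELDS:
            messages[key][name].extend(msg[name])
    else:
        messages[key] = msg


def parse_po(text):
    """Parse simple PO text into a dict keyed by (msgctxt, msgid)."""
    messages = {}
    current = new_message()
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines, obsolete and previous-msgid lines are skipped.
        if not line or line.startswith(("#~", "#|")):
            continue
        keyword, _, rest = line.partition(" ")
        is_comment = line.startswith("#")
        # A comment, msgctxt or msgid after a msgid begins the next message.
        starts_message = is_comment or keyword in ("msgctxt", "msgid")
        if starts_message and current["msgid"] is not None:
            store(messages, current)
            current = new_message()
        if is_comment:
            for prefix, field in COMMENT_PREFIXES:
                if line.startswith(prefix):
                    body = line[len(prefix):].strip()
                    if field == "flags":
                        current[field].extend(f.strip() for f in body.split(","))
                    else:
                        current[field].append(body)
                    break
        elif keyword in ("msgctxt", "msgid", "msgstr"):
            # Strip the surrounding double quotes; escapes are not decoded.
            current[keyword] = rest.strip()[1:-1]
    if current["msgid"] is not None:
        store(messages, current)
    return messages


SAMPLE = '''\
# translator comments
#. extracted comments
#: page.xml:12
#, c-format
msgid "The original message"
msgstr "The translated message"

#. Read this thing
msgctxt "verb"
msgid "Read"
msgstr ""

#. This thing has been read
msgctxt "adjective"
msgid "Read"
msgstr ""
'''

if __name__ == "__main__":
    for (ctxt, msgid), msg in parse_po(SAMPLE).items():
        print(ctxt, msgid, msg["extracted"], msg["flags"])
```

Keying on the (msgctxt, msgid) pair keeps the two "Read" messages
apart, while two entries with the same msgid and no context land on
the same key and have their comments merged.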
Received on Sunday, 29 April 2012 18:40:23 UTC