
RE: [Bug 6245] SML locid example request from ITS Interest Group

From: Yves Savourel <ysavourel@translate.com>
Date: Thu, 20 Nov 2008 13:52:03 -0700
To: <bugzilla@wiggum.w3.org>
CC: <public-i18n-its-ig@w3.org>
Message-ID: <003301c94b51$d72cee50$9f05a8c0@BREIZH>

Hi John,

> - Is the XML you supplied, especially the unqualified 
> "messages" and "msg" elements, part of ITS, some other 
> recognized and adopted standard, or any industry practice 
> with demonstrable public domain adoption?

The example shows a generic, imaginary XML file; there is no ITS-specific markup in it. It does, however, implement some of the XML
internationalization best practices: using xml:lang to identify the language, and using a unique ID for each message.

So, while this specific document instance is not in a standard vocabulary, it is in XML, which is a recognized and widely adopted
standard. And this is our main point: XML (any XML), rather than properties files, is often the best choice for storing translatable
content, especially content that itself contains XML tags.


> - Same question for the putative ITS version you allude to.

ITS is only a set of attributes and elements that one can add to an existing XML document; it is not a format you could use directly
to store strings. You can see an example of such markup in the recent SVG Tiny 1.2 Proposed Recommendation (PR):
http://www.w3.org/TR/SVGTiny12/i18n.html#SVGi18nl10nmarkup (its.svg).



> The existing examples were built based on code in one 
> of the existing known SML implementations, based on 
> the Java resource bundle concept (I'm not sure which 
> category above this falls into, but at the minimum it 
> is one with broad industry adoption amongst Java apps).

We certainly don't see much wrong with it, except that it could be done in a way that is more flexible for localization. We
would have no problem keeping the current example.

One side note on your existing example: the files seem to use a naming convention that is not quite the recommended one. The locale
codes should be suffixes rather than prefixes; for instance, the file should be lang_fr.txt rather than fr_lang.txt. The names of
properties files are important because their pattern is hard-wired in Java classes such as java.util.ResourceBundle (see its
getBundle() method, for example).
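For illustration, here is a minimal Java sketch (assuming Java 7 or later; ResourceBundle.Control itself dates from Java 6) that asks the JDK which file names its default properties lookup tries for a base name and locale. The locale code is appended to the base name as a suffix, most specific first, falling back to the bare base name. Note that the stock properties loader also expects a .properties extension rather than .txt. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleNaming {

    /**
     * Returns the properties file names the default ResourceBundle lookup
     * tries for a base name and locale, most specific first.
     */
    public static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // The locale is appended to the base name as a suffix ("lang" -> "lang_fr");
            // the root locale maps to the bare base name.
            String bundleName = control.toBundleName(baseName, candidate);
            files.add(control.toResourceName(bundleName, "properties"));
        }
        return files;
    }

    public static void main(String[] args) {
        // Prints [lang_fr_CA.properties, lang_fr.properties, lang.properties]
        System.out.println(candidateFiles("lang", Locale.CANADA_FRENCH));
    }
}
```

A file named fr_lang.properties would never appear in this candidate list, so getBundle() would not find it.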



> Please note also that the appendix in question is exemplary,
> not normative or limiting.

Yes, but I also think it is quite important to convey best practices in examples, as they are often the reference many developers
use by default to design and code their own implementations. In a sense, appendices like this are where the broad community takes
its cue.



> To the degree that either alternative you are suggesting has 
> demonstrable public adoption or prescribes a format agreed to 
> by a broad community I expect that will help make the case 
> for adding it/them as additional examples. I am less sanguine 
> about the prospects for removing the existing example ("we suggest 
> to replace this section"), since it is based on implementation 
> experience.

I understand these valid concerns.
At the same time, you may want to take into account the following:

-1) With regard to ITS: some of the Recommendations the W3C produces break new ground and are not, initially, adopted by a broad
community. While there are various ways to promote such specifications, one important conduit is other W3C specifications, where
users can see examples and get exposed to them. This is especially true for specifications like ITS, which are more 'add-ons' than
full-blown XML applications addressing a specific domain. Think of ITS (for example, the its:translate attribute) as something akin
to xml:lang.

-2) With regard to XML vs. properties files: for many reasons, from the localization viewpoint, translatable data (and most especially
data containing XML tags) are, in general, better stored in XML than in other formats. For example:

a) Encoding is clearly addressed and easily handled in XML (no \uHHHH escaping, and much less chance of losing or corrupting non-ASCII
characters).

b) You can use many general-purpose XML tools to work with XML files: for example, one could open an XML resource file in an XML
editor and spell-check the translated text, check its grammar, perform a word count, or use an XSLT template to display it
in a user-friendly way for review, etc.

c) You can easily let the storage format evolve over time without changing its core or the tools that use it. For example, you can
add or remove attributes useful for the translation workflow.

d) If the data contain XML tags (as in your example), most XML-enabled tools will be able to "see" them as tags that are part of the
content and protect them accordingly. If the same data are stored in a different format (like a properties file), most translation
tools will treat the inline tags as plain text, exposing them to accidental modifications that can result in invalid data at runtime.

e) XML documents now have a set of internationalization tags (ITS) that can be used to provide many internationalization- and
localization-related features in a standard way, facilitating the localization workflow.
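To make point a) concrete, here is a minimal Java sketch: writing a message with java.util.Properties.store(OutputStream, String) produces an ISO-8859-1 file in which every non-ASCII character becomes a \uHHHH escape, whereas the same text in a UTF-8 XML file is stored as-is. The class name, method name, and the French sample translation are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEscaping {

    /** Serializes one key/value pair with Properties.store(OutputStream, String). */
    public static String storeAsProperties(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // The OutputStream variant always writes ISO-8859-1 and replaces each
            // non-ASCII character with a backslash-u escape (four hex digits).
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical French translation of the sample message.
        String french = "L'identifiant sp\u00e9cifi\u00e9 ne commence pas par 99.";
        // Prints a date comment line, then the entry with each e-acute
        // written as an escape sequence for U+00E9.
        System.out.print(storeAsProperties("StudentIDErrorMsg", french));
    }
}
```

A translator or tool that edits such a file has to preserve these escapes by hand; in an XML file the accented characters are simply part of the text.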


All this is true independently of SML and of any implementation of SML. Obviously, you always have to weigh the pros and cons of any
solution, and in this case some applications may find simple properties files to be the better choice. But I think it would
make sense to also show an example of what we think is a better practice.

Maybe, instead of replacing the existing example, we could add one that uses an XML file: something similar to the following,
placed just above the "Variable substitution support" heading:

=====

Translatable messages, especially strings containing XML tags (like <sch:value-of select="string(u:ID)"/> in this example), may be
best stored in XML containers. This allows more flexibility in manipulating and translating the data. For example, the XML document
could use ITS to add localization-related information.

<?xml version="1.0" encoding="UTF-8"?>
<messages xml:lang="en"
 xmlns:sch="http://purl.oclc.org/dsdl/schematron"
 xmlns:its="http://www.w3.org/2005/11/its">
 <msg xml:id='StudentIDErrorMsg'
  its:locNote="This message should not be longer than 128 characters">The specified ID <sch:value-of select="string(u:ID)"/> does not begin with 99.</msg>
</messages>

=====
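To illustrate point d) with the sample message, here is a minimal Java sketch using only the JDK's DOM API. It parses the message (embedded as a string) and lists what an XML-aware tool sees inside the msg element: two text runs and one inline sch:value-of element, which a translation tool can protect as markup rather than expose as translatable text. It also reads the its:locNote attribute like any other namespaced attribute. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InlineMarkupDemo {

    public static final String ITS_NS = "http://www.w3.org/2005/11/its";

    // The sample message, embedded as a string for a self-contained demo.
    public static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<messages xml:lang=\"en\"\n"
            + " xmlns:sch=\"http://purl.oclc.org/dsdl/schematron\"\n"
            + " xmlns:its=\"" + ITS_NS + "\">\n"
            + " <msg xml:id=\"StudentIDErrorMsg\"\n"
            + "  its:locNote=\"This message should not be longer than 128 characters\">"
            + "The specified ID <sch:value-of select=\"string(u:ID)\"/> does not begin with 99.</msg>\n"
            + "</messages>\n";

    /** Parses the document (namespace-aware) and returns the first msg element. */
    public static Element firstMessage(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // resolve the sch: and its: prefixes
            return (Element) factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("msg").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse sample", e);
        }
    }

    /** Lists the children of a message: text runs vs. inline elements. */
    public static List<String> describeContent(Element msg) {
        List<String> parts = new ArrayList<>();
        NodeList children = msg.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Inline markup: reported as an element node, not as characters to translate
                parts.add("element {" + child.getNamespaceURI() + "}" + child.getLocalName());
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                parts.add("text \"" + child.getNodeValue() + "\"");
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Element msg = firstMessage(SAMPLE);
        for (String part : describeContent(msg)) {
            System.out.println(part);
        }
        // ITS local attributes are read like any other namespaced attribute
        System.out.println("locNote: " + msg.getAttributeNS(ITS_NS, "locNote"));
    }
}
```

Stored as a properties value, the same message would be a single flat string, and nothing would distinguish the sch:value-of tag from the words around it.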

Cheers,
-yves
Received on Thursday, 20 November 2008 20:52:50 GMT