
[Bug 6245] SML locid example request from ITS Interest Group

From: <bugzilla@wiggum.w3.org>
Date: Thu, 20 Nov 2008 22:49:05 +0000
To: public-sml@w3.org
Message-Id: <E1L3IKf-0000uZ-FS@farnsworth.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6245





--- Comment #2 from Yves Savourel <ysavourel@translate.com>  2008-11-20 22:49:05 ---
Hi John,

> - Is the XML you supplied, especially the unqualified "messages" and 
> "msg" elements, part of ITS, some other recognized and adopted 
> standard, or any industry practice with demonstrable public domain 
> adoption?

The example shows a generic imaginary XML file. There is no ITS-specific markup
in it. It does implement some of the XML internationalization best practices:
use of xml:lang to identify the language and use of a unique ID for each
message.

So, while this specific document instance is not in a standard vocabulary, it
is in XML which is a recognized and widely adopted standard. And this is our
main point: XML (any XML) is often the best choice to store XML content (rather
than properties files).


> - Same question for the putative ITS version you allude to.

ITS is only a set of attributes and tags one can add on top of an existing XML
document, not a format you could use directly to store strings. You can see an
example of such markup in the recent SVG-Tiny PR document:
http://www.w3.org/TR/SVGTiny12/i18n.html#SVGi18nl10nmarkup (its.svg).



> The existing examples were built based on code in one of the existing 
> known SML implementations, based on the Java resource bundle concept 
> (I'm not sure which category above this falls into, but at the minimum 
> it is one with broad industry adoption amongst Java apps).

We certainly don't see much wrong with it, except that it could be done in a
way that is more flexible for localization. We would see no problem in keeping
the current example.

One side note on your existing example: the files seem to use a naming
convention that is not quite the recommended one. The locale codes should be
suffixes rather than prefixes: for instance, lang_fr.txt rather than
fr_lang.txt. The names of properties files are important because their pattern
is hard-wired in Java classes such as java.util.ResourceBundle (see the
getBundle() method, for example).
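
As a rough illustration of that hard-wired pattern, the sketch below asks the
default java.util.ResourceBundle.Control which properties file names a lookup
for the base name "lang" and the French locale would try. The class name
BundleNames and the helper candidateFiles are hypothetical, and the
".properties" suffix is what the default control uses (the ".txt" names in the
existing example would not be found this way).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleNames {
    // Lists, in lookup order, the properties file names the default
    // ResourceBundle.Control derives from a base name and a locale.
    // (Hypothetical helper, for illustration only.)
    public static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // The locale code is appended to the base name as a suffix.
            String bundleName = control.toBundleName(baseName, candidate);
            names.add(control.toResourceName(bundleName, "properties"));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateFiles("lang", Locale.FRENCH));
    }
}
```

With Locale.FRENCH this should print [lang_fr.properties, lang.properties]: the
locale code always follows the base name, so a file named fr_lang would never
be a candidate.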



> Please note also that the appendix in question is exemplary, not 
> normative or limiting.

Yes, but I also think it is quite important to convey best practices in
examples, as they are often the references many developers use to design and
code their own implementations by default. In a sense, appendices like this are
the place where the broad community takes its cue from.



> To the degree that either alternative you are suggesting has 
> demonstrable public adoption or prescribes a format agreed to by a 
> broad community I expect that will help make the case for adding 
> it/them as additional examples. I am less sanguine about the prospects 
> for removing the existing example ("we suggest to replace this 
> section"), since it is based on implementation experience.

I understand these valid concerns.
At the same time, you may want to take into account the following:

-1) With regard to ITS: Some of the recommendations the W3C produces break new
ground and, initially, are not adopted by a broad community. While there are
various ways to promote such specifications, one important conduit is the other
W3C specifications, where users can see examples and get exposed to them. This
is especially true for specifications like ITS, which are more 'add-ons' than
full-blown XML applications addressing a specific domain. Think of ITS (for
example the its:translate attribute) as something akin to xml:lang.

-2) With regard to XML vs Properties: For many reasons, from the localization
viewpoint, translatable data (and most especially data containing XML tags)
are, in general, better stored in XML than in other formats. For example:

a) Encoding is clearly addressed and easily handled in XML (no \uHHHH escaping,
far less chance of losing or corrupting non-ASCII characters).

b) You can use many general-purpose XML tools to work with XML files: for
example, one could open an XML resource file in an XML editor and spell-check
the translated text, do grammar checking, perform a word count, or use an
XSLT template to display it in a user-friendly way for review, etc.

c) You can easily have the storage format evolve over time without changing its
core or the tools that use it. For example you can add/remove attributes useful
for the translation process workflow.

d) If the data contain XML tags (like your example), most XML-enabled tools
will be able to "see" them as tags that are part of the content and protect
them accordingly. If the same data is in a different storage format (like a
properties file), most translation tools will treat the inline tags as text,
exposing them to accidental modifications that can end up as invalid data at
runtime.

e) XML documents now have an internationalization tag set (ITS) that can be
used to provide a lot of internationalization and localization-related features
in a standard way, facilitating the localization workflow.
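
To make point (a) concrete, here is a minimal sketch of what happens to a
non-ASCII character when a string is written with
java.util.Properties.store(OutputStream, String), which emits ISO-8859-1 and
escapes other characters. The class name PropertiesEscaping, the helper
storeAsProperties, and the key "greeting" are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscaping {
    // Serializes a single key/value pair through Properties.store and
    // returns the resulting file content as text.
    // (Hypothetical helper, for illustration only.)
    public static String storeAsProperties(String key, String value) {
        try {
            Properties props = new Properties();
            props.setProperty(key, value);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null); // null: no header comment, only a date line
            return out.toString("ISO-8859-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The value contains an e with acute accent (U+00E9).
        System.out.println(storeAsProperties("greeting", "Pr\u00e9nom"));
    }
}
```

The stored line no longer contains the accented letter itself but a six-character
escape sequence, which is what a translator or a generic text tool would then
have to read and preserve. In a UTF-8 XML file the character would be stored
as-is.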


All this is true independently of SML and of any implementation of SML.
Obviously, you always have to weigh the pros and cons of any solution, and in
this instance some applications may find that simple properties files are the
better choice. But I think it would make sense to also show an example of
what we think is a better practice.

Maybe an alternative to replacing the existing example could be to add one with
an XML file. Something similar to the following, that would go just above the
"Variable substitution support" title:

=====

Translatable messages, especially strings containing XML tags (like
<sch:value-of select="string(u:ID)"/> in this example), may be best stored in
XML containers. This allows more flexibility to manipulate and translate the
data. For example, the XML document could utilize ITS to add
localization-related information.

<?xml version="1.0" encoding="UTF-8"?>
<messages xml:lang="en"
 xmlns:sch="http://purl.oclc.org/dsdl/schematron"
 xmlns:its="http://www.w3.org/2005/11/its">
 <msg xml:id='StudentIDErrorMsg'
  its:locNote="This message should not be longer than 128 characters">The
specified ID <sch:value-of select="string(u:ID)"/> does not begin with
99.</msg>
</messages>

=====
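
For what it is worth, a short sketch of how a tool might read such a file with
the standard Java DOM API: it picks up the its:locNote attribute and sees the
sch:value-of placeholder as an element rather than as plain text. The class
MessageReader and its helpers are hypothetical, and the sample string is a
single-line copy of the message shown above (with the XML declaration closed
by "?>").

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MessageReader {
    public static final String ITS_NS = "http://www.w3.org/2005/11/its";

    // Single-line copy of the sample message document.
    public static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<messages xml:lang=\"en\""
      + " xmlns:sch=\"http://purl.oclc.org/dsdl/schematron\""
      + " xmlns:its=\"http://www.w3.org/2005/11/its\">"
      + "<msg xml:id=\"StudentIDErrorMsg\""
      + " its:locNote=\"This message should not be longer than 128 characters\">"
      + "The specified ID <sch:value-of select=\"string(u:ID)\"/>"
      + " does not begin with 99.</msg></messages>";

    // Parses the document (namespace-aware) and returns its first msg element.
    public static Element firstMessage(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("msg").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse message file", e);
        }
    }

    // Returns the localization note attached to a message via ITS.
    public static String locNote(Element msg) {
        return msg.getAttributeNS(ITS_NS, "locNote");
    }

    // Counts inline elements (tags a translation tool should protect).
    public static int inlineElementCount(Element msg) {
        int count = 0;
        NodeList children = msg.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element msg = firstMessage(SAMPLE);
        System.out.println(msg.getAttribute("xml:id") + ": " + locNote(msg));
        System.out.println("Inline elements: " + inlineElementCount(msg));
    }
}
```

A properties-based reader would have no equivalent of inlineElementCount: the
placeholder would simply be part of the string.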

cheers, -ys


Received on Thursday, 20 November 2008 22:49:15 GMT
