Thoughts about localization properties, tag set, etc.

Hi all,

Here is some tentative wording for the parts about defining localization properties, etc., and introducing the ideas of XML ITS
within the broader context of localization.

==============================

=== Background

During the localization of any kind of electronic material, tools need two types of information:

The first type of information--referred to as "localization properties" in this document--is a general description, for a given format,
of the parts of a document in that format, with regard to localization: what is translatable, what needs resizing, etc. This
information is valid for all documents in the given format. For example: "In Windows RC files, all quoted text within a STRINGTABLE
group is translatable".

The second type of information--referred to as "localization directives" in this document--is the set of special instructions inside
each document instance in the given format that either:

- provide specific localization information at a level the localization properties cannot (for example: "This run of text, within
this translatable paragraph, should be protected")

- or override the localization properties for a given occurrence (for example: "This specific quoted text in this STRINGTABLE block,
of this RC file, should not be localized").

Most of the time, an application is designed to process a specific format. This means the "localization properties" can often be
hard-coded within the application.

With XML the situation is different. Because all XML document types share the same syntax and parsing rules, it makes sense to have
only one application to process all XML documents, regardless of their type. That is, the same application should be able to process
an XHTML, an SVG, and an XSD document, or any other XML document type.

Because each of these XML document types has a distinct vocabulary, one cannot reasonably hard-code the localization properties
needed to process the different documents. The localization properties have to be specified for each document type independently of
the application.
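
As a rough sketch of what "specified independently of the application" could look like, the Python fragment below keeps the
localization properties in a plain table keyed by the namespace of the document type, and uses one generic routine for every
vocabulary. The table layout, the "myUI" vocabulary, and the function names are assumptions made for illustration only; in
practice the table would be loaded from a per-document-type file.

```python
import xml.etree.ElementTree as ET

# Hypothetical localization properties, keyed by the namespace of the root
# element. Nothing below is specific to one vocabulary.
PROPERTIES = {
    "myDoc": {"translatable": {"para", "title"}},
    "myUI": {"translatable": {"label"}},
}

def split_tag(tag):
    """Split an ElementTree tag of the form '{ns}local' into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def translatable_text(xml_text, properties=PROPERTIES):
    """Return the leading text of each element listed as translatable."""
    root = ET.fromstring(xml_text)
    ns, _ = split_tag(root.tag)
    names = properties[ns]["translatable"]
    result = []
    for elem in root.iter():
        elem_ns, local = split_tag(elem.tag)
        # Only the text before the first child is collected; this is a
        # simplification, not a full mixed-content extraction.
        if elem_ns == ns and local in names and elem.text and elem.text.strip():
            result.append(elem.text.strip())
    return result
```

The same function handles both vocabularies; only the table entry differs.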

In the same manner, in XML it makes sense to have the concept of localization directives implemented as a single set of tags that
can be shared across all XML documents, regardless of their type. This can be achieved using the Namespace mechanism.


Remark -- Note that some information about localization could be used for purposes other than localization. For
instance, such metadata could be used to improve accessibility features. A screen reader application could take its cues on what
should be converted from text to voice from the information about which parts of the document are translatable.
[[ Not sure where this should go, but it seems something interesting to note somewhere. ]]


=== XML cases

Note -- For demonstration purposes, an imaginary "ITS:" namespace is used in the following examples. Its elements and attributes
are used only for illustration and are not intended to represent what the tag set should or should not look like.

There are different potential areas in an XML system where internationalization and localization-related information can be used:

1) In a standalone file that defines the generic localization properties of a document type. For example: "The content of the
element <para> is translatable".

2) Within a document instance there are two potential usages:

2.a) At the top of the document instance, to specify information for the whole document. For example:

<d:doc xmlns:d="myDoc" xmlns:ITS="theITS">
 <ITS:docinfo>
  <ITS:element-default translate="yes"/>
  <ITS:element-exception select="//d:emph[@role='term']" translate="no"/>
 </ITS:docinfo>
 <d:para>Normal text</d:para>
 <d:para>text with <d:emph role="term">term</d:emph>.</d:para>
</d:doc>
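
To make the intent of 2.a concrete, here is a minimal Python sketch of how a tool might apply such document-level rules: every
element first gets the default, then each exception overrides the elements it selects. The function name is made up for
illustration, and since ElementTree only supports a small XPath subset with relative paths, a leading "//" is rewritten as ".//".
This is one possible reading of the example, not a definition of how the imaginary tags should behave.

```python
import xml.etree.ElementTree as ET

NS = {"d": "myDoc", "ITS": "theITS"}

DOC = """<d:doc xmlns:d="myDoc" xmlns:ITS="theITS">
 <ITS:docinfo>
  <ITS:element-default translate="yes"/>
  <ITS:element-exception select="//d:emph[@role='term']" translate="no"/>
 </ITS:docinfo>
 <d:para>Normal text</d:para>
 <d:para>text with <d:emph role="term">term</d:emph>.</d:para>
</d:doc>"""

def translate_flags(xml_text):
    """Return (local name, translate flag) pairs in document order."""
    root = ET.fromstring(xml_text)
    info = root.find("ITS:docinfo", NS)
    default = info.find("ITS:element-default", NS).get("translate")
    # The docinfo block itself is metadata, so it is left out of the result.
    skip = set(info.iter())
    flags = {e: default for e in root.iter() if e not in skip}
    for rule in info.findall("ITS:element-exception", NS):
        path = rule.get("select")
        # Assumption: only "//..." selectors are handled in this sketch.
        if path.startswith("//"):
            path = "." + path
        for e in root.findall(path, NS):
            if e in flags:
                flags[e] = rule.get("translate")
    return [(e.tag.split("}")[-1], flag) for e, flag in flags.items()]
```

Calling translate_flags(DOC) marks the document and both paragraphs as translatable and the emphasized term as not translatable.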

2.b) Inside the document instance, at the element level, to complete or override information already specified at a higher level.
For example:

<d:doc xmlns:d="myDoc" xmlns:ITS="theITS">
 <d:para>Normal text</d:para>
 <d:para ITS:translate="no">text that should stay</d:para>
 <d:para>Normal text with <ITS:span translate="no">not translatable parts</ITS:span></d:para>
 <d:para>This text <ITS:span dir="rtl">is in Arabic</ITS:span>.</d:para>
</d:doc>
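
The override behaviour in 2.b can be sketched in Python as a simple inheritance walk: each element takes the value of its own
directive if it has one, and otherwise the value of its parent. The sketch assumes a default of "yes" at the top of the document
(the example above does not state one) and assumes that ITS:span carries an unqualified translate attribute while other elements
carry ITS:translate, as in the example. The function name is illustrative only.

```python
import xml.etree.ElementTree as ET

ITS = "theITS"

DOC = """<d:doc xmlns:d="myDoc" xmlns:ITS="theITS">
 <d:para>Normal text</d:para>
 <d:para ITS:translate="no">text that should stay</d:para>
 <d:para>Normal text with <ITS:span translate="no">not translatable parts</ITS:span></d:para>
 <d:para>This text <ITS:span dir="rtl">is in Arabic</ITS:span>.</d:para>
</d:doc>"""

def text_runs(elem, inherited="yes"):
    """Yield (text, translate flag) for each non-blank run of text."""
    if elem.tag == f"{{{ITS}}}span":
        own = elem.get("translate")
    else:
        own = elem.get(f"{{{ITS}}}translate")
    # A local directive overrides whatever was inherited from above.
    current = own or inherited
    if elem.text and elem.text.strip():
        yield elem.text.strip(), current
    for child in elem:
        yield from text_runs(child, current)
        # Text following a child belongs to the parent, so it uses the
        # parent's flag rather than the child's.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), current
```

With this reading, the span with dir="rtl" stays translatable because it carries no translate directive of its own.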

3) Within an XML schema document there are two potential usages:

3.a) To mark up localizable material such as the documentation of the schema. These tags would be used for localizing the schema document
itself, like any other XML document. For example:

...<x:enumeration value="idBasedMatch">
 <x:annotation>
  <x:documentation>Indicates the <ITS:term>count units</ITS:term> are matches 
based on ID matches (rather than text matches).</x:documentation>
 </x:annotation>
</x:enumeration>...

3.b) To place, alongside the definitions of the elements and attributes, information on how they should be localized. For
example:

...<element name='para' ITS:translate="yes">
 <complexType mixed='true'>
  <choice minOccurs='0' maxOccurs='unbounded'>
   <element ref='t:emph'/>
   <element ref='t:code'/>
  </choice>
 </complexType>
</element>...
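
As an illustration of how a tool could use 3.b, the Python sketch below reads the localization properties straight out of the
element declarations. The surrounding schema element, the target namespace, and the declarations of emph and code are
hypothetical additions needed to make the fragment a complete document; only the para declaration comes from the example above.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ITS = "theITS"

# The para declaration is the one from the example; everything around it is
# assumed, for illustration only.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:ITS="theITS"
        xmlns:t="myDoc" targetNamespace="myDoc">
 <element name='para' ITS:translate="yes">
  <complexType mixed='true'>
   <choice minOccurs='0' maxOccurs='unbounded'>
    <element ref='t:emph'/>
    <element ref='t:code'/>
   </choice>
  </complexType>
 </element>
 <element name='emph' type='string'/>
 <element name='code' type='string' ITS:translate="no"/>
</schema>"""

def properties_from_schema(schema_text):
    """Map each declared element name to its ITS:translate value, if any."""
    root = ET.fromstring(schema_text)
    props = {}
    for decl in root.iter(f"{{{XS}}}element"):
        name = decl.get("name")
        flag = decl.get(f"{{{ITS}}}translate")
        # References (ref=...) and declarations without a directive are skipped.
        if name and flag:
            props[name] = flag
    return props
```

The result is the same kind of table a standalone properties file would provide, here derived from the schema itself.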

==============================

Cheers,
-yves

Received on Tuesday, 15 March 2005 15:57:09 UTC