Change proposals: Christian's "introduction" section

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 07 Apr 2006 08:51:49 +0900
Message-ID: <4435A995.5060207@w3.org>
To: public-i18n-its@w3.org
Hi all,

Due to technical problems I could not integrate Christian's rewrite of
the "introduction" section into the draft. Since time is running short,
please have a look at the source text below and comment as soon as possible.

We will discuss this at today's editors call.

- Felix

--><div xmlns="http://www.tei-c.org/ns/1.0"
xml:id="introduction" xsi:schemaLocation="http://www.tei-c.org/ns/1.0


<p><emph>This section is informative.</emph></p>

<p>This document defines a standard for high-quality, cost-efficient
internationalization and localization of schemas and XML instances
(both existing ones and new ones). On the one hand, the standard is
defined conceptually through the notion of data categories. On the
other hand, the standard defines implementations of these data categories
as a set of elements and attributes called the Internationalization
Tag Set (ITS). The document provides examples of how ITS can be used
with existing popular markup schemes such as DocBook. Furthermore,
the document provides implementations for three schema languages:
XML DTD <ptr target="#xml10spec" type="bibref"/>, XML
Schema <ptr target="#xmlschema1" type="bibref"/> and RELAX NG <ptr
target="#relaxng" type="bibref"/>. Feedback on this document
is especially appreciated on the general concept of ITS, on the mechanisms
defined for the selection of ITS-specific information in documents
and schemas, and on the design of the individual data categories.</p>

<p>Requirements for this document are formulated in <ptr
target="#itsreq" type="bibref"/>. Not all of these requirements are
addressed in this document, for example:</p>

<list type="unordered">

<item> 						<ref
- Indicator of Constraints</ref> </item>

<item> 						<ref

- Handling Entities</ref> </item>

<item><ref
target="http://esw.w3.org/topic/its0908LinguisticMarkup">R023 - Linguistic
Markup</ref></item>

</list>

<p>The Working Group will cover some of the requirements in a separate
document on techniques for internationalization and localization of
schemas and XML <hi>documents</hi>.</p>


<head>Users and Usages of ITS</head>


<head>Potential Users of ITS</head>

<p>The ITS specification aims to provide different types of users
with information about what markup should be supported to enable worldwide
use and effective localization of content. Important types of users
are:

<list type="unordered">


<item>schema developers who start a schema from the ground up: <p>This
type of user will find proposals for attribute and element names to
be included in their new schema. Using the names proposed
here may be helpful because it makes the concepts they represent
easier to recognize for both schema users and processors. It is perfectly
possible, however, for schema developers to develop their own set
of element and attribute names. The specification sets out, first and foremost,
to ensure that the required markup is available, and that the behaviour
of that markup meets established needs.</p></item>

<item>schema developers who work with an existing schema: <p>This type
of user will be working with schemas such as DocBook, DITA, or perhaps
an in-house schema.</p><p>The ITS Working Group has sought input from
people developing widely used formats such as the ones mentioned,
and the specification provides examples of how those specific formats
could be adapted to support ITS.</p><p>Developers working on existing
schemas should check whether their schemas support the markup proposed
in this specification and, where appropriate, add the markup proposed
here to their schema.</p><p>In some cases, the schema may already
contain markup equivalent to that recommended in ITS. In this case
it is not necessary to add duplicate markup, since ITS provides mechanisms
for relating ITS markup to markup in the host vocabulary which serves
a similar purpose (see <ptr target="#purpose-mapping" type="specref"/>).
The developer should, however, check that the behaviour associated
with the markup in their own schema is fully compatible with the
behaviour described in this specification.</p></item>

<item>vendors of content-related tools: <p>This type of user encompasses
companies which provide tools for authoring or translation, or other
kinds of content-related software. It is important to
ensure that such tools enable worldwide use and effective localization
of content. For example, translation tools should prevent content
marked up as not for translation from being changed. It is hoped that
the ITS specification will make the job of vendors easier by standardising
the format and processing expectations of certain relevant markup
items, and by allowing them to identify more effectively how content
should be handled.</p></item>

<item>knowledge workers: <p>This type of user comprises authors, translators
and other kinds of content creators. The markup proposed in this
specification may be used by them to mark up specific bits of content. However,
the burden of inserting markup can sometimes be removed from content
authors by relating the ITS information to relevant bits of content
in a more global manner. This work may fall to information architects,
rather than the content authors themselves.</p></item>

</list>In order to support all of these users, the information about

what markup should be supported to enable worldwide use and effective

localization of content is provided in two ways:</p>


<item>abstractly, in the data category descriptions <ptr
target="#datacat-description" type="specref"/></item>

<item>concretely, in the ITS schemas <ptr target="#its-schemas"



<head>Ways to Use ITS</head>

<p>The ITS specification proposes several mechanisms for supporting
worldwide use and effective localization of content. They are explored
below. For the purpose of illustration, the examples address one question:
how ITS can indicate that certain parts of content should or should
not be translated.</p>

<p>A content author uses an attribute on a particular element in the
content to say that the text should not be translated.</p>


<head>Use of ITS by content author</head>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;book&gt;

 &lt;head&gt;...&lt;/head&gt; &lt;body&gt; ...  &lt;p&gt;And he said:
you need a

 new &lt;quote its:translate="no"&gt;T-Model&lt;/quote&gt;&lt;/p&gt; ...




<p>A content author or information architect uses markup at the top

of the document to identify a particular type of element or context

in which the content should not be translated.</p>


<head>Use of ITS by information architect</head>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;text&gt;


 &lt;its:rules xmlns:its="http://www.w3.org/2005/11/its"&gt;

  &lt;its:translateRule its:translate="no" its:translateSelector="//p"/&gt;



 &lt;body&gt; ...

  &lt;p&gt; ...




<p>A processor may inject markup at the top of the document which

links to ITS information outside of the document.</p>


<head>Use of ITS by automated process</head>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;text&gt;


 &lt;its:rules xmlns:its="http://www.w3.org/2005/11/its"&gt;

  &lt;its:rulesLink xlink:href="someUri"/&gt;



 &lt;body&gt; ...

  &lt;p&gt; ...




<p>A schema developer uses constructs in the schema itself to indicate

that specific parts of the content should not be translated.</p>


<head>Use of ITS by schema developer</head>

<egXML xmlns="http://www.tei-c.org/ns/Examples">TODO</egXML>


<p>The first two approaches above can be likened to the use of CSS

in XHTML. Using a style attribute, an XHTML content author may assign

a colour to a particular paragraph. That author could also have used

the style element at the top of the page to say that all paragraphs

of a particular class or in a particular context would be coloured






<head>Motivation for ITS</head>

<p>Content or software that is authored in one language (the so-called
source language) is often made available in additional languages or
adapted with regard to other cultural aspects. This is done through
a process called localization, where the original material is translated
and adapted to the target audience.</p>


<p>In addition, document formats expressed by schemas may be used

by people in different parts of the world, and these people may need

special markup to support the local language or script.  For example,

people authoring in languages such as Arabic, Hebrew, Persian or Urdu

need special markup to demarcate directionality in mixed direction



<p>From the viewpoints of feasibility, cost, and efficiency, it is

important that the original material should be suitable for localization.

This is achieved by appropriate design and development, and the

process is referred to as internationalization. For a detailed explanation

of the terms "localization" and  "internationalization", see <ptr
target="#geo-i18n-l10n" type="bibref"/>.</p>

<p>The increasing usage of XML as a medium for documentation-related

content (e.g. DocBook, a format for writing structured documentation,

well suited to computer hardware and software manuals) and software-related

content (e.g. the eXtensible User Interface Language <ptr target="#xul"
type="bibref"/>) creates challenges and opportunities in the domain

of XML internationalization and localization.</p>

<p>The following examples sketch one of the issues that
currently hinder efficient XML-related localization: the lack of a
standard, declarative mechanism that identifies which parts of an
XML <hi>document</hi> need to be translated (the <hi
rend="localizable">text in bold face</hi> shows the parts that need
to be localized). Tools often cannot automatically do this

<exemplum><?Pub Dtl?>

<head>Document with partially localizable content</head>

<p><gi>PhaseCode</gi> should not be translated;
the <att>title</att> attribute sometimes has to be translated and
sometimes must not be translated.</p>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;Manual&gt;


  &lt;PhaseCode&gt;Review Level&lt;/PhaseCode&gt;


  &lt;Name&gt;<hi rend="localizable">Owner's Manual</hi>&lt;/Name&gt;



 &lt;Section id="0" title="#Introduction#"&gt;

  &lt;Ltitle id="005" title="#ZOOM#"&gt;

   &lt;Mtitle id="00501" title="<hi rend="localizable">Getting
started</hi>" option="no" cols="1"&gt;

    &lt;MultiCol cols="1"&gt;

     &lt;Text&gt;<hi rend="localizable">Some text to







<exemplum><?Pub Dtl?>

<head>Document with partially localizable content</head>

<p>The first file name in the first <gi>component</gi> element would
not be translated.</p>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;dialogue

 &lt;rsrc id="123"&gt;

  &lt;component id="456" type="image"&gt;

   &lt;data type="text"&gt;images/cancel.gif&lt;/data&gt;

   &lt;data type="coordinates"&gt;12,20,50,14&lt;/data&gt;


  &lt;component id="789" type="caption"&gt;

   &lt;data type="text"&gt;<hi rend="localizable">Cancel</hi>&lt;/data&gt;

   &lt;data type="coordinates"&gt;12,34,50,14&lt;/data&gt;






<head>Document with partially localizable content</head>

<p>In the example below, there is no clear mechanism for determining
which <gi>string</gi> elements need to be translated.</p>

<egXML xmlns="http://www.tei-c.org/ns/Examples">&lt;resources&gt;

 &lt;section id="Homepage"&gt;







   &lt;string&gt;<hi rend="localizable">Corporate Policy</hi>&lt;/string&gt;




   &lt;string&gt;<hi rend="localizable">ABC Corporation - Policy


   &lt;string&gt;<hi rend="localizable">Pages</hi>&lt;/string&gt;




   &lt;string&gt;<hi rend="localizable">List of Available







<head>Out of Scope</head>

<p>This standard does not exhaustively cover all mechanisms and data
formats which might be needed for configuring localization workflows
or tools to process a specific format. These mechanisms and data formats,
sometimes called <term>Localization Properties</term>, may however
be implemented using the framework put forth in this standard (see
in particular <ptr target="#selection" type="specref"/>).<note>"XML localization properties" is a generic term
for the mechanisms
and data formats that allow localization tools to be configured in
order to process a specific XML format. Examples of "XML localization
properties" are: the "Trados DTD Settings" file, the SDLX "Analysis"



<div xml:id="design-decisions">

<head>Important Design Principles</head>

<note><p>Attention: The design

of the ITS schema in <ptr target="#selection" type="specref"/> and <ptr
target="#datacat-description" type="specref"/> is still under development.

Nevertheless, the working group does not intend to change large parts

of the element and attribute names and their functionality, but only

the ITS schema structure.</p></note>

<p>Abstraction via <emph>data categories</emph>: ITS defines data
categories as an abstract notion for information needed for the internationalization
and localization of XML schemas and documents. This abstraction
helps to achieve independence from any particular implementation,
e.g. as an element or as an attribute. See <ptr
target="#def-datacat" type="specref"/> for a definition of the term
data categories, <ptr target="#datacat-description" type="specref"/> for
the definition of the various ITS data categories, and <hi>subsections
in <ptr target="#datacat-description" type="specref"/></hi> for the
data category


<p>Powerful <emph>selection mechanism</emph>: For any ITS markup
which appears in an XML instance or XML schema,
it has to be clearly defined to which XML nodes the ITS-related information
pertains. Thus, ITS specifies selection as a mechanism for stating
to which parts of an XML document or schema an ITS data category and
its values should be applied.</p>

<p>Content authors need, for example, a simple way to work with the <ref
target="#translate">translatability data category</ref> in order to
express whether the content of an element or attribute should be
translated or not. On the other hand, for translations of large document
sets based on the same schema, it is important to be able to specify defaults
and exceptions to those defaults (e.g. all <gi>p</gi>
elements should be translated, but not <gi>p</gi> elements inside
of an <gi>index</gi> element).</p>

<p>This specification responds to these requirements by introducing
mechanisms for specifying ITS information in XML documents,
see <ptr target="#selection" type="specref"/>. These mechanisms also
provide a means for specifying ITS information
for attributes (a task for which no standard means yet exists). The
ITS mechanisms for selection are:</p>

<list type="unordered">

<item>for XML <hi>documents</hi>: usable <ref
target="#selection-local">locally</ref> (at the XML node to which it
pertains) or <ref target="#selection-global">globally</ref> (not at
the XML node to which it pertains)</item>

<item>for global usage: either in the target XML <hi>document</hi>
or in a separate file</item>

</list>

<p><emph>No dedicated extensibility</emph>: It may be useful or
necessary to extend the set of information
available for internationalization or localization purposes beyond
what is provided by ITS. This specification does not define a dedicated
extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces
<ptr target="#xmlns" type="bibref"/>) may be used.</p>

<p><emph>Ease of integration</emph>:</p>

<list type="unordered">

<item>ITS follows the example from <ref
target="http://www.w3.org/TR/xlink11/#att-method">section 4</ref> of
<ptr target="#xlink11" type="bibref"/>, by providing mostly global
attributes for the implementation of ITS data categories. Avoiding elements for
ITS purposes as much as possible ensures ease of integration into
existing markup schemes, see <ref
target="http://www.w3.org/TR/itsreq/#impact">section 3.14</ref> in <ptr
target="#itsreq" type="bibref"/>. Additional child elements have to be
used only for some requirements, see for example <ptr target="#ruby-sec"

<item>ITS has no dependency on technologies which are yet to be

<item>ITS fits with existing work in the W3C architecture (e.g. use

of XPath <ptr target="#xpath10" type="bibref"/> for the selection



<div xml:id="tei-ref">

<head>Development of this Specification</head>

<p>This specification has been developed using the ODD (<emph>One

Document Does it all</emph>) language of the Text Encoding Initiative

(<ptr target="#tei" type="bibref"/>). This is a literate programming

language for writing XML schemas, with three characteristics: <list

<item>The element and attribute set is specified using an XML vocabulary

which includes support for macros (like DTD entities, or schema patterns),

a hierarchical class system for attributes and elements, and creation

of modules.</item>

<item>The content models for elements and attributes are written using
embedded RELAX NG XML notation.</item>

<item>Documentation for elements, attributes, value lists etc. is written
inline, along with examples and other supporting material.</item>

</list> XSLT transforms are provided by the TEI to extract documentation
in HTML, XSL FO or LaTeX form, and to generate RELAX NG documents
and DTDs. From the RELAX NG documents, James Clark's <ref
can be used to create XML Schema documents.</p>


</div><?Pub *0000019187 0?>
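
The defaults-and-exceptions idea in the design principles above (all p
elements translated, except p elements inside index) can be sketched in a
few lines of Python. This is only an illustration under stated assumptions,
not part of the draft: the second rule's selector, the "later rule wins"
precedence and the helper name translatability are hypothetical, and the
rule syntax simply mirrors the its:translateRule example in the source text.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Sample document; the second rule (the exception) is an assumed illustration.
DOC = """\
<text xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules>
    <its:translateRule its:translate="yes" its:translateSelector="//p"/>
    <its:translateRule its:translate="no" its:translateSelector="//index//p"/>
  </its:rules>
  <body>
    <p>Translate me.</p>
    <index><p>Keep me as is.</p></index>
  </body>
</text>
"""


def translatability(root):
    """Map each selected element to 'yes' or 'no'.

    Assumption: rules are applied in document order and a later rule
    overrides an earlier one for the same node.
    """
    decisions = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get(f"{{{ITS_NS}}}translateSelector")
        value = rule.get(f"{{{ITS_NS}}}translate")
        # ElementTree supports only a subset of XPath and rejects absolute
        # paths on elements, so "//p" is rewritten as ".//p".
        for node in root.findall("." + selector):
            decisions[node] = value
    return decisions


root = ET.fromstring(DOC)
result = {el.text: flag for el, flag in translatability(root).items()}
print(result)
```

A real processor would need a full XPath implementation and the precedence
rules the specification eventually defines; the sketch only shows how one
global rule can set a default and a second one can carve out an exception.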

Received on Thursday, 6 April 2006 23:52:14 UTC
