- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 07 Apr 2006 08:51:49 +0900
- To: public-i18n-its@w3.org
- Message-ID: <4435A995.5060207@w3.org>
Hi all,
Due to technical problems I could not integrate Christian's rewrite of
the "introduction" section into the draft. Since time is running out, please
have a look at the source text below and comment as soon as possible.
We will discuss this at today's editors' call.
- Felix
<div xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xml:id="introduction" xsi:schemaLocation="http://www.tei-c.org/ns/1.0
C:\DOCUME~1\d025418\MYDOCU~1\MYPROJ~1\ITS\its-edit\odd.xsd">
<head>Introduction</head>
<p> <emph>This section is informative.</emph></p>
<p>This document defines a standard for high-quality, cost-efficient
internationalization and localization of schemas and XML instances
(both existing ones and new ones). On the one hand, the standard is
defined conceptually through the notion of data categories. On the
other hand, the standard defines implementations of these data categories
as a set of elements and attributes called the Internationalization
Tag Set (ITS). The document provides examples of how ITS can be used
with existing popular markup schemes such as DocBook. Furthermore,
the document provides implementations for three schema languages:
XML DTD <ptr target="#xml10spec" type="bibref"/>, XML
Schema <ptr target="#xmlschema1" type="bibref"/> and RELAX NG <ptr
target="#relaxng" type="bibref"/>. Feedback related to this document
is especially appreciated on the general concept of ITS and the mechanisms
defined for the selection of ITS-specific information in documents
and schemas, and on the design of the individual data categories.</p>
<p>Requirements for this document are formulated in <ptr
target="#itsreq" type="bibref"/>. Not all of these requirements are
addressed in this document, for example:</p>
<list type="unordered">
<item> <ref
target="http://www.w3.org/TR/2005/WD-itsreq-20050805/#constraints">R001
- Indicator of Constraints</ref> </item>
<item> <ref
target="http://www.w3.org/TR/2005/WD-itsreq-20050805/#entities">R005
- Handling Entities</ref> </item>
<item> <ref
target="http://esw.w3.org/topic/its0908LinguisticMarkup">R023 - Linguistic
Markup</ref> </item>
</list>
<p>The Working Group will cover some of the requirements in a separate
document on techniques for internationalization and localization of
schemas and XML <hi>documents</hi>.</p>
<div>
<head>Users and Usages of ITS</head>
<div>
<head>Potential Users of ITS</head>
<p>The ITS specification aims to provide different types of users
with information about what markup should be supported to enable worldwide
use and effective localization of content. Important types of users
are<list>
<item>schema developers who start a schema from the ground up: <p>This
type of user will find proposals for attribute and element names to
be included in their new schema. Using the names proposed here may
be helpful because it makes the represented concepts easier to recognize
for both schema users and processors. It is perfectly
possible, however, for the schema developer to develop their own set
of attribute and element names. The specification sets out, first and foremost,
to ensure the required markup is available, and that the behaviour
of that markup meets established needs.</p></item>
<item>schema developers who work with an existing schema:<p>This type
of user will be working with schemas such as DocBook, DITA, or perhaps
an in-house schema.</p><p>The ITS Working Group has sought input from
people developing widely used formats such as the ones mentioned,
and the specification provides examples of how those specific formats
could be adapted to support ITS.</p><p>Developers working on existing
schemas should check whether their schemas support the markup proposed
in this specification, and, where appropriate, add the markup proposed
here to their schema. </p><p>In some cases, the schema may already
contain markup equivalent to that recommended in ITS. In this case
it is not necessary to add duplicate markup since ITS provides mechanisms
for relating ITS markup with markup in the host vocabulary which serves
a similar purpose (see <ptr target="#purpose-mapping" type="specref"/>).
The developer should, however, check that the behaviour associated
with the markup in their own schema is fully compatible with the
expectations
described in this specification.</p></item>
<item>vendors of content-related tools:<p>This type of user encompasses
companies which provide tools for authoring, translation or other
flavours of content-related software solutions. It is important to
ensure that such tools enable worldwide use and effective localization
of content. For example, translation tools should prevent content
marked up as not for translation from being changed. It is hoped that
the ITS specification will make the job of vendors easier by standardising
the format and processing expectations of certain relevant markup
items, and allowing them to more effectively identify how content
should be handled.</p></item>
<item>knowledge workers:<p>This type of user comprises authors, translators
and other content creators. The markup proposed in this
specification
may be used by them to mark up specific bits of content. However,
the burden of inserting markup can sometimes be removed from content
authors by relating the ITS information to relevant bits of content
in a more global manner. This work may fall to information architects,
rather than the content authors themselves.</p></item>
</list>In order to support all of these users, the information about
what markup should be supported to enable worldwide use and effective
localization of content is provided in two ways:</p>
<list>
<item>abstractly, in the data category descriptions (see <ptr
target="#datacat-description" type="specref"/>)</item>
<item>concretely, in the ITS schemas (see <ptr target="#its-schemas"
type="specref"/>)</item>
</list>
<div>
<head>Ways to Use ITS</head>
<p>The ITS specification proposes several mechanisms for supporting
worldwide use and effective localization of content. For the purpose
of illustration, the examples below show how each mechanism can indicate
whether certain parts of content should be translated.</p>
<p>A content author uses an attribute on a particular element in the
content to say that its text should not be translated.</p>
<exemplum>
<head>Use of ITS by content author</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><book xmlns:its="http://www.w3.org/2005/11/its">
<head>...</head> <body> ... <p>And he said:
you need a
new <quote its:translate="no">T-Model</quote></p> ...
</body>
</book></egXML>
</exemplum>
<p>A content author or information architect uses markup at the top
of the document to identify a particular type of element or context
in which the content should not be translated.</p>
<exemplum>
<head>Use of ITS by information architect</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><text>
<head>
<its:rules xmlns:its="http://www.w3.org/2005/11/its">
<its:translateRule its:translate="no" its:translateSelector="//p"/>
</its:rules>
</head>
<body> ...
<p> ...
<dl><dt>...</dt><dd>...</dd></dl></p>
</body>
</text></egXML>
</exemplum>
<p>A processor may inject markup at the top of the document which
links to ITS information outside of the document.</p>
<exemplum>
<head>Use of ITS by automated process</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><text>
<head>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
xmlns:xlink="http://www.w3.org/1999/xlink">
<its:rulesLink xlink:href="someUri"/>
</its:rules>
</head>
<body> ...
<p> ...
<dl><dt>...</dt><dd>...</dd></dl></p>
</body>
</text></egXML>
</exemplum>
<p>A schema developer uses constructs in the schema itself to indicate
that specific parts of the content should not be translated.</p>
<exemplum>
<head>Use of ITS by schema developer</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples">TODO</egXML>
</exemplum>
<p>The first two approaches above can be likened to the use of CSS
in XHTML. Using a style attribute, an XHTML content author may assign
a colour to a particular paragraph. That author could also have used
the style element at the top of the page to say that all paragraphs
of a particular class or in a particular context would be coloured
red.</p>
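<p>The analogy can be sketched with the following XHTML fragment, in which the class name "note" and the colour values are arbitrary illustrations: the <gi>style</gi> element in the document head sets a rule for a whole class of paragraphs, much like global ITS rules, while the <att>style</att> attribute applies to a single paragraph only, much like a local ITS attribute.</p>
<exemplum>
<head>CSS in XHTML: style element versus style attribute (sketch)</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>...</title>
<style type="text/css">p.note { color: red; }</style>
</head>
<body>
<p class="note">Coloured red by the rule in the style element.</p>
<p style="color: blue">Coloured blue by its own style attribute.</p>
</body>
</html></egXML>
</exemplum>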
</div>
</div>
</div>
<div>
<head>Motivation for ITS</head>
<p>Content or software that is authored in one language (the so-called
source language) is often made available in additional languages or
adapted with regard to other cultural aspects. This is done through
a process called localization, where the original material is translated
and adapted to the target audience.</p>
<div>
<p>In addition, document formats expressed by schemas may be used
by people in different parts of the world, and these people may need
special markup to support the local language or script. For example,
people authoring in languages such as Arabic, Hebrew, Persian or Urdu
need special markup to demarcate directionality in mixed-direction
text.</p>
</div>
<p>From the viewpoints of feasibility, cost, and efficiency, it is
important that the original material be suitable for localization.
This is achieved by appropriate design and development, and the
corresponding
process is referred to as internationalization. For a detailed explanation
of the terms "localization" and "internationalization", see <ptr
target="#geo-i18n-l10n" type="bibref"/>.</p>
<p>The increasing usage of XML as a medium for documentation-related
content (e.g. DocBook, a format for writing structured documentation,
well suited to computer hardware and software manuals) and software-related
content (e.g. the eXtensible User Interface Language <ptr target="#xul"
type="bibref"/>) creates challenges and opportunities in the domain
of XML internationalization and localization.</p>
<p>The following examples sketch one of the issues that
currently hinder efficient XML-related localization: the lack of a
standard, declarative mechanism which identifies which parts of an
XML <hi>document</hi> need to be translated (the <hi
rend="localizable">text in bold face</hi> shows the parts that need
to be localized). Tools often cannot perform this identification
automatically.</p>
<exemplum>
<head>Document with partially localizable content</head>
<p> <gi>PhaseCode</gi> should not be translated;
the <att>title</att> attribute sometimes has to be translated and
sometimes must not be translated.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><Manual>
<Info>
<PhaseCode>Review Level</PhaseCode>
<FormNo>8U81-GS-52C</FormNo>
<Name><hi rend="localizable">Owner's Manual</hi></Name>
...
</Info>
<Section id="0" title="#Introduction#">
<Ltitle id="005" title="#ZOOM#">
<Mtitle id="00501" title="<hi rend="localizable">Getting
started</hi>" option="no" cols="1">
<MultiCol cols="1">
<Text><hi rend="localizable">Some text to
localize</hi></Text>
...
</MultiCol>
</Mtitle>
</Ltitle>...
</Manual></egXML>
</exemplum>
<exemplum>
<head>Document with partially localizable content</head>
<p>The file name in the first <gi>component</gi> element should
not be translated.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><dialogue
xml:lang="en-gb">
<rsrc id="123">
<component id="456" type="image">
<data type="text">images/cancel.gif</data>
<data type="coordinates">12,20,50,14</data>
</component>
<component id="789" type="caption">
<data type="text"><hi rend="localizable">Cancel</hi></data>
<data type="coordinates">12,34,50,14</data>
</component>
</rsrc>
</dialogue></egXML>
</exemplum>
<exemplum>
<head>Document with partially localizable content</head>
<p>In the example below, there is no clear mechanism that allows one
to determine which <gi>string</gi> elements need to be translated.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><resources>
<section id="Homepage">
<arguments>
<string>page</string>
<string>childlist</string>
</arguments>
<variables>
<string>POLICY</string>
<string><hi rend="localizable">Corporate Policy</hi></string>
</variables>
<keyvalue_pairs>
<string>Page</string>
<string><hi rend="localizable">ABC Corporation - Policy
Repository</hi></string>
<string>Footer_Last</string>
<string><hi rend="localizable">Pages</hi></string>
<string>bgColor</string>
<string>NavajoWhite</string>
<string>title</string>
<string><hi rend="localizable">List of Available
Policies</hi></string>
</keyvalue_pairs>
</section>
</resources></egXML>
</exemplum>
</div>
<div>
<head>Out of Scope</head>
<p>This standard does not exhaustively cover all mechanisms and data
formats which might be needed for configuring localization workflows
or tools to process a specific format. However, these mechanisms and
data formats, sometimes called <term>Localization Properties</term>,
may be implementable with the framework put forth in this standard
(see in particular <ptr target="#selection" type="specref"/>).<note>"XML
localization properties" is a generic term for the mechanisms
and data formats that allow localization tools to be configured in
order to process a specific XML format. Examples of "XML localization
properties" are the "Trados DTD Settings" file and the SDLX "Analysis"
file.</note></p>
</div>
<div xml:id="design-decisions">
<head>Important Design Principles</head>
<note> <p>Attention: The design
of the ITS schema in <ptr target="#selection" type="specref"/> and <ptr
target="#datacat-description" type="specref"/> is still under development.
Nevertheless, the Working Group does not intend to change most
of the element and attribute names or their functionality, but only
the structure of the ITS schema.</p></note>
<p>Abstraction via <emph>data categories</emph>: ITS defines data
categories as an abstract notion for information needed for the
internationalization and localization of XML schemas and documents.
This abstraction helps to keep that information independent of a
particular implementation, e.g. as an element or an attribute. See <ptr
target="#def-datacat" type="specref"/> for a definition of the term
data categories, <ptr target="#datacat-description" type="specref"/> for
the definition of the various ITS data categories, and <hi>subsections
in <ptr target="#datacat-description" type="specref"/></hi> for the
data category
implementations.</p>
<p><emph>Powerful selection mechanism</emph>: For any ITS markup
which appears in an XML instance or XML schema,
it has to be clearly defined to which XML nodes the ITS-related information
pertains. Thus, ITS defines selection as a mechanism for specifying
to which parts of an XML document or schema an ITS data category and
its values should be applied.</p>
<p>Content authors, for example, need a simple way to work with the <ref
target="#translate">translatability data category</ref> in order to
express whether the content of an element or attribute should be
translated or not. On the other hand, for translations of large document
sets based on the same schema, it is important to be able to specify
defaults for translatability and exceptions to those defaults (e.g. all
<gi>p</gi> elements should be translated, but not <gi>p</gi> elements
inside an <gi>index</gi> element).</p>
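<p>A minimal sketch of such defaults and exceptions, reusing the <gi>its:translateRule</gi> syntax of the examples above and assuming that the selector attribute accepts arbitrary XPath 1.0 expressions, could look as follows. The predicate "not(ancestor::index)" is chosen so that the two rules select disjoint sets of <gi>p</gi> elements; the sketch therefore does not rely on any precedence between rules.</p>
<exemplum>
<head>Translatability defaults and exceptions (sketch)</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><its:rules xmlns:its="http://www.w3.org/2005/11/its">
<its:translateRule its:translate="yes" its:translateSelector="//p[not(ancestor::index)]"/>
<its:translateRule its:translate="no" its:translateSelector="//index//p"/>
</its:rules></egXML>
</exemplum>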
<p>This specification responds to these requirements by introducing
mechanisms for specifying ITS information in XML documents,
see <ptr target="#selection" type="specref"/>. These mechanisms also
provide a means for specifying ITS information
for attributes (a task for which no standard means yet exists). The
ITS mechanisms for selection are:</p>
<list type="unordered">
<item>for XML <hi>documents</hi>: usable <ref
target="#selection-local">locally</ref> (at the XML node to which it
pertains) or <ref target="#selection-global">globally</ref> (not at
the XML node to which it pertains)</item>
<item>for global usage: either in the target XML <hi>document</hi>
or in a separate file</item>
</list>
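<p>For global usage in a separate file, the rules can be kept in a standalone document that is referenced from the target document with <gi>its:rulesLink</gi>, as in the example for automated processes above. A minimal sketch of such an external file follows; it assumes that the file consists of a single <gi>its:rules</gi> element, and both the file name <val>translate-rules.xml</val> and the selector <val>//dl</val> are hypothetical illustrations.</p>
<exemplum>
<head>Global ITS rules in a separate file (sketch)</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><its:rules xmlns:its="http://www.w3.org/2005/11/its">
<its:translateRule its:translate="no" its:translateSelector="//dl"/>
</its:rules></egXML>
</exemplum>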
<p> <emph>No dedicated extensibility</emph>: It may be useful or
necessary to extend the set of information
available for internationalization or localization purposes beyond
what is provided by ITS. This specification does not define a dedicated
extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces
<ptr target="#xmlns" type="bibref"/>) may be used.</p>
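<p>As a sketch of this approach, project-specific localization information can be placed in its own namespace next to ITS markup. The namespace URI <val>http://example.com/ns/myloc</val>, the prefix <val>myloc</val> and the attribute <att>myloc:maxLength</att> are hypothetical and are not defined by ITS; a processor that does not know the extra namespace can simply ignore it.</p>
<exemplum>
<head>Extending ITS information with a separate namespace (sketch)</head>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><book xmlns:its="http://www.w3.org/2005/11/its"
xmlns:myloc="http://example.com/ns/myloc">
<body>
<p>Press <quote its:translate="no">Ctrl+S</quote> to save.</p>
<p myloc:maxLength="40">This caption must fit into a fixed-width field.</p>
</body>
</book></egXML>
</exemplum>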
<p> <emph>Ease of integration</emph>:</p>
<list type="unordered">
<item>Following the example of <ref
target="http://www.w3.org/TR/xlink11/#att-method">section 4</ref> of
<ptr target="#xlink11" type="bibref"/>, ITS mostly provides global
attributes
for the implementation of ITS data categories. Avoiding elements for
ITS purposes as much as possible ensures ease of integration into
existing markup schemes, see <ref
target="http://www.w3.org/TR/itsreq/#impact">section 3.14</ref> in <ptr
target="#itsreq" type="bibref"/>. Additional child elements are needed
only for some requirements, see for example <ptr target="#ruby-sec"
type="specref"/>.</item>
<item>ITS has no dependency on technologies which are yet to be
developed</item>
<item>ITS fits with existing work in the W3C architecture (e.g. use
of XPath <ptr target="#xpath10" type="bibref"/> for the selection
mechanism)</item>
</list>
</div>
<div xml:id="tei-ref">
<head>Development of this Specification</head>
<p>This specification has been developed using the ODD (<emph>One
Document Does it all</emph>) language of the Text Encoding Initiative
(<ptr target="#tei" type="bibref"/>). This is a literate programming
language for writing XML schemas, with three characteristics: <list
type="ordered">
<item>The element and attribute set is specified using an XML vocabulary
which includes support for macros (like DTD entities, or schema patterns),
a hierarchical class system for attributes and elements, and creation
of modules.</item>
<item>The content models for elements and attributes are written using
embedded RELAX NG XML notation.</item>
<item>Documentation for elements, attributes, value lists etc. is written
inline, along with examples and other supporting material.</item>
</list> XSLT transformations are provided by the TEI to extract documentation
in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents
and DTDs. From the RELAX NG documents, James Clark's <ref
target="http://www.thaiopensource.com/relaxng/trang.html">trang</ref>
can be used to create XML Schema documents.</p>
</div>
</div>
Received on Thursday, 6 April 2006 23:52:14 UTC