- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 07 Apr 2006 08:51:49 +0900
- To: public-i18n-its@w3.org
- Message-ID: <4435A995.5060207@w3.org>
Hi all, Due to technical problems I could not integrate Christian's rewrite of the "introduction" section into the draft. Since time is running short, please have a look at the source text below and comment as soon as possible. We will discuss this at today's editors' call. - Felix <div xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:id="introduction" xsi:schemaLocation="http://www.tei-c.org/ns/1.0 C:\DOCUME~1\d025418\MYDOCU~1\MYPROJ~1\ITS\its-edit\odd.xsd"> <head>Introduction</head> <p> <emph>This section is informative.</emph></p> <p>This document defines a standard for high-quality, cost-efficient internationalization and localization of schemas and XML instances (both existing ones and new ones). On the one hand, the standard is defined conceptually through the notion of data categories. On the other hand, the standard defines implementations of these data categories as a set of elements and attributes called the Internationalization Tag Set (ITS). The document provides examples of how ITS can be used with existing popular markup schemes such as DocBook. Furthermore, the document provides implementations for three schema languages: XML DTD <ptr target="#xml10spec" type="bibref"/>, XML Schema <ptr target="#xmlschema1" type="bibref"/> and RELAX NG <ptr target="#relaxng" type="bibref"/>. Feedback related to this document is especially appreciated on the general concept of ITS and the mechanisms defined for the selection of ITS-specific information in documents and schemas, and on the design of the individual data categories.</p> <p>Requirements for this document are formulated in <ptr target="#itsreq" type="bibref"/>. 
Not all of these requirements are addressed in this document, for example:</p> <list type="unordered"> <item> <ref target="http://www.w3.org/TR/2005/WD-itsreq-20050805/#constraints">R001 - Indicator of Constraints</ref> </item> <item> <ref target="http://www.w3.org/TR/2005/WD-itsreq-20050805/#entities">R005 - Handling Entities</ref> </item> <item> <ref target="http://esw.w3.org/topic/its0908LinguisticMarkup">R023 - Linguistic Markup</ref> </item> </list> <p>The Working Group will cover some of the requirements in a separate document on techniques for internationalization and localization of schemas and XML <hi>documents</hi>.</p> <div> <head>Users and Usages of ITS</head> <div> <head>Potential Users of ITS</head> <p>The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective localization of content. Important types of users are:<list> <item>schema developers who start a schema from the ground up: <p>This type of user will find proposals for attribute and element names to be included in their new schema. Using the same names as proposed here may be helpful because it makes it easier for both schema users and processors to recognize the concepts represented. It is perfectly possible, however, for the schema developer to develop their own set of element and attribute names. 
The specification sets out, first and foremost, to ensure the required markup is available, and that the behaviour of that markup meets established needs.</p></item> <item>schema developers who work with an existing schema:<p>This type of user will be working with schemas such as DocBook, DITA, or perhaps an in-house schema.</p><p>The ITS Working Group has sought input from people developing widely used formats such as the ones mentioned, and the specification provides examples of how those specific formats could be adapted to support ITS.</p><p>Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema. </p><p>In some cases, the schema may already contain markup equivalent to that recommended in ITS. In this case it is not necessary to add duplicate markup, since ITS provides mechanisms for relating ITS markup to markup in the host vocabulary which serves a similar purpose (see <ptr target="#purpose-mapping" type="specref"/>). The developer should, however, check that the behaviour associated with the markup in their own schema is fully compatible with the expectations described in this specification.</p></item> <item>vendors of content-related tools:<p>This type of user encompasses companies which provide tools for authoring, translation or other flavours of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed. 
It is hoped that the ITS specification will make the job of vendors easier by standardising the format and processing expectations of certain relevant markup items, and allowing them to more effectively identify how content should be handled.</p></item> <item>knowledge workers:<p>This type of user comprises authors, translators and other content authors. The markup proposed in this specification may be used by them to mark up specific bits of content. However, the burden of inserting markup can sometimes be removed from content authors by relating the ITS information to relevant bits of content in a more global manner. This work may fall to information architects, rather than the content authors themselves.</p></item> </list>In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective localization of content is provided in two ways:</p> <list> <item>abstractly, in the data category descriptions <ptr target="#datacat-description" type="specref"/></item> <item>concretely, in the ITS schemas <ptr target="#its-schemas" type="specref"/></item> </list> <div> <head>Ways to Use ITS</head> <p>The ITS specification proposes several mechanisms for supporting worldwide use and effective localization of content. We will explore them below. For the purpose of illustration, we will show how ITS can indicate that certain parts of content should or should not be translated.</p> <p>A content author uses an attribute on a particular element in the content to say that the text should not be translated.</p> <exemplum> <head>Use of ITS by content author</head> <egXML xmlns="http://www.tei-c.org/ns/Examples"><book> <head>...</head> <body> ... <p>And he said: you need a new <quote its:translate="no">T-Model</quote></p> ... 
</body> </book></egXML> </exemplum> <p>A content author or information architect uses markup at the top of the document to identify a particular type of element or context in which the content should not be translated.</p> <exemplum> <head>Use of ITS by information architect</head> <egXML xmlns="http://www.tei-c.org/ns/Examples"><text> <head> <its:rules xmlns:its="http://www.w3.org/2005/11/its"> <its:translateRule its:translate="no" its:translateSelector="//p"/> </its:rules> </head> <body> ... <p> ... <dl><dt>...</dt><dd>...</dd></dl></p> </body> </text></egXML> </exemplum> <p>A processor may inject markup at the top of the document which links to ITS information outside of the document.</p> <exemplum> <head>Use of ITS by automated process</head> <egXML xmlns="http://www.tei-c.org/ns/Examples"><text> <head> <its:rules xmlns:its="http://www.w3.org/2005/11/its"> <its:rulesLink xlink:href="someUri"/> </its:rules> </head> <body> ... <p> ... <dl><dt>...</dt><dd>...</dd></dl></p> </body> </text></egXML> </exemplum> <p>A schema developer uses constructs in the schema itself to indicate that specific parts of the content should not be translated.</p> <exemplum> <head>Use of ITS by schema developer</head> <egXML xmlns="http://www.tei-c.org/ns/Examples">TODO</egXML> </exemplum> <p>The first two approaches above can be likened to the use of CSS in XHTML. Using a style attribute, an XHTML content author may assign a colour to a particular paragraph. That author could also have used the style element at the top of the page to say that all paragraphs of a particular class or in a particular context would be coloured red.</p> </div> </div> </div> <div> <head>Motivation for ITS</head> <p>Content or software that is authored in one language (the so-called source language) is often made available in additional languages or adapted with regard to other cultural aspects. 
This is done through a process called localization, where the original material is translated and adapted to the target audience.</p> <div> <p>In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian or Urdu need special markup to demarcate directionality in mixed direction text.</p> </div> <p>From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization. This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization. For a detailed explanation of the terms "localization" and "internationalization", see <ptr target="#geo-i18n-l10n" type="bibref"/>.</p> <p>The increasing usage of XML as a medium for documentation-related content (e.g. DocBook, a format for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g. the eXtensible User Interface Language <ptr target="#xul" type="bibref"/>) creates challenges and opportunities in the domain of XML internationalization and localization.</p> <p><?Pub Dtl?>The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism which identifies which parts of an XML <hi>document</hi> need to be translated (the <hi rend="localizable">text in bold face</hi> shows the parts that need to be localized). 
Tools often cannot perform this identification automatically.</p> <exemplum><?Pub Dtl?> <head>Document with partially localizable content</head> <p> <gi>PhaseCode</gi> should not be translated; the <att>title</att> attribute sometimes has to be translated and sometimes must not be translated.</p> <egXML xmlns="http://www.tei-c.org/ns/Examples"><Manual> <Info> <PhaseCode>Review Level</PhaseCode> <FormNo>8U81-GS-52C</FormNo> <Name><hi rend="localizable">Owner's Manual</hi></Name> ... </Info> <Section id="0" title="#Introduction#"> <Ltitle id="005" title="#ZOOM#"> <Mtitle id="00501" title="<hi rend="localizable">Getting started</hi>" option="no" cols="1"> <MultiCol cols="1"> <Text><hi rend="localizable">Some text to localize</hi></Text> ... </MultiCol> </Mtitle> </Ltitle>... </Section> </Manual></egXML> </exemplum> <exemplum><?Pub Dtl?> <head>Document with partially localizable content</head> <p>The file name in the first <gi>component</gi> element should not be translated.</p> <egXML xmlns="http://www.tei-c.org/ns/Examples"><dialogue xml:lang="en-gb"> <rsrc id="123"> <component id="456" type="image"> <data type="text">images/cancel.gif</data> <data type="coordinates">12,20,50,14</data> </component> <component id="789" type="caption"> <data type="text"><hi rend="localizable">Cancel</hi></data> <data type="coordinates">12,34,50,14</data> </component> </rsrc> </dialogue></egXML> </exemplum> <exemplum> <head>Document with partially localizable content</head> <p>In the example below, there is no clear mechanism allowing one to know which <gi>string</gi> elements need to be translated.</p> <egXML xmlns="http://www.tei-c.org/ns/Examples"><resources> <section id="Homepage"> <arguments> <string>page</string> <string>childlist</string> </arguments> <variables> <string>POLICY</string> <string><hi rend="localizable">Corporate Policy</hi></string> </variables> <keyvalue_pairs> <string>Page</string> <string><hi rend="localizable">ABC Corporation - Policy Repository</hi></string> 
<string>Footer_Last</string> <string><hi rend="localizable">Pages</hi></string> <string>bgColor</string> <string>NavajoWhite</string> <string>title</string> <string><hi rend="localizable">List of Available Policies</hi></string> </keyvalue_pairs> </section> </resources></egXML> </exemplum> </div> <div> <head>Out of Scope</head> <p>This standard does not exhaustively cover all mechanisms and data formats which might be needed for configuring localization workflows or tools to process a specific format. These mechanisms and data formats, sometimes called <term>Localization Properties</term>, may however be implemented using the framework put forth in this standard (see in particular <ptr target="#selection" type="specref"/>).<note>"XML localization properties" is a generic term for the mechanisms and data formats that allow localization tools to be configured in order to process a specific XML format. Examples of "XML localization properties" are: the "Trados DTD Settings" file, the SDLX "Analysis" file.</note></p> </div> <div xml:id="design-decisions"> <head>Important Design Principles</head> <note> <p>Attention: The design of the ITS schema in <ptr target="#selection" type="specref"/> and <ptr target="#datacat-description" type="specref"/> is still under development. Nevertheless, the working group does not intend to change large parts of the element and attribute names and their functionality, but only the ITS schema structure.</p></note> <p>Abstraction via <emph>data categories</emph>: ITS defines data categories as an abstract notion of the information needed for internationalization and localization of XML schemas and documents. This abstraction is helpful in realizing independence from a particular implementation, e.g. using an element or attribute. 
See <ptr target="#def-datacat" type="specref"/> for a definition of the term data categories, <ptr target="#datacat-description" type="specref"/> for the definition of the various ITS data categories, and <hi>subsections in <ptr target="#datacat-description" type="specref"/></hi> for the data category implementations.</p> <p> Powerful <emph>selection mechanism</emph>: For any ITS markup which appears in an XML instance or XML schema, it has to be clearly defined to which XML nodes the ITS-related information pertains. Thus, ITS specifies selection as a mechanism to specify to what parts of an XML document or schema an ITS data category and its values should be applied.</p> <p>Content authors need, for example, a simple way to work with the <ref target="#translate">translatability data category</ref> in order to express whether the content of an element or attribute should be translated or not. On the other hand, for translations of large document sets based on the same schema, a specification of defaults for translatability and exceptions to the defaults is important (e.g. all <gi>p</gi> elements should be translated, but not <gi>p</gi> elements inside of an <gi>index</gi> element). </p> <p>This specification responds to these requirements by introducing mechanisms for specifying ITS information in XML documents, see <ptr target="#selection" type="specref"/>. These mechanisms also provide a means for specifying ITS information for attributes (a task for which no standard means yet exists). 
The ITS mechanisms for selection are:</p> <list type="unordered"> <item>for XML <hi>documents</hi>: usable <ref target="#selection-local">locally</ref> (at the XML node to which the information pertains) or <ref target="#selection-global">globally</ref> (not at the XML node to which the information pertains)</item> <item>for global usage: either in the target XML <hi>document</hi> or in a separate file</item> </list> <p> <emph>No dedicated extensibility</emph>: It may be useful or necessary to extend the set of information available for internationalization or localization purposes beyond what is provided by ITS. This specification does not define a dedicated extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces <ptr target="#xmlns" type="bibref"/>) may be used.</p> <p> <emph>Ease of integration</emph>:</p> <list type="unordered"> <item> ITS follows the example from <ref target="http://www.w3.org/TR/xlink11/#att-method">section 4</ref> of <ptr target="#xlink11" type="bibref"/>, by providing mostly global attributes for the implementation of ITS data categories. Avoiding elements for ITS purposes as much as possible ensures ease of integration into existing markup schemes, see <ref target="http://www.w3.org/TR/itsreq/#impact">section 3.14</ref> in <ptr target="#itsreq" type="bibref"/>. Additional child elements have to be used only for some requirements, see for example <ptr target="#ruby-sec" type="specref"/>.</item> <item>ITS has no dependency on technologies which are yet to be developed</item> <item>ITS fits with existing work in the W3C architecture (e.g. use of XPath <ptr target="#xpath10" type="bibref"/> for the selection mechanism)</item> </list> </div> <div xml:id="tei-ref"> <head>Development of this Specification</head> <p>This specification has been developed using the ODD (<emph>One Document Does it all</emph>) language of the Text Encoding Initiative (<ptr target="#tei" type="bibref"/>). 
This is a literate programming language for writing XML schemas, with three characteristics: <list type="ordered"> <item>The element and attribute set is specified using an XML vocabulary which includes support for macros (like DTD entities, or schema patterns), a hierarchical class system for attributes and elements, and creation of modules.</item> <item>The content models for elements and attributes are written using embedded RELAX NG XML notation.</item> <item>Documentation for elements, attributes, value lists etc. is written inline, along with examples and other supporting material.</item> </list> XSLT transforms are provided by the TEI to extract documentation in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTDs. From the RELAX NG documents, James Clark's <ref target="http://www.thaiopensource.com/relaxng/trang.html">trang</ref> <?Pub Caret?>can be used to create XML Schema documents.</p> </div> </div><?Pub *0000019187 0?>
Received on Thursday, 6 April 2006 23:52:14 UTC