- From: CVS User fsasaki <cvsmail@w3.org>
- Date: Wed, 29 May 2013 16:14:02 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory gil:/tmp/cvs-serv20361
Added Files:
its20-for-editing-sec1-sec2.odd
Log Message:
odd file for sec 1-2 spec editing
--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd 2013/05/29 16:14:02 NONE
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd 2013/05/29 16:14:02 1.1
<?xml version="1.0" encoding="UTF-8"?><?oxygen RNGSchema="tools/tei-w3c.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:rng="http://relaxng.org/ns/structure/1.0"
xmlns:spec="http://example.com/xmlspec" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
<header xmlns="http://example.com/xmlspec">
<title>Internationalization Tag Set (ITS) Version 2.0</title>
<w3c-designation>ITS20</w3c-designation>
<w3c-doctype>W3C Last Call Working Draft</w3c-doctype>
<pubdate>
<day>21</day>
<month>May</month>
<year>2013</year>
</pubdate>
<publoc>
<loc href="http://www.w3.org/TR/2013/WD-its20-20130521/">
http://www.w3.org/TR/2013/WD-its20-20130521/</loc>
</publoc>
<altlocs>
<loc href="its20.odd">ODD/XML document</loc>
<loc href="itstagset20.zip">self-contained zipped archive</loc>
<loc href="diffs/diff-wd20130521-wd20130411.html">XHTML Diff markup to previous publication
2013-04-11</loc>
</altlocs>
<prevlocs>
<loc href="http://www.w3.org/TR/2013/WD-its20-20130411/">
http://www.w3.org/TR/2013/WD-its20-20130411/</loc>
</prevlocs>
<latestloc>
<loc href="http://www.w3.org/TR/its20/">http://www.w3.org/TR/its20/</loc>
</latestloc>
<authlist>
<author>
<name>Shaun McCane</name>
<affiliation>Invited Expert</affiliation>
</author>
<author>
<name>Dave Lewis</name>
<affiliation>TCD</affiliation>
</author>
<author>
<name>Christian Lieske</name>
<affiliation>SAP AG</affiliation>
</author>
<author>
<name>Arle Lommel</name>
<affiliation>DFKI</affiliation>
</author>
<author>
<name>Jirka Kosek</name>
<affiliation>UEP</affiliation>
</author>
<author>
<name>Felix Sasaki</name>
<affiliation>DFKI / W3C Fellow</affiliation>
</author>
<author>
<name>Yves Savourel</name>
<affiliation>ENLASO</affiliation>
</author>
</authlist>
<!-- <errataloc role="spec-conditional" href="http://www.w3.org/International/its/itstagset/its-errata.html"/>
<translationloc role="spec-conditional" href="http://www.w3.org/2003/03/Translations/byTechnology?technology=its"/>
-->
<abstract>
<p>The technology described in this document - the <emph>Internationalization Tag Set (ITS)
2.0</emph> - enhances the foundation for integrating automated processing of human language
into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, <loc
href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional
concepts that are designed to foster the automated creation and processing of multilingual
Web content. ITS 2.0 focuses on HTML and XML-based formats in general, and can leverage
processing based on the XML Localization Interchange File Format (XLIFF), as well as the
Natural Language Processing Interchange Format (NIF).</p>
</abstract>
<status>
<p>
<emph>This section describes the status of this document at the time of its publication.
Other documents may supersede this document. A list of current W3C publications and the
latest revision of this technical report can be found in the <loc
href="http://www.w3.org/TR/">W3C technical reports index</loc> at
http://www.w3.org/TR/.</emph>
</p>
<p>The technology described in this document - the <emph>Internationalization Tag Set (ITS)
2.0</emph> - enhances the foundation for integrating automated processing of human language
into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, <loc
href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional
concepts that are designed to foster the automated creation and processing of multilingual
Web content. ITS 2.0 focuses on HTML and XML-based formats in general, and can leverage
processing based on the XML Localization Interchange File Format (XLIFF), as well as the
Natural Language Processing Interchange Format (NIF).</p>
<p>This document was published by the <loc
href="http://www.w3.org/International/multilingualweb/lt/">MultilingualWeb-LT Working
Group</loc> as a Last Call Working Draft. The Last Call period ends 11 June 2013. The publication reflects changes made since the previous
<loc href="http://www.w3.org/TR/2012/WD-its20-20121206/">Last Call publication 6 December 2012</loc> and the <loc href="http://www.w3.org/TR/2013/WD-its20-20130411/">ordinary working draft 11 April 2013</loc>. The Working Group expects to advance this
document to Recommendation status (see <loc
href="http://www.w3.org/2004/02/Process-20040205/tr.html#maturity-levels">W3C document
maturity levels</loc>).</p>
<p>All <loc href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/disposition-of-comments-1st-last-call.html">last call issues</loc> in the normative sections (from <specref ref="notation-terminology"/> to
<specref ref="datacategory-description"/> and <specref ref="normative-references"/> to
<specref ref="its-schemas"/>) have been resolved. The other, non-normative sections contain only
explanatory material and will be updated in a later working draft. The Working Group
encourages feedback until 11 June 2013.</p>
<p>Substantive changes during the first last call period are: <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/67">a new regular expression definition for allowed characters</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/68">reformulation of the disambiguation data category as "text analysis"</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/90">making directionality normative again</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/91">removal of the ruby section</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/97">aligning ITS 2.0 translate in HTML5 with the HTML5 definition of the attribute</loc>, and <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/118">defining default behavior for Elements within Text in HTML5</loc>.</p>
<p>Since the <loc href="http://htmlpreview.github.io/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html">ITS 2.0 test suite</loc> already has a high coverage for normative features of this specification, the Working Group expects to advance the specification directly to Proposed Recommendation status.</p>
<p>To give feedback send your comments to <loc
href="mailto:public-multilingualweb-lt-comments@w3.org"
>public-multilingualweb-lt-comments@w3.org</loc>. Use "Comment on ITS 2.0 specification
WD" in the subject line of your email. The <loc
href="http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/">archives
for this list</loc> are publicly available. See also <loc
href="https://www.w3.org/International/multilingualweb/lt/track/issues/">issues discussed
within the Working Group</loc> and the <loc href="#changelog-since-20130411">list of
changes since the previous publication</loc>.</p>
<p>Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
<p>This document was produced by a group operating under the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</loc>. W3C maintains a <loc href="http://www.w3.org/2004/01/pp-impl/53116/status">public list of any patent disclosures</loc> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
</status>
<langusage>
<language id="en">en</language>
</langusage>
<revisiondesc>
<p>This is the first version of this document.</p>
</revisiondesc>
</header>
<text>
<body>
<div xml:id="introduction">
<head>Introduction</head>
<p>
<emph>This section is informative.</emph>
</p>
<p>ITS 2.0 is a technology for adding metadata to Web content for the benefit of localization,
language technologies, and internationalization. The ITS 2.0 specification both identifies
concepts (such as <q>Translate</q>) that are important for internationalization and
localization, and defines implementations of these concepts (termed “ITS data categories”)
as a set of elements and attributes called the <emph>Internationalization Tag Set
(ITS)</emph>. The document provides implementations for HTML and serializations in <ref target="http://persistence.uni-leipzig.org/nlp2rdf/">NIF</ref> (NLP Interchange Format) <ptr target="#nif-reference" type="bibref"/>, and defines the ITS elements and attributes in the form of XML Schema <ptr
target="#xmlschema1" type="bibref"/> and RELAX NG <ptr target="#relaxng" type="bibref"
/>.</p>
<p>This document aims to realize many of the ideas formulated in the <ref
target="http://www.w3.org/TR/2012/WD-its2req-20120524/">ITS 2.0 Requirements
document</ref>, as well as in <ptr target="#itsreq" type="bibref"/> and <ptr target="#reqlocdtd"
type="bibref"/>.</p>
<p>Not all requirements listed there are addressed in this document. Those which are not
addressed here are either covered in <ptr type="bibref" target="#xml-i18n-bp"/>
(potentially in an as yet unwritten best practice document on multilingual Web content),
or may be addressed in a future version of this specification.</p>
<div xml:id="relation-to-its10-and-new-principles">
<head>Relation to ITS 1.0 and New Principles</head>
<div xml:id="relation-to-its10">
<head>Relation to ITS 1.0</head>
<p>ITS 2.0 has the following relations to ITS 1.0 <ptr target="#its10" type="bibref"/>:</p>
<list type="unorderd">
<item><p>It adopts and maintains the following principles from ITS 1.0: </p><list
type="unorderd">
<item>The use of data categories to define discrete units of
functionality</item>
<item>The separation of data category definitions from the mapping of a
data category to a given content format</item>
<item>The conformance principle of ITS 1.0 that an implementation needs to implement
only one data category to claim conformance to ITS 2.0</item>
</list>
</item>
<item>ITS 2.0 supports all ITS 1.0 data category definitions, with the exception of Ruby
(see <ptr target="#ruby-in-its2" type="specref"/>).</item>
<item>ITS 2.0 adds a number of new data categories not found in ITS 1.0.</item>
<item>While ITS 1.0 addressed only XML, ITS 2.0 specifies implementations of data
categories in <emph>both</emph> XML <emph>and</emph> HTML.</item>
</list>
</div>
<div xml:id="ruby-in-its2">
<head>Ruby and ITS 2.0</head>
<p>ITS 1.0 provided the <ref target="http://www.w3.org/TR/2007/REC-its-20070403/#ruby-annotation">Ruby data category</ref>. ITS 2.0 does not provide Ruby because, at the time of writing, no stable model for ruby was available. There are ongoing discussions about the <ref target="http://www.w3.org/TR/html51/text-level-semantics.html#the-ruby-element">ruby model in HTML5</ref>. Once these discussions are settled, the Ruby data category may be reintroduced in a subsequent version of ITS.</p>
</div>
<div xml:id="new-principles">
<head>New Principles</head>
<p>ITS 2.0 also adds the following principles and features not found in ITS 1.0:</p>
<list type="unorderd">
<item>ITS 2.0 data categories are intended to be format-neutral, with support for XML,
HTML, and NIF: a data category implementation needs to support only a single content
format mapping in order to support a claim of ITS 2.0 conformance.</item>
<item>ITS 2.0 provides algorithms to generate NIF out of HTML or XML with ITS 2.0
metadata.</item>
<item>A global implementation of ITS 2.0 requires at least <ref target="#xpath"
>XPath version 1.0</ref>. Other versions of XPath or other query languages (e.g.,
CSS Selectors) can be identified with a dedicated <ref target="#queryLanguage"
>queryLanguage</ref> attribute.</item>
</list>
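<exemplum>
<head>Declaring the query language of global rules</head>
<p>The following fragment is a minimal sketch of how the <code>queryLanguage</code> attribute might appear on a set of global rules. It assumes the value <code>xpath</code>, which corresponds to the XPath 1.0 baseline and is also the default. The selected <code>code</code> element is hypothetical.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" queryLanguage="xpath">
<!-- Hypothetical rule: selectors are absolute XPath 1.0 expressions -->
<its:translateRule selector="//code" translate="no"/>
</its:rules>
</egXML>
</exemplum>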
<p xml:id="its20-new-data-categories">The new data categories included in ITS 2.0
are:</p>
<list type="unorderd">
<item><ref target="#domain">Domain</ref></item>
<item><ref target="#textanalysis">Text Analysis</ref></item>
<item><ref target="#LocaleFilter">Locale Filter</ref></item>
<item><ref target="#provenance">Provenance</ref></item>
<item><ref target="#externalresource">External Resource</ref></item>
<item><ref target="#target-pointer">Target Pointer</ref></item>
<item><ref target="#idvalue">Id Value</ref></item>
<item><ref target="#preservespace">Preserve Space</ref></item>
<item><ref target="#lqissue">Localization Quality Issue</ref></item>
<item><ref target="#lqrating">Localization Quality Rating</ref></item>
<item><ref target="#mtconfidence">MT Confidence</ref></item>
<item><ref target="#allowedchars">Allowed Characters</ref></item>
<item><ref target="#storagesize">Storage Size</ref></item>
</list>
</div>
</div>
<div xml:id="motivation-its">
<head>Motivation for ITS</head>
<p>Content or software that is authored in one language (the <term>source language</term>)
is often made available in additional languages or adapted with regard to other cultural
aspects. This is done through a process called <term>localization</term>, where the
original material is translated and adapted to the target audience.</p>
<p>In addition, document formats expressed by schemas may be used by people in different
parts of the world, and these people may need special markup to support the local
language or script. For example, people authoring in languages such as Arabic, Hebrew,
Persian, or Urdu need special markup to specify directionality in mixed direction
text.</p>
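<exemplum>
<head>Directionality markup in mixed-direction text</head>
<p>A minimal sketch, assuming a hypothetical host vocabulary with <code>doc</code>, <code>p</code> and <code>quote</code> elements. The local <code>its:dir</code> attribute marks a Hebrew quotation as right-to-left inside otherwise left-to-right English text.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
<!-- Hypothetical host vocabulary; only its:version and its:dir are ITS markup -->
<p>The greeting <quote its:dir="rtl">שלום עולם</quote> means "hello world".</p>
</doc>
</egXML>
</exemplum>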
<p>From the viewpoints of feasibility, cost, and efficiency, it is important that the
original material be suitable for localization. This is achieved by appropriate
design and development, and the corresponding process is referred to as
internationalization. For a detailed explanation of the terms “localization” and
“internationalization”, see <ptr target="#geo-i18n-l10n" type="bibref"/>.</p>
<note type="ed">Note: This should refer to the best practice document as well, when
ready.</note>
<p>The increasing usage of XML as a medium for documentation-related content (e.g., <ref
target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook#technical"
>DocBook</ref> and <ref
target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita#technical"
>DITA</ref>, formats for writing structured documentation that are well suited to computer
hardware and software manuals) and software-related content (e.g., the eXtensible User
Interface Language <ptr target="#xul" type="bibref"/>) creates challenges and
opportunities in the domain of XML internationalization and localization.</p>
<div xml:id="motivation-its-issues">
<head>Typical Problems</head>
<p>The following examples sketch one of the issues that currently hinder efficient
XML-related localization: the lack of a standard, declarative mechanism that
identifies which parts of an XML document need to be translated. Tools often cannot
automatically perform this identification.</p>
<exemplum xml:id="EX-motivation-its-1">
<head>Document with partially translatable content</head>
<p>In this document it is difficult to distinguish between those <code>string</code>
elements that are translatable and those that are not. Only the addition of an
explicit flag could resolve the issue.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples"
target="examples/xml/EX-motivation-its-1.xml"/>
</exemplum>
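<exemplum>
<head>Explicit translatability flag on individual elements</head>
<p>One way to add the explicit flag mentioned above is the local <code>its:translate</code> attribute. The sketch below assumes a hypothetical resource file with <code>resources</code> and <code>string</code> elements. The file referenced by the previous example may use different element names.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<resources xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
<!-- Hypothetical strings: the first is user-visible text, the second an internal identifier -->
<string id="msg.save">Save your changes?</string>
<string id="cfg.key" its:translate="no">DEFAULT_PROFILE</string>
</resources>
</egXML>
</exemplum>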
<exemplum xml:id="EX-motivation-its-2">
<head>Document with partially translatable content</head>
<p>Even when metadata are available to identify non-translatable text, the conditions
may be quite complex and not directly indicated with a simple flag. Here, for
instance, only the text in the nodes matching the expression
<code>//component[@type!='image']/data[@type='text']</code> is translatable.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples"
target="examples/xml/EX-motivation-its-2.xml"/>
</exemplum>
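<exemplum>
<head>Global rules for complex translatability conditions</head>
<p>A minimal sketch of how the condition above could be expressed with global <code>translateRule</code> elements instead of a flag on each node. It assumes the element and attribute names that appear in the XPath expression. When two rules select the same node, the later rule takes precedence. The first rule therefore makes the whole document non-translatable, and the second re-enables translation for the matching <code>data</code> elements.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<!-- Sketch: the document element and, by inheritance, its content are not translatable -->
<its:translateRule selector="/*" translate="no"/>
<!-- Re-enable translation only for text data outside image components -->
<its:translateRule selector="//component[@type!='image']/data[@type='text']" translate="yes"/>
</its:rules>
</egXML>
</exemplum>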
</div>
</div>
<div xml:id="users-usage">
<head>Users and Usages of ITS</head>
<div xml:id="potential-users">
<head>Potential Users of ITS</head>
<p>The ITS specification aims to provide different types of users with information about
what markup should be supported to enable worldwide use and effective
internationalization and localization of content. The following paragraphs sketch
these different types of users and their usage of ITS. To support all of these users,
this specification provides that information in two
ways:</p>
<list>
<item>abstractly in the data category descriptions: <ptr
target="#datacategory-description" type="specref"/>
</item>
<item>concretely in the ITS schemas: <ptr target="#its-schemas" type="specref"/>
</item>
</list>
<div xml:id="schema-dev-new">
<head>Schema developers starting a schema from the ground up</head>
<p>This type of user will find proposals for attribute and element names to be
included in their new schema (also called "host vocabulary"). Using the attribute
and element names proposed in the ITS specification may be helpful because it makes
the concepts they represent easier to recognize for both schema users and
processors. It is perfectly possible, however, for schema developers to develop their
own set of attribute and element names. The specification sets out, first and
foremost, to ensure that the required markup is available, and that the behavior of
that markup meets established needs.</p>
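<exemplum>
<head>New host vocabulary reusing ITS attribute names</head>
<p>A minimal sketch, assuming a hypothetical new vocabulary with <code>manual</code>, <code>para</code> and <code>cmd</code> elements. In this sketch, the schema developer allows the ITS attributes <code>its:translate</code>, <code>its:locNote</code> and <code>its:locNoteType</code> directly rather than inventing equivalents.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<manual xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
<!-- Hypothetical element names; the its:* attributes are ITS 2.0 local markup -->
<para its:locNote="Keep this sentence short; it appears in a tooltip." its:locNoteType="alert">Type <cmd its:translate="no">ls -l</cmd> to list files.</para>
</manual>
</egXML>
</exemplum>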
</div>
<div xml:id="schema-dev-existing">
<head>Schema developers working with an existing schema</head>
<p>This type of user will be working with schemas such as DocBook, DITA, or perhaps a
proprietary schema. The ITS Working Group has sought input from experts developing
widely used formats such as the ones mentioned.</p>
<note><p>The question "How to use ITS with existing popular markup schemes?" is
covered in more detail (including examples) in a separate document: <ptr
target="#xml-i18n-bp" type="bibref"/>.</p></note>
<p>Developers working on existing schemas should check whether their schemas support
the markup proposed in this specification, and, where appropriate, add the markup
proposed here to their schema.</p>
<p>In some cases, an existing schema may already contain markup equivalent to that
recommended in ITS. In this case, it is not necessary to add duplicate markup, since
ITS provides mechanisms for associating ITS markup with markup in the host
vocabulary that serves a similar purpose (see <ptr
target="#associating-its-with-existing-markup" type="specref"/>). The developer
should, however, check that the behavior associated with the markup in their own
schema is fully compatible with the expectations described in this
specification.</p>
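<exemplum>
<head>Associating ITS with equivalent host markup</head>
<p>A minimal sketch, assuming an existing schema that already flags non-translatable content with a hypothetical <code>translatable="false"</code> attribute. A global rule maps that markup to the Translate data category, so no duplicate ITS attribute needs to be added to the schema.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<!-- Hypothetical host attribute: translatable="false" -->
<its:translateRule selector="//*[@translatable='false']" translate="no"/>
</its:rules>
</egXML>
</exemplum>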
</div>
<div xml:id="content-tool-vendor">
<head>Vendors of content-related tools</head>
<p>This type of user includes companies which provide tools for authoring, translation
or other flavors of content-related software solutions. It is important to ensure
that such tools enable worldwide use and effective localization of content. For
example, translation tools should prevent content marked up as not for translation
from being changed or translated. It is hoped that the ITS specification will make
the job of vendors easier by standardizing the format and processing expectations of
certain relevant markup items, and allowing them to more effectively identify how
content should be handled.</p>
</div>
<div xml:id="content-producers">
<head>Content producers</head>
<p>This type of user comprises authors, translators and other content producers.
They may use the markup proposed in this specification to mark up specific
bits of content. The burden of inserting markup can, however, be removed from content
producers by relating the ITS information to relevant bits of content in a global
manner (see <ref target="#selection-global">global, rule-based approach</ref>). This
global work may then fall to information architects, rather than to the content
producers themselves.</p>
<p xml:id="cms-plain-text-fields">Content producers often work with content management
systems (CMS). In some CMS, certain fields can store only plain
text. For these fields, the current ITS 2.0 data categories can be applied only
globally, not with local attributes. This issue needs to be addressed outside the
ITS 2.0 standard. One option is to allow HTML in these fields where possible. Another
is to use an additional field that accepts HTML input and to store the plain text of
that additional field in the plain-text field.</p>
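<exemplum>
<head>Global rule for content from a plain-text CMS field</head>
<p>A minimal sketch, assuming that the CMS template renders a plain-text field (here a hypothetical product code) into an HTML <code>span</code> with the hypothetical class <code>product-code</code>. The field itself cannot carry local attributes, so a global rule embedded in the page selects the rendered element instead. HTML elements are addressed through the XHTML namespace, which is bound here to the prefix <code>h</code>.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Product page</title>
<script type="application/its+xml">
<its:rules xmlns:its="http://www.w3.org/2005/11/its" xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
<!-- Hypothetical class name produced by the CMS template -->
<its:translateRule selector="//h:span[@class='product-code']" translate="no"/>
</its:rules>
</script>
</head>
<body>
<p>Order number: <span class="product-code">XK-2041</span></p>
</body>
</html>
</egXML>
</exemplum>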
</div>
<div xml:id="users_machine-translation">
<head>Machine Translation Systems</head>
<p>This type of service is intended for a broad user community ranging from developers
and integrators through translation companies and agencies, freelance translators
and post-editors to ordinary translation consumers and other MT users.
Data categories are envisaged for supporting and guiding the different automated
backend processes of this service type, thereby adding substantial value to the
service results as well as possible subsequent services. These processes include
basic tasks, like parsing constraints and markup, and compositional tasks, such as
disambiguation. These tasks consume and generate valuable metadata from and for
third party users, for example, provenance information and quality scoring, and add
relevant information for follow-on tasks, processes and services, such as MT
post-editing, MT training and MT terminological enhancement.</p>
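<exemplum>
<head>Annotating machine translation output</head>
<p>A minimal sketch, assuming a hypothetical engine identifier (<code>http://example.com/mt-engine</code>). An MT system might annotate its output with the MT Confidence data category. In this sketch, <code>its-mt-confidence</code> carries a score between 0 and 1, and <code>its-annotators-ref</code> identifies the system that produced the score.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<html lang="fr">
<head>
<meta charset="utf-8"/>
<title>MT output</title>
</head>
<!-- Hypothetical annotator IRI; the confidence value is illustrative only -->
<body its-annotators-ref="mt-confidence|http://example.com/mt-engine">
<p><span its-mt-confidence="0.8982">Dublin est la capitale de l'Irlande.</span></p>
</body>
</html>
</egXML>
</exemplum>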
</div>
<div xml:id="users_text_analytics">
<head>Text Analytics</head>
<p>This type of service provides automatically generated metadata for improving
localization, data integration or knowledge management workflows. This class of
users comprises developers and integrators of services that automate language
technology tasks such as domain classification, named entity recognition and
disambiguation, term extraction, language identification and others. Text analytics
services generate data that contextualizes the raw content with more explicit
information. This can be used to improve the output quality in machine translation
systems, search result relevance in information retrieval systems, as well as
management and integration of unstructured data in knowledge management systems.</p>
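<exemplum>
<head>Text analysis annotation in HTML</head>
<p>A minimal sketch of the kind of metadata a named entity recognition service might add, using the local HTML attributes of the Text Analysis data category. In this sketch, <code>its-ta-ident-ref</code> identifies the entity and <code>its-ta-class-ref</code> gives its class. The DBpedia and schema.org IRIs are illustrative choices, not requirements.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<!-- Illustrative entity and class IRIs chosen by a hypothetical annotation service -->
<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-ta-class-ref="http://schema.org/Place">Dublin</span> in Ireland!</p>
</egXML>
</exemplum>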
</div>
<div xml:id="users_localization_workflow_managers">
<head>Localization Workflow Managers</head>
<p>These types of users are concerned with localization workflows in which content
goes through certain steps: preparation for localization; the start of the localization
process, e.g. by conversion into a bitext (aligned parallel text) format like <ptr
target="#xliff" type="bibref"/>; the actual localization by human translators or
machine translation and other adaptations of content; and finally the integration of
the localized content into the original format. That format is often based on XML or
HTML; (Web) content management systems are widely used for content creation, and
their integration with localization workflows is an important task for the workflow
manager. For the integration of content creation and localization, metadata plays a
crucial role. For example, an ITS data category like <ref target="#trans-datacat"
>translate</ref> can trigger the extraction of localizable text. <quote>Metadata
roundtripping</quote>, that is, the availability of metadata both before and after
the localization process, is crucial for many tasks of the localization workflow
manager. An example is metadata-based quality control, with checks like <quote>Have
all pieces of content set to <code>translate="no"</code> been left
unchanged?</quote>. Other pieces of metadata are relevant for proper
internationalization during the localization workflow, e.g. the availability of <ref
target="#directionality">Directionality</ref> markup for adequate visualization of
bidirectional text.</p>
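<exemplum>
<head>Checking that non-translatable content survived localization</head>
<p>A minimal sketch of the quality check quoted above, using a hypothetical English source paragraph and its German translation. A checker might pair the elements carrying <code>translate="no"</code> in source and target by document order (an assumption of this sketch) and compare their content. Here the command <code>make install</code> must appear unchanged in both.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<!-- Hypothetical source paragraph -->
<p lang="en">Run <code translate="no">make install</code> to install the software.</p>
<!-- Target after localization: the translate="no" content is unchanged -->
<p lang="de">Führen Sie <code translate="no">make install</code> aus, um die Software zu installieren.</p>
</egXML>
</exemplum>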
</div>
</div>
<div xml:id="ways-to-use-its">
<head>Ways to Use ITS</head>
<p>The ITS specification proposes several mechanisms for supporting worldwide use and
effective internationalization and localization of content. We will sketch them below
[6063 lines skipped]
Received on Wednesday, 29 May 2013 16:14:09 UTC