- From: CVS User fsasaki <cvsmail@w3.org>
- Date: Wed, 29 May 2013 16:14:02 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20 In directory gil:/tmp/cvs-serv20361 Added Files: its20-for-editing-sec1-sec2.odd Log Message: odd file for sec 1-2 spec editing --- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd 2013/05/29 16:14:02 NONE +++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd 2013/05/29 16:14:02 1.1 <?xml version="1.0" encoding="UTF-8"?><?oxygen RNGSchema="tools/tei-w3c.rnc" type="compact"?> <TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:its="http://www.w3.org/2005/11/its" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:rng="http://relaxng.org/ns/structure/1.0" xmlns:spec="http://example.com/xmlspec" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en"> <header xmlns="http://example.com/xmlspec"> <title>Internationalization Tag Set (ITS) Version 2.0</title> <w3c-designation>ITS20</w3c-designation> <w3c-doctype>W3C Last Call Working Draft</w3c-doctype> <pubdate> <day>21</day> <month>May</month> <year>2013</year> </pubdate> <publoc> <loc href="http://www.w3.org/TR/2013/WD-its20-20130521/"> http://www.w3.org/TR/2013/WD-its20-20130521/</loc> </publoc> <altlocs> <loc href="its20.odd">ODD/XML document</loc> <loc href="itstagset20.zip">self-contained zipped archive</loc> <loc href="diffs/diff-wd20130521-wd20130411.html">XHTML Diff markup to previous publication 2013-04-11</loc> </altlocs> <prevlocs> <loc href="http://www.w3.org/TR/2013/WD-its20-20130411/"> http://www.w3.org/TR/2013/WD-its20-20130411/</loc> </prevlocs> <latestloc> <loc href="http://www.w3.org/TR/its20/">http://www.w3.org/TR/its20/</loc> </latestloc> <authlist> <author> <name>Shaun McCane</name> <affiliation>Invited Expert</affiliation> </author> <author> <name>Dave Lewis</name> <affiliation>TCD</affiliation> </author> <author> <name>Christian Lieske</name> <affiliation>SAP AG</affiliation> </author> <author> <name>Arle 
Lommel</name> <affiliation>DFKI</affiliation> </author> <author> <name>Jirka Kosek</name> <affiliation>UEP</affiliation> </author> <author> <name>Felix Sasaki</name> <affiliation>DFKI / W3C Fellow</affiliation> </author> <author> <name>Yves Savourel</name> <affiliation>ENLASO</affiliation> </author> </authlist> <!-- <errataloc role="spec-conditional" href="http://www.w3.org/International/its/itstagset/its-errata.html"/> <translationloc role="spec-conditional" href="http://www.w3.org/2003/03/Translations/byTechnology?technology=its"/> --> <abstract> <p>The technology described in this document - the <emph>Internationalization Tag Set (ITS) 2.0</emph> - enhances the foundation to integrate automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, <loc href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF).</p> </abstract> <status> <p> <emph>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <loc href="http://www.w3.org/TR/">W3C technical reports index</loc> at http://www.w3.org/TR/.</emph> </p> <p>The technology described in this document - the <emph>Internationalization Tag Set (ITS) 2.0</emph> - enhances the foundation to integrate automated processing of human language into core Web technologies. 
ITS 2.0 bears many commonalities with its predecessor, <loc href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF).</p> <p>This document was published by the <loc href="http://www.w3.org/International/multilingualweb/lt/">MultilingualWeb-LT Working Group</loc> as a Last Call Working Draft. The Last Call period ends 11 June 2013. The publication reflects changes made since the previous <loc href="http://www.w3.org/TR/2012/WD-its20-20121206/">Last Call publication 6 December 2012</loc> and the <loc href="http://www.w3.org/TR/2013/WD-its20-20130411/">ordinary working draft 11 April 2013</loc>. The Working Group expects to advance this document to Recommendation status (see <loc href="http://www.w3.org/2004/02/Process-20040205/tr.html#maturity-levels">W3C document maturity levels</loc>).</p> <p>All <loc href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/disposition-of-comments-1st-last-call.html">last call issues</loc> in the normative sections (from <specref ref="notation-terminology"/> to <specref ref="datacategory-description"/> and <specref ref="normative-references"/> to <specref ref="its-schemas"/>) have been resolved. The other, non-normative sections contain only explanatory material and will be updated in a later working draft. 
The Working Group encourages feedback until 11 June 2013.</p> <p>Substantive changes during the first last call period are: <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/67">a new regular expression definition for allowed characters</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/68">re-formulation of disambiguation data category to "text analysis"</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/90">making directionality normative again</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/91">removal of the ruby section</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/97">aligning ITS 2.0 translate in HTML5 with the HTML5 definition of the attribute</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/118">defining default behaviour for Elements within Text in HTML5</loc>.</p> <p>Since the <loc href="http://htmlpreview.github.io/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html">ITS 2.0 test suite</loc> already has a high coverage for normative features of this specification, the Working Group expects to advance the specification directly to Proposed Recommendation status.</p> <p>To give feedback send your comments to <loc href="mailto:public-multilingualweb-lt-comments@w3.org" >public-multilingualweb-lt-comments@w3.org</loc>. Use "Comment on ITS 2.0 specification WD" in the subject line of your email. The <loc href="http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/">archives for this list</loc> are publicly available. 
See also <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/">issues discussed within the Working Group</loc> and the <loc href="#changelog-since-20130411">list of changes since the previous publication</loc>.</p> <p>Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p> <p>This document was produced by a group operating under the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</loc>. W3C maintains a <loc href="http://www.w3.org/2004/01/pp-impl/53116/status">public list of any patent disclosures</loc> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p> </status> <langusage> <language id="en">en</language> </langusage> <revisiondesc> <p>This is the first version of this document.</p> </revisiondesc> </header> <text> <body> <div xml:id="introduction"> <head>Introduction</head> <p> <emph>This section is informative.</emph> </p> <p>ITS 2.0 is a technology to add metadata to Web content, for the benefit of localization, language technologies, and internationalization. 
The ITS 2.0 specification both identifies concepts (such as <q>Translate</q>) that are important for internationalization and localization, and defines implementations of these concepts (termed “ITS data categories”) as a set of elements and attributes called the <emph>Internationalization Tag Set (ITS)</emph>. The document provides implementations for HTML, serializations in <ref target="http://persistence.uni-leipzig.org/nlp2rdf/">NIF</ref> (NLP Interchange Format) <ptr target="#nif-reference" type="bibref"/>, and provides definitions of ITS elements and attributes in the form of XML Schema <ptr target="#xmlschema1" type="bibref"/> and RELAX NG <ptr target="#relaxng" type="bibref" />.</p> <p>This document aims to realize many of the ideas formulated in the <ref target="http://www.w3.org/TR/2012/WD-its2req-20120524/">ITS 2.0 Requirements document</ref>, in <ptr target="#itsreq" type="bibref"/> and <ptr target="#reqlocdtd" type="bibref"/>.</p> <p>Not all requirements listed there are addressed in this document. 
Those which are not addressed here are either covered in <ptr type="bibref" target="#xml-i18n-bp"/> (potentially in an as yet unwritten best practice document on multilingual Web content), or may be addressed in a future version of this specification.</p> <div xml:id="relation-to-its10-and-new-principles"> <head>Relation to ITS 1.0 and New Principles</head> <div xml:id="relation-to-its10"> <head>Relation to ITS 1.0</head> <p>ITS 2.0 has the following relations to ITS 1.0 <ptr target="#its10" type="bibref"/>:</p> <list type="unorderd"> <item><p>It adopts and maintains the following principles from ITS 1.0: </p><list type="unorderd"> <item>It adopts the use of data categories to define discrete units of functionality</item> <item>It adopts the separation of data category definition from the mapping of the data category to a given content format</item> <item>It adopts the conformance principle of ITS 1.0 that an implementation only needs to implement one data category to claim conformance to ITS 2.0</item> </list> </item> <item>ITS 2.0 supports all ITS 1.0 data category definitions and adds new definitions, with the exception of <ref target="#directionality">Directionality</ref> and Ruby.</item> <item>ITS 2.0 adds a number of new data categories not found in ITS 1.0.</item> <item>While ITS 1.0 addressed only XML, ITS 2.0 specifies implementations of data categories in <emph>both</emph> XML <emph>and</emph> HTML.</item> </list> </div> <div xml:id="ruby-in-its2"> <head>Ruby and ITS 2.0</head> <p>ITS 1.0 provided the <ref target="http://www.w3.org/TR/2007/REC-its-20070403/#ruby-annotation">Ruby data category</ref>. ITS 2.0 does not provide ruby since at the time of writing, a stable model for ruby was not available. There are ongoing discussions about the <ref target="http://www.w3.org/TR/html51/text-level-semantics.html#the-ruby-element">ruby model in HTML5</ref>. 
Once these discussions are settled, the ruby data category may be re-introduced in a subsequent version of ITS.</p> </div> <div xml:id="new-principles"> <head>New Principles</head> <p>ITS 2.0 also adds the following principles and features not found in ITS 1.0:</p> <list type="unorderd"> <item>ITS 2.0 data categories are intended to be format neutral, with support for XML, HTML, and NIF: a data category implementation only needs to support a single content format mapping in order to support a claim of ITS 2.0 conformance.</item> <item>ITS 2.0 provides algorithms to generate NIF out of HTML or XML with ITS 2.0 metadata.</item> <item>A global implementation of ITS 2.0 requires at least <ref target="#xpath" >XPath version 1.0</ref>. Other versions of XPath or other query languages (e.g., CSS Selectors) can be expressed via a dedicated <ref target="#queryLanguage" >queryLanguage</ref> attribute.</item> </list> <p xml:id="its20-new-data-categories">The new data categories included in ITS 2.0 are:</p> <list type="unorderd"> <item><ref target="#domain">Domain</ref></item> <item><ref target="#textanalysis">Text Analysis</ref></item> <item><ref target="#LocaleFilter">Locale Filter</ref></item> <item><ref target="#provenance">Provenance</ref></item> <item><ref target="#externalresource">External Resource</ref></item> <item><ref target="#target-pointer">Target Pointer</ref></item> <item><ref target="#idvalue">Id Value</ref></item> <item><ref target="#preservespace">Preserve Space</ref></item> <item><ref target="#lqissue">Localization Quality Issue</ref></item> <item><ref target="#lqrating">Localization Quality Rating</ref></item> <item><ref target="#mtconfidence">MT Confidence</ref></item> <item><ref target="#allowedchars">Allowed Characters</ref></item> <item><ref target="#storagesize">Storage Size</ref></item> </list> </div> </div> <div xml:id="motivation-its"> <head>Motivation for ITS</head> <p>Content or software that is authored in one language (the <term>source 
language</term>) is often made available in additional languages or adapted with regard to other cultural aspects. This is done through a process called <term>localization</term>, where the original material is translated and adapted to the target audience.</p> <p>In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian, or Urdu need special markup to specify directionality in mixed direction text.</p> <p>From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization. This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization. For a detailed explanation of the terms “localization” and “internationalization”, see <ptr target="#geo-i18n-l10n" type="bibref"/>.</p> <note type="ed">Note: This should refer to the best practice document as well, when ready.</note> <p>The increasing usage of XML as a medium for documentation-related content (e.g. <ref target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook#technical" >DocBook</ref> and <ref target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita#technical" >DITA</ref> as formats for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g. 
the eXtensible User Interface Language <ptr target="#xul" type="bibref"/>) creates challenges and opportunities in the domain of XML internationalization and localization.</p> <div xml:id="motivation-its-issues"> <head>Typical Problems</head> <p>The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism that identifies which parts of an XML document need to be translated. Tools often cannot automatically perform this identification.</p> <exemplum xml:id="EX-motivation-its-1"> <head>Document with partially translatable content</head> <p>In this document it is difficult to distinguish between those <code>string</code> elements that are translatable and those that are not. Only the addition of an explicit flag could resolve the issue.</p> <egXML xmlns="http://www.tei-c.org/ns/Examples" target="examples/xml/EX-motivation-its-1.xml"/> </exemplum> <exemplum xml:id="EX-motivation-its-2"> <head>Document with partially translatable content</head> <p>Even when metadata are available to identify non-translatable text, the conditions may be quite complex and not directly indicated with a simple flag. Here, for instance, only the text in the nodes matching the expression <code>//component[@type!='image']/data[@type='text']</code> is translatable.</p> <egXML xmlns="http://www.tei-c.org/ns/Examples" target="examples/xml/EX-motivation-its-2.xml"/> </exemplum> </div> </div> <div xml:id="users-usage"> <head>Users and Usages of ITS</head> <div xml:id="potential-users"> <head>Potential Users of ITS</head> <p>The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective internationalization and localization of content. The following paragraphs sketch these different types of users, and their usage of ITS. 
In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective localization of content is provided in this specification in two ways:</p> <list> <item>abstractly in the data category descriptions: <ptr target="#datacategory-description" type="specref"/> </item> <item>concretely in the ITS schemas: <ptr target="#its-schemas" type="specref"/> </item> </list> <div xml:id="schema-dev-new"> <head>Schema developers starting a schema from the ground up</head> <p>This type of user will find proposals for attribute and element names to be included in their new schema (also called "host vocabulary"). Using the attribute and element names proposed in the ITS specification may be helpful because it makes the concepts they represent easier to recognize for both schema users and processors. It is perfectly possible, however, for a schema developer to develop their own set of attribute and element names. The specification sets out, first and foremost, to ensure that the required markup is available, and that the behavior of that markup meets established needs.</p> </div> <div xml:id="schema-dev-existing"> <head>Schema developers working with an existing schema</head> <p>This type of user will be working with schemas such as DocBook, DITA, or perhaps a proprietary schema. The ITS Working Group has sought input from experts developing widely used formats such as the ones mentioned.</p> <note><p>The question "How to use ITS with existing popular markup schemes?" is covered in more detail (including examples) in a separate document: <ptr target="#xml-i18n-bp" type="bibref"/>.</p></note> <p>Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.</p> <p>In some cases, an existing schema may already contain markup equivalent to that recommended in ITS. 
In this case it is not necessary to add duplicate markup since ITS provides mechanisms for associating ITS markup with markup in the host vocabulary which serves a similar purpose (see <ptr target="#associating-its-with-existing-markup" type="specref"/>). The developer should, however, check that the behavior associated with the markup in their own schema is fully compatible with the expectations described in this specification.</p> </div> <div xml:id="content-tool-vendor"> <head>Vendors of content-related tools</head> <p>This type of user includes companies which provide tools for authoring, translation or other flavors of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed or translated. It is hoped that the ITS specification will make the job of vendors easier by standardizing the format and processing expectations of certain relevant markup items, and allowing them to more effectively identify how content should be handled.</p> </div> <div xml:id="content-producers"> <head>Content producers</head> <p>This type of user comprises authors, translators and other types of content author. The markup proposed in this specification may be used by them to mark up specific bits of content. Aside: The burden of inserting markup can be removed from content producers by relating the ITS information to relevant bits of content in a global manner (see <ref target="#selection-global">global, rule-based approach</ref>). This global work, however, may fall to information architects, rather than the content producers themselves.</p> <p xml:id="cms-plain-text-fields">Content producers often work with content management systems (CMS). In various CMS, some fields can only store plain text. 
For these fields, the current ITS 2.0 data categories can only be applied globally and not with local attributes. This issue should be addressed in another way, apart from the ITS 2.0 standard. One way would be to allow HTML in these fields if possible, or to use an extra field which allows HTML input and to save the plain text of this extra field in the plain text field.</p> </div> <div xml:id="users_machine-translation"> <head>Machine Translation Systems</head> <p>This type of service is intended for a broad user community ranging from developers and integrators through translation companies and agencies, freelance translators and post-editors to ordinary translation consumers and other types of MT employment. Data categories are envisaged for supporting and guiding the different automated backend processes of this service type, thereby adding substantial value to the service results as well as possible subsequent services. These processes include basic tasks, like parsing constraints and markup, and compositional tasks, such as disambiguation. These tasks consume and generate valuable metadata from and for third party users, for example, provenance information and quality scoring, and add relevant information for follow-on tasks, processes and services, such as MT post-editing, MT training and MT terminological enhancement.</p> </div> <div xml:id="users_text_analytics"> <head>Text Analytics</head> <p>This type of service provides automatically generated metadata for improving localization, data integration or knowledge management workflows. This class of users comprises developers and integrators of services that automate language technology tasks such as domain classification, named entity recognition and disambiguation, term extraction, language identification and others. Text analytics services generate data that contextualizes the raw content with more explicit information. 
This can be used to improve the output quality in machine translation systems, search result relevance in information retrieval systems, as well as management and integration of unstructured data in knowledge management systems.</p> </div> <div xml:id="users_localization_workflow_managers"> <head>Localization Workflow Managers</head> <p>These types of users are concerned with localization workflows in which content goes through certain steps: preparation for localization, start of the localization process by e.g. a conversion into a bitext (aligned parallel text) format like <ptr target="#xliff" type="bibref"/>, the actual localization by human translators or machine translation and other adaptations of content, and finally the integration of the localized content into the original format. That format is often based on XML or HTML; (Web) content management systems are widely used for content creation, and their integration with localization workflows is an important task for the workflow manager. For the integration of content creation and localization, metadata plays a crucial role. For example, an ITS data category like <ref target="#trans-datacat" >translate</ref> can trigger the extraction of localizable text. <quote>Metadata roundtripping</quote>, that is, the availability of metadata both before and after the localization process, is crucial for many tasks of the localization workflow manager. An example is metadata-based quality control, with checks like <quote>Have all pieces of content set to <code>translate="no"</code> been left unchanged?</quote>. Other pieces of metadata are relevant for proper internationalization during the localization workflow, e.g. 
the availability of <ref target="#directionality">Directionality</ref> markup for adequate visualization of bidirectional text.</p> </div> </div> <div xml:id="ways-to-use-its"> <head>Ways to Use ITS</head> <p>The ITS specification proposes several mechanisms for supporting worldwide use and effective internationalization and localization of content. We will sketch them below [6063 lines skipped]
Received on Wednesday, 29 May 2013 16:14:09 UTC