CVS WWW/International/multilingualweb/lt/drafts/its20

Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory gil:/tmp/cvs-serv20361

Added Files:
	its20-for-editing-sec1-sec2.odd 
Log Message:
odd file for sec 1-2 spec editing


--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd	2013/05/29 16:14:02	NONE
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20-for-editing-sec1-sec2.odd	2013/05/29 16:14:02	1.1
<?xml version="1.0" encoding="UTF-8"?><?oxygen RNGSchema="tools/tei-w3c.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:its="http://www.w3.org/2005/11/its"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:rng="http://relaxng.org/ns/structure/1.0"
  xmlns:spec="http://example.com/xmlspec" xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
  <header xmlns="http://example.com/xmlspec">
    <title>Internationalization Tag Set (ITS) Version 2.0</title>
    <w3c-designation>ITS20</w3c-designation>
    <w3c-doctype>W3C Last Call Working Draft</w3c-doctype>
    <pubdate>
      <day>21</day>
      <month>May</month>
      <year>2013</year>
    </pubdate>
    <publoc>
      <loc href="http://www.w3.org/TR/2013/WD-its20-20130521/">
        http://www.w3.org/TR/2013/WD-its20-20130521/</loc>
    </publoc>
    <altlocs>
      <loc href="its20.odd">ODD/XML document</loc>
      <loc href="itstagset20.zip">self-contained zipped archive</loc>
      <loc href="diffs/diff-wd20130521-wd20130411.html">XHTML Diff markup to previous publication
        2013-04-11</loc>
    </altlocs>
    <prevlocs>
      <loc href="http://www.w3.org/TR/2013/WD-its20-20130411/">
        http://www.w3.org/TR/2013/WD-its20-20130411/</loc>
    </prevlocs>
    <latestloc>
      <loc href="http://www.w3.org/TR/its20/">http://www.w3.org/TR/its20/</loc>
    </latestloc>
    <authlist>
      <author>
        <name>Shaun McCance</name>
        <affiliation>Invited Expert</affiliation>
      </author>
      <author>
        <name>Dave Lewis</name>
        <affiliation>TCD</affiliation>
      </author>
      <author>
        <name>Christian Lieske</name>
        <affiliation>SAP AG</affiliation>
      </author>
      <author>
        <name>Arle Lommel</name>
        <affiliation>DFKI</affiliation>
      </author>
      <author>
        <name>Jirka Kosek</name>
        <affiliation>UEP</affiliation>
      </author>
      <author>
        <name>Felix Sasaki</name>
        <affiliation>DFKI / W3C Fellow</affiliation>
      </author>
      <author>
        <name>Yves Savourel</name>
        <affiliation>ENLASO</affiliation>
      </author>
    </authlist>
    <!--	   <errataloc role="spec-conditional" href="http://www.w3.org/International/its/itstagset/its-errata.html"/>
	  <translationloc role="spec-conditional" href="http://www.w3.org/2003/03/Translations/byTechnology?technology=its"/>
	   -->
    <abstract>
      <p>The technology described in this document - the <emph>Internationalization Tag Set (ITS)
          2.0</emph> - enhances the foundation to integrate automated processing of human language
        into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, <loc
          href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional
        concepts that are designed to foster the automated creation and processing of multilingual
        Web content. ITS 2.0 focuses on HTML and on XML-based formats in general, and can leverage
        processing based on the XML Localization Interchange File Format (XLIFF), as well as the
        Natural Language Processing Interchange Format (NIF).</p>
    </abstract>
    <status>
      <p>
        <emph>This section describes the status of this document at the time of its publication.
          Other documents may supersede this document. A list of current W3C publications and the
          latest revision of this technical report can be found in the <loc
            href="http://www.w3.org/TR/">W3C technical reports index</loc> at
          http://www.w3.org/TR/.</emph>
      </p>
      <p>The technology described in this document - the <emph>Internationalization Tag Set (ITS)
        2.0</emph> - enhances the foundation to integrate automated processing of human language
        into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, <loc
          href="http://www.w3.org/TR/2007/REC-its-20070403/">ITS 1.0</loc>, but provides additional
        concepts that are designed to foster the automated creation and processing of multilingual
        Web content. ITS 2.0 focuses on HTML and on XML-based formats in general, and can leverage
        processing based on the XML Localization Interchange File Format (XLIFF), as well as the
        Natural Language Processing Interchange Format (NIF).</p>
      <p>This document was published by the <loc
          href="http://www.w3.org/International/multilingualweb/lt/">MultilingualWeb-LT Working
          Group</loc> as a Last Call Working Draft. The Last Call period ends 11 June 2013. The publication reflects changes made since the previous
        <loc href="http://www.w3.org/TR/2012/WD-its20-20121206/">Last Call publication 6 December 2012</loc> and the <loc href="http://www.w3.org/TR/2013/WD-its20-20130411/">ordinary working draft 11 April 2013</loc>. The Working Group expects to advance this
        document to Recommendation status (see <loc
          href="http://www.w3.org/2004/02/Process-20040205/tr.html#maturity-levels">W3C document
          maturity levels</loc>).</p>

      <p>All <loc href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/disposition-of-comments-1st-last-call.html">last call issues</loc> in the normative sections (from <specref ref="notation-terminology"/> to
          <specref ref="datacategory-description"/> and <specref ref="normative-references"/> to
          <specref ref="its-schemas"/>) have been resolved. The other, non-normative sections contain only
        explanatory material and will be updated in a later working draft. The Working Group
        encourages feedback until 11 June 2013.</p>
      
      <p>Substantive changes during the first last call period are: <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/67">a new regular expression definition for allowed characters</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/68">re-formulation of disambiguation data category to "text analysis"</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/90">making directionality normative again</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/91">removal of the ruby section</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/97">aligning ITS 2.0 translate in HTML5 with the HTML5 definition of the attribute</loc>, <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/118">defining default behaviour for Elements within Text in HTML5</loc>.</p>
      
      <p>Since the <loc href="http://htmlpreview.github.io/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html">ITS 2.0 test suite</loc> already has high coverage of the normative features of this specification, the Working Group expects to advance the specification directly to Proposed Recommendation status.</p>

      <p>To give feedback send your comments to <loc
          href="mailto:public-multilingualweb-lt-comments@w3.org"
          >public-multilingualweb-lt-comments@w3.org</loc>. Use "Comment on ITS 2.0 specification
        WD" in the subject line of your email. The <loc
          href="http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/">archives
          for this list</loc> are publicly available. See also <loc
          href="https://www.w3.org/International/multilingualweb/lt/track/issues/">issues discussed
          within the Working Group</loc> and the <loc href="#changelog-since-20130411">list of
          changes since the previous publication</loc>.</p>

      <p>Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>

      <p>This document was produced by a group operating under the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</loc>. W3C maintains a <loc href="http://www.w3.org/2004/01/pp-impl/53116/status">public list of any patent disclosures</loc> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
    </status>
    <langusage>
      <language id="en">en</language>
    </langusage>
    <revisiondesc>
      <p>This is the first version of this document.</p>
    </revisiondesc>
  </header>
  <text>
    <body>
      <div xml:id="introduction">
        <head>Introduction</head>

        <p>
          <emph>This section is informative.</emph>
        </p>
        <p>ITS 2.0 is a technology to add metadata to Web content, for the benefit of localization,
          language technologies, and internationalization. The ITS 2.0 specification both identifies
          concepts (such as <q>Translate</q>) that are important for internationalization and
          localization, and defines implementations of these concepts (termed “ITS data categories”)
          as a set of elements and attributes called the <emph>Internationalization Tag Set
            (ITS)</emph>. The document provides implementations for HTML, serializations in <ref target="http://persistence.uni-leipzig.org/nlp2rdf/">NIF</ref> (NLP Interchange Format) <ptr target="#nif-reference" type="bibref"/>, and provides
          definitions of ITS elements and attributes in the form of XML Schema <ptr
            target="#xmlschema1" type="bibref"/> and RELAX NG <ptr target="#relaxng" type="bibref"
          />.</p>

        <p>This document aims to realize many of the ideas formulated in the <ref
            target="http://www.w3.org/TR/2012/WD-its2req-20120524/">ITS 2.0 Requirements
            document</ref>, in <ptr target="#itsreq" type="bibref"/> and <ptr target="#reqlocdtd"
            type="bibref"/>.</p>
        <p>Not all requirements listed there are addressed in this document. Those which are not
          addressed here are either covered in <ptr type="bibref" target="#xml-i18n-bp"/>
          (potentially in an as yet unwritten best practice document on multilingual Web content),
          or may be addressed in a future version of this specification.</p>

        <div xml:id="relation-to-its10-and-new-principles">
          <head>Relation to ITS 1.0 and New Principles</head>
          <div xml:id="relation-to-its10">
            <head>Relation to ITS 1.0</head>
            <p>ITS 2.0 has the following relations to ITS 1.0 <ptr target="#its10" type="bibref"/>:</p>
            <list type="unorderd">
              <item><p>It adopts and maintains the following principles from ITS 1.0: </p><list
                  type="unorderd">
                  <item>It adopts the use of data categories to define discrete units of
                    functionality</item>
                  <item>It adopts the separation of data category definition from the mapping of the
                    data category to a given content format</item>
                  <item>It adopts the conformance principle of ITS 1.0 that an implementation only
                    needs to implement one data category to claim conformance to ITS 2.0</item>
                </list>
              </item>
              <item>ITS 2.0 supports all ITS 1.0 data category definitions and adds new definitions,
                with the exceptions of <ref target="#directionality">Directionality</ref> and Ruby.</item>
              <item>ITS 2.0 adds a number of new data categories not found in ITS 1.0.</item>
              <item>While ITS 1.0 addressed only XML, ITS 2.0 specifies implementations of data
                categories in <emph>both</emph> XML <emph>and</emph> HTML.</item>
            </list>
          </div>
          <div xml:id="ruby-in-its2">
            <head>Ruby and ITS 2.0</head>
            <p>ITS 1.0 provided the <ref target="http://www.w3.org/TR/2007/REC-its-20070403/#ruby-annotation">Ruby data category</ref>. ITS 2.0 does not provide ruby since, at the time of writing, a stable model for ruby was not available. There are ongoing discussions about the <ref target="http://www.w3.org/TR/html51/text-level-semantics.html#the-ruby-element">ruby model in HTML5</ref>. Once these discussions are settled, the ruby data category may be reintroduced in a subsequent version of ITS.</p>
          </div>
          <div xml:id="new-principles">
            <head>New Principles</head>
            <p>ITS 2.0 also adds the following principles and features not found in ITS 1.0:</p>
            <list type="unorderd">
              <item>ITS 2.0 data categories are intended to be format neutral, with support for XML,
                HTML, and NIF: a data category implementation only needs to support a single content
                format mapping in order to support a claim of ITS 2.0 conformance.</item>
              <item>ITS 2.0 provides algorithms to generate NIF out of HTML or XML with ITS 2.0
                metadata.</item>
              <item>A global implementation of ITS 2.0 requires at least <ref target="#xpath"
                  >XPath version 1.0</ref>. Other versions of XPath or other query languages (e.g.,
                CSS Selectors) can be indicated via a dedicated <ref target="#queryLanguage"
                  >queryLanguage</ref> attribute.</item>
            </list>
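            <exemplum xml:id="EX-sketch-querylanguage-css">
              <head>Sketch: global rules that use CSS Selectors instead of XPath</head>
              <p>The following fragment is a minimal, hypothetical sketch and is not one of the
                example files of this specification. It assumes an HTML host document in which
                elements carrying the made-up class name <code>sample-code</code> hold content that
                is not to be translated, and it assumes that the value <code>css</code> is used to
                identify CSS Selectors as the query language of the rules.</p>
              <egXML xmlns="http://www.tei-c.org/ns/Examples">
                <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
                  queryLanguage="css">
                  <!-- Assumption: "sample-code" is a hypothetical class name in the host document -->
                  <its:translateRule selector=".sample-code" translate="no"/>
                </its:rules>
              </egXML>
            </exemplum>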
            <p xml:id="its20-new-data-categories">The new data categories included in ITS 2.0
              are:</p>
            <list type="unorderd">
              <item><ref target="#domain">Domain</ref></item>
              <item><ref target="#textanalysis">Text Analysis</ref></item>
              <item><ref target="#LocaleFilter">Locale Filter</ref></item>
              <item><ref target="#provenance">Provenance</ref></item>
              <item><ref target="#externalresource">External Resource</ref></item>
              <item><ref target="#target-pointer">Target Pointer</ref></item>
              <item><ref target="#idvalue">Id Value</ref></item>
              <item><ref target="#preservespace">Preserve Space</ref></item>
              <item><ref target="#lqissue">Localization Quality Issue</ref></item>
              <item><ref target="#lqrating">Localization Quality Rating</ref></item>
              <item><ref target="#mtconfidence">MT Confidence</ref></item>
              <item><ref target="#allowedchars">Allowed Characters</ref></item>
              <item><ref target="#storagesize">Storage Size</ref></item>
            </list>
          </div>
        </div>

        <div xml:id="motivation-its">
          <head>Motivation for ITS</head>
          <p>Content or software that is authored in one language (the <term>source language</term>)
            is often made available in additional languages or adapted with regard to other cultural
            aspects. This is done through a process called <term>localization</term>, where the
            original material is translated and adapted to the target audience.</p>
          <p>In addition, document formats expressed by schemas may be used by people in different
            parts of the world, and these people may need special markup to support the local
            language or script. For example, people authoring in languages such as Arabic, Hebrew,
            Persian, or Urdu need special markup to specify directionality in mixed direction
            text.</p>
          <p>From the viewpoints of feasibility, cost, and efficiency, it is important that the
            original material should be suitable for localization. This is achieved by appropriate
            design and development, and the corresponding process is referred to as
            internationalization. For a detailed explanation of the terms “localization” and
            “internationalization”, see <ptr target="#geo-i18n-l10n" type="bibref"/>.</p>
          <note type="ed">Note: This should refer to the best practice document as well, when
            ready.</note>
          <p>The increasing usage of XML as a medium for documentation-related content (e.g. <ref
              target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook#technical"
              >DocBook</ref> and <ref
              target="https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita#technical"
              >DITA</ref> as formats for writing structured documentation, well suited to computer
            hardware and software manuals) and software-related content (e.g. the eXtensible User
            Interface Language <ptr target="#xul" type="bibref"/>) creates challenges and
            opportunities in the domain of XML internationalization and localization.</p>

          <div xml:id="motivation-its-issues">
            <head>Typical Problems</head>

            <p>The following examples sketch one of the issues that currently hinder efficient
              XML-related localization: the lack of a standard, declarative mechanism that
              identifies which parts of an XML document need to be translated. Tools often cannot
              automatically perform this identification.</p>
            <exemplum xml:id="EX-motivation-its-1">
              <head>Document with partially translatable content</head>
              <p>In this document it is difficult to distinguish between those <code>string</code>
                elements that are translatable and those that are not. Only the addition of an
                explicit flag could resolve the issue.</p>
              <egXML xmlns="http://www.tei-c.org/ns/Examples"
                target="examples/xml/EX-motivation-its-1.xml"/>
            </exemplum>
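            <exemplum xml:id="EX-sketch-local-translate-flag">
              <head>Sketch: an explicit flag on individual elements</head>
              <p>As an illustration of what such an explicit flag might look like, the following
                hypothetical fragment marks one <code>string</code> element as non-translatable with
                a local <code>its:translate</code> attribute. The root element name, the
                <code>id</code> values and the string content are invented for this sketch and are
                not taken from the example file referenced above.</p>
              <egXML xmlns="http://www.tei-c.org/ns/Examples">
                <strings xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
                  <!-- Translatable: no flag is needed if "translate" is the default -->
                  <string id="greeting">Welcome back</string>
                  <!-- Not translatable: flagged explicitly -->
                  <string id="logfile" its:translate="no">app-debug.log</string>
                </strings>
              </egXML>
            </exemplum>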
            <exemplum xml:id="EX-motivation-its-2">
              <head>Document with partially translatable content</head>
              <p>Even when metadata are available to identify non-translatable text, the conditions
                may be quite complex and not directly indicated with a simple flag. Here, for
                instance, only the text in the nodes matching the expression
                  <code>//component[@type!='image']/data[@type='text']</code> is translatable.</p>
              <egXML xmlns="http://www.tei-c.org/ns/Examples"
                target="examples/xml/EX-motivation-its-2.xml"/>
            </exemplum>
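            <exemplum xml:id="EX-sketch-global-translate-rules">
              <head>Sketch: expressing a complex condition as global rules</head>
              <p>A condition like the one in the preceding example could be stated declaratively
                with global rules. The rule set below is a hypothetical sketch, not one of the
                example files of this specification, and it assumes the element and attribute names
                used in that expression. The first rule marks the content of the whole document as
                non-translatable; the second rule, which comes later and therefore takes precedence
                for the nodes it selects, marks the matching <code>data</code> elements as
                translatable.</p>
              <egXML xmlns="http://www.tei-c.org/ns/Examples">
                <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
                  <!-- Start from "nothing is translatable" ... -->
                  <its:translateRule selector="/*" translate="no"/>
                  <!-- ... then switch translation back on for the text data of non-image components -->
                  <its:translateRule
                    selector="//component[@type!='image']/data[@type='text']" translate="yes"/>
                </its:rules>
              </egXML>
            </exemplum>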
          </div>
        </div>
        <div xml:id="users-usage">
          <head>Users and Usages of ITS</head>
          <div xml:id="potential-users">
            <head>Potential Users of ITS</head>
            <p>The ITS specification aims to provide different types of users with information about
              what markup should be supported to enable worldwide use and effective
              internationalization and localization of content. The following paragraphs sketch
              these different types of users, and their usage of ITS. In order to support all of
              these users, the information about what markup should be supported to enable worldwide
              use and effective localization of content is provided in this specification in two
              ways:</p>
            <list>
              <item>abstractly in the data category descriptions: <ptr
                  target="#datacategory-description" type="specref"/>
              </item>
              <item>concretely in the ITS schemas: <ptr target="#its-schemas" type="specref"/>
              </item>
            </list>
            <div xml:id="schema-dev-new">
              <head>Schema developers starting a schema from the ground up</head>
              <p>This type of user will find proposals for attribute and element names to be
                included in their new schema (also called "host vocabulary"). Using the attribute
                and element names proposed in the ITS specification may be helpful because it makes
                the concepts they represent easier to recognize for both schema users and
                processors. It is perfectly possible, however, for a schema developer to develop
                their own set of attribute and element names. The specification sets out, first and
                foremost, to ensure that the required markup is available, and that the behavior of
                that markup meets established needs.</p>
            </div>
            <div xml:id="schema-dev-existing">
              <head>Schema developers working with an existing schema</head>
              <p>This type of user will be working with schemas such as DocBook, DITA, or perhaps a
                proprietary schema. The ITS Working Group has sought input from experts developing
                widely used formats such as the ones mentioned.</p>
              <note><p>The question "How to use ITS with existing popular markup schemes?" is
                  covered in more detail (including examples) in a separate document: <ptr
                    target="#xml-i18n-bp" type="bibref"/>.</p></note>
              <p>Developers working on existing schemas should check whether their schemas support
                the markup proposed in this specification, and, where appropriate, add the markup
                proposed here to their schema.</p>
              <p>In some cases, an existing schema may already contain markup equivalent to that
                recommended in ITS. In this case it is not necessary to add duplicate markup since
                ITS provides mechanisms for associating ITS markup with markup in the host
                vocabulary which serves a similar purpose (see <ptr
                  target="#associating-its-with-existing-markup" type="specref"/>). The developer
                should, however, check that the behavior associated with the markup in their own
                schema is fully compatible with the expectations described in this
                specification.</p>
            </div>
            <div xml:id="content-tool-vendor">
              <head>Vendors of content-related tools</head>
              <p>This type of user includes companies which provide tools for authoring or
                translation, or other kinds of content-related software. It is important to ensure
                that such tools enable worldwide use and effective localization of content. For
                example, translation tools should prevent content marked up as not for translation
                from being changed or translated. It is hoped that the ITS specification will make
                the job of vendors easier by standardizing the format and processing expectations of
                certain relevant markup items, and allowing them to more effectively identify how
                content should be handled.</p>
            </div>
            <div xml:id="content-producers">
              <head>Content producers</head>
              <p>This type of user comprises authors, translators and other types of content author.
                The markup proposed in this specification may be used by them to mark up specific
                bits of content. Aside: The burden of inserting markup can be removed from content
                producers by relating the ITS information to relevant bits of content in a global
                manner (see <ref target="#selection-global">global, rule-based approach</ref>). This
                global work, however, may fall to information architects, rather than the content
                producers themselves.</p>
              <p xml:id="cms-plain-text-fields">Content producers often work with content management
                systems (CMS). In various CMS, some fields can store only plain text. For these
                fields, the current ITS 2.0 data categories can only be applied globally and not
                with local attributes. This issue needs to be addressed outside the ITS 2.0
                standard. One way would be to allow HTML in these fields where possible; another
                would be to use an extra field which allows HTML input, and to save the plain text
                of this extra field in the plain text field.</p>
            </div>
            <div xml:id="users_machine-translation">
              <head>Machine Translation Systems</head>
              <p>This type of service is intended for a broad user community ranging from developers
                and integrators through translation companies and agencies, freelance translators
                and post-editors to ordinary translation consumers and other users of MT.
                Data categories are envisaged for supporting and guiding the different automated
                backend processes of this service type, thereby adding substantial value to the
                service results as well as possible subsequent services. These processes include
                basic tasks, like parsing constraints and markup, and compositional tasks, such as
                disambiguation. These tasks consume and generate valuable metadata from and for
                third party users, for example, provenance information and quality scoring, and add
                relevant information for follow-on tasks, processes and services, such as MT
                post-editing, MT training and MT terminological enhancement.</p>
            </div>
            <div xml:id="users_text_analytics">
              <head>Text Analytics</head>
              <p>This type of service provides automatically generated metadata for improving
                localization, data integration or knowledge management workflows. This class of
                users comprises developers and integrators of services that automate language
                technology tasks such as domain classification, named entity recognition and
                disambiguation, term extraction, language identification and others. Text analytics
                services generate data that contextualizes the raw content with more explicit
                information. This can be used to improve the output quality in machine translation
                systems, search result relevance in information retrieval systems, as well as
                management and integration of unstructured data in knowledge management systems.</p>
            </div>
            <div xml:id="users_localization_workflow_managers">
              <head>Localization Workflow Managers</head>
              <p>These types of users are concerned with localization workflows in which content
                goes through certain steps: preparation for localization, start of the localization
                process by e.g. a conversion into a bitext (aligned parallel text) format like <ptr
                  target="#xliff" type="bibref"/>, the actual localization by human translators or
                machine translation and other adaptations of content, and finally the integration of
                the localized content into the original format. That format is often based on XML or
                HTML; (Web) content management systems are widely used for content creation, and
                their integration with localization workflows is an important task for the workflow
                manager. For the integration of content creation and localization, metadata plays a
                crucial role. For example, an ITS data category like <ref target="#trans-datacat"
                  >translate</ref> can trigger the extraction of localizable text. <quote>Metadata
                  roundtripping</quote>, that is, the availability of metadata both before and after
                the localization process, is crucial for many tasks of the localization workflow
                manager. An example is metadata-based quality control, with checks like <quote>Have
                  all pieces of content set to <code>translate="no"</code> been left
                  unchanged?</quote>. Other pieces of metadata are relevant for proper
                internationalization during the localization workflow, e.g. the availability of <ref
                  target="#directionality">Directionality</ref> markup for adequate visualization of
                bidirectional text.</p>
            </div>
          </div>
          <div xml:id="ways-to-use-its">
            <head>Ways to Use ITS</head>
            <p>The ITS specification proposes several mechanisms for supporting worldwide use and
              effective internationalization and localization of content. We will sketch them below

[6063 lines skipped]

Received on Wednesday, 29 May 2013 16:14:09 UTC