Proposed text for ITS spec introductory material from Richard Ishida on 2006-02-24 (public-i18n-its@w3.org from January to March 2006)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 24 Feb 2006 18:37:20 -0000
To: <public-i18n-its@w3.org>
Message-ID: <00f501c63971$59437a80$6501a8c0@w3cishida>

Chaps,

I began reading the latest version of the ITS spec in preparation for next week. I have scribbled a number of editorial comments on my paper copy, but I felt like a couple of the introductory sections needed more than that. I also figured that it would be almost as easy for me to write alternative text as to clarify what I'm thinking so that someone else could evaluate and/or implement it. So I wrote what follows. It is only a first pass, so there may be things that can be improved.

SECTION 1.1

Section 1.1 still talks only about localization and still ignores international use of schemas. I propose the following replacement text for the first two paragraphs:

[[
Content or software that is authored in one language (i.e. the source language) is often made available in additional languages. This is done through a process called localization, in which the original material is translated and adapted to the target audience.

In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian or Urdu need special markup to demarcate directionality in mixed direction text.

From the viewpoints of feasibility, cost, and efficiency, it is important that the original material be suitable for localization and international use. This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization. For a detailed explanation of the terms "localization" and "internationalization", see [l10n i18n].
]]

SECTION 2

Section 2, Basic Concepts, still strikes me as a description of ITS from the engineer's point of view, rather than a description of how it intersects with the potential user's interests. (It's like having documentation for PowerPoint that just went through the pull-down menus in order, rather than having sections such as 'How to create a new presentation', 'How to work with the master', etc.) After reading this section I find I still have to work hard at reassembling the information in my brain in terms of what I knew when I started, where I am now, and where we're going with this.

Here is a proposal for an alternative approach.

[[

2 Basic Concepts

2.1 Potential users of ITS

The ITS specification aims to provide schema developers with information about what markup should be supported to enable worldwide use of their schemas and effective localization of the content developed using that schema. This information is provided in an abstract way in the data category descriptions, but specific proposals for implementation are also made in the specification.

One group of people who will use this information will be developing new schemas from the ground up. In the specification they will find proposals for attribute and element names to be included in their new schema. Using the names proposed here may be helpful, because both authors and localization tool developers will more easily recognize the concepts they represent. It is perfectly possible, however, for schema developers to devise their own set of attribute and element names. The specification sets out, first and foremost, to ensure that the required markup is available and that the behaviour of that markup meets established needs.

Another group of users of this specification will be working with existing schemas, such as DocBook, DITA, or perhaps an in-house schema.

The ITS Working Group has sought input from people developing widely used formats such as DocBook and DITA, and the specification provides examples of how we feel those specific formats could be adapted to support ITS.

Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.

In some cases, the schema may already contain markup equivalent to that recommended in ITS. In this case it is not necessary to add duplicate markup. The developer should, however, check that the behaviour associated with the markup in their own schema is fully compatible with the expectations described in this specification.

Other users of the ITS specification will be translation tool developers. When content is sent for translation, it is important to ensure that such tools recognize what to do with the various bits of content described by the markup. For example, translation tools should prevent content marked up as not for translation from being changed. It is hoped that the ITS specification will make the job of these developers easier by standardising the expected behaviour of certain relevant markup items, and allowing them to more effectively identify how content should be handled.

The markup proposed in this specification may also be used by content authors to mark up specific bits of content. However, we will describe below how the burden of inserting markup can sometimes be removed from content authors and the data categories can be related to relevant bits of content in a more global manner. This work may fall to information architects, rather than the content authors themselves.

2.2 Ways to implement ITS

The ITS specification provides a set of element and attribute names that can be included in a schema, but it also goes beyond that to specify a mechanism for describing various aspects of a schema in terms of translatability and internationalization.

We will explore the possible approaches below. For the purpose of illustration, we will use examples of ways to indicate that certain parts of content should or should not be translated. There are three ways of indicating this information:

1. a content author uses an attribute on a particular element in the content to say that the text should not be translated

2. a document developer uses markup at the top of the document to identify a particular type of element or context in which the content should not be translated

3. a schema developer uses constructs in the schema itself to indicate that specific parts of the content should not be translated.

The first two approaches above can be likened to the use of CSS in XHTML. Using a style attribute, an XHTML content author may assign a colour, say red, to a particular paragraph. That author could instead have used the style element at the top of the page to say that all paragraphs of a particular class, or in a particular context, should be coloured red.

2.3 Using local markup

Example 4 shows how a content author may use an ITS attribute to indicate what text should be translated and what text should be protected from translation. Translation tools that are aware of the meaning of this attribute can then screen the relevant content from the translation process.

Example 4 goes here [Note the relevant parts of these examples should be bolded for easy identification using a tag such as strong for WAI accessibility]

For this to work, the schema developer will need to add the its:translate attribute to the schema as a common attribute or on all the relevant element definitions.

Note that in this case inheritance is expected to play a part in identifying which content is to be translated and which is not. Tools that process this content for translation will need to manage this scoping.
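As a rough illustration of the scoping such tools would have to manage, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes the ITS namespace URI used by ITS 1.0, and that a local its:translate value applies to an element and all its descendants until another its:translate overrides it. The function name translatable_text is hypothetical, and for brevity the sketch ignores text that follows a child element.

```python
import xml.etree.ElementTree as ET

# Assumption: the ITS namespace URI (as used in ITS 1.0).
ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"


def translatable_text(root, default="yes"):
    """Return (tag, text, flag) for each element with direct text content.

    The flag comes from the element's own its:translate attribute if present,
    otherwise it is inherited from the nearest ancestor that has one.
    Tail text (text after a child element) is ignored for brevity.
    """
    results = []

    def walk(elem, inherited):
        flag = elem.get(TRANSLATE, inherited)  # local value overrides inherited
        if elem.text and elem.text.strip():
            results.append((elem.tag, elem.text.strip(), flag))
        for child in elem:
            walk(child, flag)  # children inherit this element's flag

    walk(root, default)
    return results


doc = f"""<doc xmlns:its="{ITS_NS}">
  <p>Click the button.</p>
  <p its:translate="no">Product name: <b>WidgetPro</b></p>
  <note its:translate="no"><p its:translate="yes">Translate this note.</p></note>
</doc>"""

for tag, text, flag in translatable_text(ET.fromstring(doc)):
    print(flag, tag, text)
```

Here the b element inherits "no" from its parent p, while the inner p overrides the "no" set on its note ancestor.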

2.4 Using documentRule directives

Example 5 shows a different approach to identifying non-translatable content, similar to that used with a style element in XHTML, but using an ITS-defined element called its:documentRules.

Example 5 goes here

The head of a document can contain an its:documentRules element, which contains one or more documentRule elements. In addition to one or more ITS data category attributes, the documentRule element contains a corresponding set of ITS selector attributes (in the example, translateSelector). As their name suggests, these select (or designate) the XML node or nodes to which the corresponding ITS data category attribute pertains. The values of ITS selector attributes are XPath absolute location paths. Information for the handling of namespaces in these path expressions is contained in the ITS element ns, which is a child of documentRules.
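To make the selector mechanism concrete, the following minimal Python sketch collects documentRule elements and applies each translateSelector to the document. It assumes the ITS 1.0 namespace URI and that documentRule carries plain (unprefixed) translate and translateSelector attributes, as described above. ElementTree supports only a subset of XPath, so the sketch handles simple absolute paths such as //term or /doc/body/p and ignores the ns element; apply_document_rules is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumptions: the ITS 1.0 namespace URI, and documentRule carrying plain
# (unprefixed) translate and translateSelector attributes.
ITS_NS = "http://www.w3.org/2005/11/its"
RULE = f"{{{ITS_NS}}}documentRule"


def apply_document_rules(root):
    """Return {element: translate value} for every node selected by a rule.

    Only ElementTree's XPath subset is supported; namespace handling via
    the its:ns element is not implemented. When two rules select the same
    node, the later rule in document order wins (an assumption).
    """
    # Wrap the document element so an absolute path such as "/doc/body/p"
    # or "//term" can be evaluated as "./doc/body/p" or ".//term".
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    flags = {}
    for rule in root.iter(RULE):
        selector = rule.get("translateSelector")
        value = rule.get("translate")
        if selector is None or value is None:
            continue
        for node in wrapper.findall("." + selector):
            flags[node] = value
    return flags


doc = f"""<doc xmlns:its="{ITS_NS}">
  <head>
    <its:documentRules>
      <its:documentRule translate="no" translateSelector="//term"/>
    </its:documentRules>
  </head>
  <body>
    <p>Press <term>Enter</term> to continue.</p>
    <p>See the <term>XLIFF</term> specification.</p>
  </body>
</doc>"""

for node, value in apply_document_rules(ET.fromstring(doc)).items():
    print(node.tag, node.text, value)
```

The author of the content never touches the term elements; the single rule in the head marks both of them as not for translation.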

This approach has the following benefits:

- Content authors do not have to concern themselves with creating additional markup or verifying that the markup was applied correctly. ITS data categories are associated with sets of XML nodes (for example, all p elements in an XML instance).

- Changing the rules can be done in a single location, rather than by searching for and modifying the markup throughout a document (or documents, if the documentRules element is stored as an external entity).

- ITS data categories can designate attribute values as well as elements.

- It is possible to map ITS markup to existing markup (for example the term element in DITA) [Ed. not a clear example unless we show the ITS equivalent - and I think there is none, so what about citing the DITA translate attribute?]
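To illustrate how a selector can designate attribute values rather than elements, the following minimal Python sketch resolves a selector ending in an attribute step, such as //img/@alt, to the attribute values it picks out. ElementTree cannot return attribute nodes from a path expression, so the sketch splits off the final /@name step by hand; the function name select_attribute_values and the restriction to this one selector form are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def select_attribute_values(root, selector):
    """Resolve a selector of the restricted form '<absolute path>/@<name>'
    (e.g. '//img/@alt') to a list of (element, attribute name, value).

    Hypothetical helper: only this one selector form is handled; full
    XPath attribute selection needs a real XPath engine.
    """
    path, sep, attr = selector.rpartition("/@")
    if not sep:
        raise ValueError("selector must end in /@attribute")
    wrapper = ET.Element("wrapper")  # lets absolute paths be evaluated relatively
    wrapper.append(root)
    return [(el, attr, el.get(attr))
            for el in wrapper.findall("." + path)
            if attr in el.attrib]  # skip elements lacking the attribute


doc = '<doc><p>See <img src="a.png" alt="A red button"/></p><img src="b.png"/></doc>'
for el, name, value in select_attribute_values(ET.fromstring(doc), "//img/@alt"):
    print(el.tag, name, value)
```

Only the alt text is designated; the src values and the img without an alt attribute are left alone.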

For this to work, the schema developer needs to add the documentRules element and its associated markup to the schema. In some cases this may allow the schema developer to avoid adding other ITS markup (such as an its:translate attribute) to the elements in the schema; however, authors are still likely to want to use attributes on individual elements from time to time to override the general rule.

For specification of the translate flag, the contents of the documentRules element would normally be designed by an information architect familiar with the document format and familiar with, or working with someone familiar with, the needs of the localization group.

2.5 Using schemaRule

Example 6 shows an alternative approach to designating the XML node or nodes to which a corresponding ITS data category attribute pertains, this time putting the information directly into the schema itself using its:schemaRule. Note that this is only possible for schemas developed using W3C XML Schema ...

Example 6 goes here

This example defines all term elements to be non-translatable by default.

[More text about how this works and who would do it]

2.6 Overwriting/precedence and inheritance

The power of ITS selector attributes comes at a price: rules related to overwriting/precedence, and inheritance, have to be established.

Example 7 goes here

In this example, the ITS data category attribute translate appears twice: in a documentRule, and on a specific p element. Since the ITS selector attribute in the documentRule selects all p elements, the question arises: what is the value of the translate data category for the p element that has local markup? ITS provides precedence and inheritance rules that answer questions like this. In the example, the value is "no" (that is, the content of the p element should not be translated).
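The following minimal Python sketch shows one way a tool might resolve this. It assumes the precedence order local its:translate attribute, then a matching selector rule, then the inherited value, then a default of "yes"; this order is our reading of the example, not normative text. For brevity, rules are passed as (path, value) pairs in ElementTree's path syntax rather than parsed from documentRules, and resolve_translate is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # assumed ITS 1.0 namespace URI
LOCAL_TRANSLATE = f"{{{ITS_NS}}}translate"


def resolve_translate(root, rules, default="yes"):
    """Return {element: "yes" or "no"} using an assumed precedence order:
    local its:translate, then a matching rule, then the inherited value,
    with `default` (assumed "yes") at the document root.

    rules is a list of (path, value) pairs; paths use ElementTree's XPath
    subset relative to root, e.g. ".//p" (a simplification).
    """
    selected = {}
    for path, value in rules:
        for node in root.findall(path):
            selected[node] = value  # later rules overwrite earlier ones

    result = {}

    def walk(elem, inherited):
        if elem.get(LOCAL_TRANSLATE) is not None:
            flag = elem.get(LOCAL_TRANSLATE)  # local markup wins
        elif elem in selected:
            flag = selected[elem]             # then a selector rule
        else:
            flag = inherited                  # otherwise inherit from parent
        result[elem] = flag
        for child in elem:
            walk(child, flag)

    walk(root, default)
    return result


doc = f"""<doc xmlns:its="{ITS_NS}">
  <p>This paragraph is translated.</p>
  <p its:translate="no">This one is <b>not</b>.</p>
</doc>"""

root = ET.fromstring(doc)
flags = resolve_translate(root, [(".//p", "yes")])
for p in root.findall("p"):
    print(flags[p], "".join(p.itertext()))
```

The rule selects both p elements, but the local attribute on the second one takes precedence, and its b child inherits the resulting "no".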

2.7 Using ITS elements

The examples above express the translate data category using attributes. Many of the ITS data categories can be expressed using attribute nodes, but not all: some are expressed using element nodes. Here is an example of such an approach.

....
]]

Hope that helps, and that I have understood the mechanism correctly.

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

Received on Friday, 24 February 2006 18:37:24 UTC