- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 07 Oct 2005 11:29:16 +0900
- To: "public-i18n-its@w3.org" <public-i18n-its@w3.org>
Dear all,
I'm very sorry that this message comes so late, but I'm very happy that I
can write it. We got some very detailed feedback on the ITS requirements
working draft from Andres Vega, who works for Tektrans. I would like to
give him feedback about our opinion, so could we talk about this at the
next teleconf? Everybody, please read this and be prepared to talk
about it.
Best,
Felix
------- Forwarded message -------
From: "Andres Vega" <av@tektrans.com>
To: "Felix Sasaki" <fsasaki@w3.org>
Cc:
Subject: RE: Comments on the I18N ITS Requirements Working Draft
Date: Sat, 01 Oct 2005 04:27:30 +0900
Hello Felix
Here are my comments regarding the proposal at
http://www.w3.org/TR/2005/WD-itsreq-20050805/
Usage Scenarios
2.1 Content Authoring
For this case I would recommend the use of a tag attribute (e.g. LOCALIZE)
that could be applied at any level, very much like the LANG attribute. It
could default to LOCALIZE=YES thus being omitted most of the time. To mark
a specific section as not to be changed it should have the specific value
LOCALIZE=NO. Any other value will provide localization specific
information. The attribute would be inserted at two different stages:
during content authoring (probably only the LOCALIZE=NO value most of the
time) and at the I18N or L10N stage (when the informative values are more
likely to be added).
The attribute should be read by any localization tool so as to block any
section marked as NO and to allow localization for any other value, while
displaying it as an informative reference to the translator.
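A minimal sketch of how a tool might resolve the attribute, under several
assumptions that are not in the draft: the attribute is spelled in
lowercase (localize), it is inherited from the parent element like LANG,
and a nested localize="yes" re-enables localization inside a blocked
section. The sample document and the names walk and report are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """
<doc>
  <p>Press the button.</p>
  <section localize="no">
    <p>COPYRIGHT-ID-0042</p>
    <p localize="yes">Translate me anyway.</p>
  </section>
  <p localize="informal tone">Hi there!</p>
</doc>
"""


def walk(elem, inherited="yes"):
    """Yield (element, effective localize value), inheriting from the parent.

    The default "yes" mirrors the suggestion that the attribute can be
    omitted most of the time.
    """
    value = elem.get("localize", inherited)
    yield elem, value
    for child in elem:
        yield from walk(child, value)


def report(xml_text):
    """Return (text, locked, note) for every element that has text of its own."""
    rows = []
    for elem, value in walk(ET.fromstring(xml_text)):
        text = (elem.text or "").strip()
        if not text:
            continue
        flag = value.strip().lower()
        locked = flag == "no"  # blocked for the translator
        # Any value other than yes/no is passed on as an informative note.
        note = None if flag in ("yes", "no") else value
        rows.append((text, locked, note))
    return rows


for row in report(SAMPLE):
    print(row)
```

Only the text directly inside each element is reported here; inline
mixed content would need the tail text handled as well.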
An issue arises here with attribute values that themselves contain content
that should be marked as localizable or not (analogous to the HTML image
ALT attribute). These cases would probably still need to be treated
differently (e.g. through a schema or templates).
2.2 Terminology
In this case a tag could possibly be defined to enclose the term (e.g.
<Term>XXX</Term>). Attributes could be used to link the term with an
external source (a glossary or terminology database) that would provide
all the term specific information needed. During authoring that
information may or may not be updated; in the latter case both terms and
glossaries could be semi-automatically updated by the terminology owner,
prior to document localization.
Another approach could be to make use of the LOCALIZE attribute. This would
be combined with the use of ID attributes, and would allow marking any
element as a term without excess marking. See comments on 3.7 further
below.
2.3 Software development
A set of tag attributes seems appropriate in this case, such as <Span
SizeLimit=15 SizeUnits=Bytes/Characters/Pixels...
The encoding could possibly be addressed separately, by using a tag
attribute (ENCODING or maybe CHARSET) probably at document level.
Example 1 would appear as:
<string id="s123" SizeLimit="15"
SizeUnits="Characters">Printing...</string>
...
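A rough sketch of how a tool could check a translation against these
attributes. The function name fits, the default unit, and the UTF-8
default encoding are assumptions made for illustration. Pixels is left
unimplemented because it would need font metrics.

```python
import xml.etree.ElementTree as ET

SOURCE = ('<string id="s123" SizeLimit="15" '
          'SizeUnits="Characters">Printing...</string>')
# Hypothetical variant of the same string with a byte limit.
SOURCE_BYTES = ('<string id="s123" SizeLimit="10" '
                'SizeUnits="Bytes">Printing...</string>')


def fits(string_xml, translation, encoding="utf-8"):
    """Return True if the translation respects SizeLimit/SizeUnits."""
    elem = ET.fromstring(string_xml)
    limit = int(elem.get("SizeLimit"))
    units = elem.get("SizeUnits", "Characters")  # assumed default unit
    if units == "Characters":
        # len() counts Unicode code points, not user-perceived characters.
        size = len(translation)
    elif units == "Bytes":
        # The byte count depends on the encoding declared for the document.
        size = len(translation.encode(encoding))
    else:
        raise ValueError("unsupported SizeUnits: " + units)
    return size <= limit


print(fits(SOURCE, "Drucken..."))        # 10 characters
print(fits(SOURCE, "Wird gedruckt..."))  # 16 characters
print(fits(SOURCE_BYTES, "印刷中..."))    # 6 characters, 12 bytes in UTF-8
```

The last call shows why the encoding matters: the same string fits a
10 character limit but not a 10 byte limit.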
3.2.1 Challenges
Example 5 would imply very good I18N by integrating software and
documentation to use the same localization resource bundles. While this is
probably the best scenario, it is not the most likely one. I would
consider Example 4 the one more likely to be needed. Following the
LOCALIZE attribute terminology it would appear as:
The Java statement <code><span
localize="no">System.out.println("</span>Hello world!<span
localize="no">");</span></code> prints the text...
...
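A minimal sketch of how a tool could split this kind of inline content
into translatable and protected segments. It assumes the example is
wrapped in a hypothetical <p> element and that localize is inherited by
child elements; the function name segments is illustrative.

```python
import xml.etree.ElementTree as ET

EXAMPLE = ('<p>The Java statement <code><span localize="no">'
           'System.out.println("</span>Hello world!<span localize="no">'
           '");</span></code> prints the text...</p>')


def segments(elem, inherited="yes"):
    """Yield (text, translatable) pairs in document order."""
    value = elem.get("localize", inherited)
    translatable = value.lower() != "no"
    if elem.text:
        yield elem.text, translatable
    for child in elem:
        yield from segments(child, value)
        # Text after a child's end tag belongs to this element, so it
        # takes this element's value, not the child's.
        if child.tail:
            yield child.tail, translatable


for text, translatable in segments(ET.fromstring(EXAMPLE)):
    print("TRANSLATE" if translatable else "PROTECT  ", repr(text))
```

Only "Hello world!" and the surrounding sentence are offered for
translation; both halves of the Java call are protected.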
3.4 Unique Identifier
In this section I may be a bit biased towards Trados, as that is the tool
we use most often.
It is true that TM techniques lacked context orientation in the past, but
now they provide some contextual techniques (e.g. Xtranslate) that take
into account not only the specific sentence to be translated but also the
previous and following sentences.
Other tools, such as Content Management Systems, allow storing information
in small elements that can be identified and reused from one document to
another. These systems might be combined with the use of an ID attribute to
allow for easy reuse of localized content. However, one issue that often
appears with CMS is that either the content is split into very small
units in order to allow more reuse (at the cost of more complex
administration of the CMS), or it is
defined using bigger units, which has the added problem that some markup
is more likely to appear inside the unit and may need to be different
for different content output formats. If such is the case, those
differences may cause change analysis tools to be unable to recognize the
units as equivalent, further reducing the reusability of the localized
content.
Nevertheless, the possibility of defining a unique identifier for any item
opens many other possibilities and is in itself advisable (for example, to
identify terms as suggested above).
3.5 Handling of Entities.
From my experience, it is best not to use entities (or variables in other
contexts) that are smaller than a sentence and bigger than a character
unit. For the reasons you already point out, it is very likely that the
documentation author does not foresee syntactic or gender/number/case
considerations of languages other than the one the documentation
is written in. The use of sentence size entities is on the other hand
recommended, especially if they can be linked to software resources.
3.6 Identifying Language/Locale
Not much to add here. Maybe there should be separate identifiers for
Language/Locale and Script, as this could avoid diachronic issues
(languages that have changed the script in which they are written recently
enough for electronic documents to exist in both; scripts coexisting for
the same language and locale, as in the Azerbaijan sample mentioned in 3.9, ...)
3.7 Identifying terms
As stated above (2.2) I agree with the need to link terms to a Terminology
Database that provides for most of the required attributes. Term
identification could be done at the Authoring stage, thus defining the
terms that will populate/update the TD; at a later stage terminologists
could develop the needed content for each specific term.
Term specification could make use of the LOCALIZE attribute, along with
the ID attribute. This would imply that every term would have to be
localized (which is not necessarily a bad approach, as this would give its
localization control to the terminologist). This would also allow marking
any element as a term without excess marking. If more than one Terminology
Database is needed, the values of the LOCALIZE attribute could be changed
accordingly.
Regarding indexing, index entries would probably need their own separate
treatment (e.g. an <Index> identifier). If the index entry is itself a
term, then format and sorting specifics could be addressed by a
combination of the default LANG attribute of the section and
two INDEX specific attributes indicating display and phonetics. For example,
<index id="jk07" localize="term" indexlevel="Sorting:index"
sortstr="sorting:index">Index sorting</index>
would both define the index entry to be displayed as:
Sorting,
index
and sorted using the "sorting:index" (or any other phonetic) string, and
also identify "Index sorting" as a term to be stored in the TD
with a unique id ("jk07"). At the same time it would be implicit that the
term is translatable content.
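A small sketch of how such an entry could be read and sorted. It assumes
that ":" separates the levels in indexlevel and that sortstr is used
verbatim as the sort key. The second entry (jk08) and the function names
are hypothetical, added only to show the ordering.

```python
import xml.etree.ElementTree as ET

ENTRIES = [
    '<index id="jk07" localize="term" indexlevel="Sorting:index" '
    'sortstr="sorting:index">Index sorting</index>',
    # Hypothetical second entry, not from the original message.
    '<index id="jk08" indexlevel="Attributes" '
    'sortstr="attributes">Attributes</index>',
]


def parse_entry(xml_text):
    """Read one index element into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "id": elem.get("id"),
        "text": elem.text,
        "levels": elem.get("indexlevel").split(":"),  # display hierarchy
        "sort_key": elem.get("sortstr"),              # phonetic/sort string
        "is_term": elem.get("localize") == "term",    # goes to the TD
    }


def build_index(entries):
    """Parse all entries and order them by their sort string."""
    return sorted((parse_entry(e) for e in entries),
                  key=lambda entry: entry["sort_key"])


def render(entry):
    """Indent each level under the previous one."""
    return "\n".join("  " * depth + level
                     for depth, level in enumerate(entry["levels"]))


for entry in build_index(ENTRIES):
    print(render(entry))
```

A real tool would use a locale-aware collation for the sort key rather
than plain string comparison.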
3.8 Purpose Specification/Mapping
This specification seems a bit ambitious to me. Although I see its
application, I also see the complexity of mapping all source specific
attributes. Whenever possible I would rather make use of attributes that
can have local specific values that can be defaulted to a generic value.
(as with the LOCALIZE attribute).
The mapping technique could make good use of this and also allow for
introducing or updating markup at a later stage away from authoring.
3.9 Cultural aspects
Regarding orthography I would make use of a SCRIPT attribute (possibly
defaulting to the most widespread script if missing).
Regarding other cultural, dialectal or stylistic variations I would
recommend making use of the LOCALIZE attribute at the document or
paragraph level.
3.11 Bidirectional text support.
This is fairly standard already. A SCRIPT attribute might interfere
with it, or it might be complementary. I should think more about it.
3.12 Translatability
I think this would be covered by a LOCALIZE attribute.
Rather than allow other tags to carry implicit information on
translatability I would prefer to postprocess already authored documents at
the I18N or L10N stage, adding the appropriate LOCALIZE attributes where
needed. This also applies to 3.14 Limited impact.
I hope some of these suggestions are of help.
I would appreciate your comments.
Best regards.
Andrés Vega Muñoz
Localisation Engineer
Tek Translation International
Tel: + 34 91 414 4434
Fax: + 34 91 414 4444
OneWorld Localization Center
www.tektrans.com
-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: 26 September 2005 05:24
To: Andres Vega
Cc: Richard Ishida
Subject: Comments on the I18N ITS Requirements Working Draft
Dear Andres,
This is Felix Sasaki from the i18n activity of W3C [1]. We met at the
Unicode conference in Florida. I hope you had a safe trip back and are
doing well.
At the conference you showed some interest in the work of the ITS Working
Group, after the presentation from Richard and me. I was wondering if you
had time to take a look at the working draft on the topic [2] which our
working group published in August. Every comment or suggestion from you
would be very welcome.
Looking forward to hearing from you & with best regards,
Felix Sasaki
[1] http://www.w3.org/International/
[2] http://www.w3.org/TR/2005/WD-itsreq-20050805/
Received on Friday, 7 October 2005 02:29:27 UTC