- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 07 Oct 2005 11:29:16 +0900
- To: "public-i18n-its@w3.org" <public-i18n-its@w3.org>
Dear all,

I'm very sorry that this message comes so late, but I'm very happy that I can write it. We got some very detailed feedback on the ITS requirements working draft from Andres Vega, who works for Tektrans. I would like to give him feedback about our opinion, so could we talk about this at the next teleconf? Everybody, please read this and be prepared to talk about it.

Best,
Felix

------- Forwarded message -------
From: "Andres Vega" <av@tektrans.com>
To: "Felix Sasaki" <fsasaki@w3.org>
Cc:
Subject: RE: Comments on the I18N ITS Requirements Working Draft
Date: Sat, 01 Oct 2005 04:27:30 +0900

Hello Felix

Here are my comments regarding the proposal at http://www.w3.org/TR/2005/WD-itsreq-20050805/

Usage Scenarios

2.1 Content Authoring
For this case I would recommend the use of a tag attribute (e.g. LOCALIZE) that could be applied at any level, very much like the LANG attribute. It could default to LOCALIZE=YES and thus be omitted most of the time. To mark a specific section as not to be changed, it should have the specific value LOCALIZE=NO. Any other value would provide localization-specific information.
The attribute would be inserted at two different stages: during content authoring (probably only the LOCALIZE=NO value most of the time) and at the I18N or L10N stage (when the informative values are more likely to be added).
The attribute should be read by any localization tool so as to block any section marked as NO and to allow localization for any other value, while displaying it as an informative reference to the translator.
An issue here appears with attribute fields that contain information that should itself be marked as localizable or not (analogous to the HTML image ALT attribute). These cases would probably still need to be treated differently (e.g. through schemas or templates).

2.2 Terminology
In this case a tag could possibly be defined to enclose the term (e.g. <Term>XXX</Term>).
Attributes could be used to link the term with an external source (a glossary or terminology database) that would provide all the term-specific information needed. During authoring that information may or may not be updated; in the latter case both terms and glossaries could be semi-automatically updated by the terminology owner, prior to document localization.
Another approach could be to make use of the LOCALIZE attribute. This would be combined with the use of ID attributes, and would allow marking any element as a term without excess marking. See comments on 3.7 further below.

2.3 Software development
A set of tag attributes seems appropriate in this case, such as <Span SizeLimit=15 SizeUnits=Bytes/Characters/Pixels...
The encoding could possibly be addressed separately, by using a tag attribute (ENCODING or maybe CHARSET), probably at document level. Example 1 would appear as:
<string id="s123" SizeLimit="15" SizeUnits="Characters">Printing...</string>
...

3.2.1 Challenges
Example 5 would imply very good I18N by integrating software and documentation to use the same localization resource bundles. While this is probably the best scenario, it is not the most likely one. I would consider Example 4 the one more likely to be needed. Following the LOCALIZE attribute terminology it would appear as:
The Java statement <code><span localize="no">System.out.println("</span>Hello world!<span localize="no">");</span></code> prints the text...
...

3.4 Unique Identifier
About this section maybe I am a bit TraDOS biased, as that is the tool we use most often. It is true that TM techniques lacked context orientation in the past, but now they provide some contextual techniques (e.g. Xtranslate) that take into account not only the specific sentence to be translated but also the previous and following sentences.
Other tools, such as Content Management Systems, allow storing information in small elements that can be identified and reused from one document to another.
These systems might be combined with the use of an ID attribute to allow for easy reuse of localized content. However, one issue that often appears with CMS is that either the content elements are reduced to very small units in order to allow more reuse (which increases the complexity of administering the CMS), or they are defined as bigger units, which has the added problem that some markup is more likely to appear inside the unit, and it may need to be different for different content output formats. If such is the case, those differences may cause change analysis tools to be unable to recognize the units as equivalent, further reducing the reusability of the localized content.
Nevertheless, the possibility to define a unique identifier for any item opens many other possibilities and is in itself advisable (for example, to identify terms as suggested above).

3.5 Handling of Entities
From my experience, it is best not to use entities (or variables in other contexts) that are smaller than a sentence and bigger than a character unit. For the reasons you already point out, it is very likely that the documentation author does not foresee syntactic or gender/number/case considerations of languages other than the one the documentation is written in. The use of sentence-size entities is, on the other hand, recommended, especially if they can be linked to software resources.

3.6 Identifying Language/Locale
Not much to add here. Maybe there should be separate identifiers for Language/Locale and Script, as this could avoid diachronic issues (languages that have changed the script in which they are written recently enough for electronic documents to exist in both; scripts coexisting for the same language and locale, as in the Azerbaijan sample mentioned in 3.9, ...).

3.7 Identifying terms
As stated above (2.2), I agree with the need to link terms to a Terminology Database that provides most of the required attributes.
Term identification could be done at the authoring stage, thus defining the terms that will populate/update the TD; at a later stage terminologists could develop the needed content for each specific term.
Term specification could make use of the LOCALIZE attribute, along with the ID attribute. This would imply that every term would have to be localized (which is not necessarily a bad approach, as this would give control of its localization to the terminologist). This would also allow marking any element as a term without excess marking. If more than one Terminology Database is needed, the values of the LOCALIZE attribute could be changed accordingly.
Regarding indexing, index entries would probably need their own separate treatment (e.g. an <Index> identifier). If the index entry is itself a term, then format and sorting specifics could be addressed by combining the default LANG attribute of the section with two INDEX-specific attributes indicating display and phonetics. For example,
<index id="jk07" localize="term" indexlevel="Sorting:index" sortstr="sorting:index">Index sorting</index>
would both define the index entry to be displayed as:
Sorting, index
and sorted using "sorting:index" (or any other phonetic string), and also identify "Index sorting" as a term to be stored in the TD with a unique id ("jk07"). At the same time it would be implicit that the term is translatable content.

3.8 Purpose Specification/Mapping
This specification seems a bit ambitious to me. Although I see its application, I also see the complexity of mapping all source-specific attributes. Whenever possible I would rather make use of attributes that can have locally specific values and that can default to a generic value (as with the LOCALIZE attribute). The mapping technique could make good use of this and also allow for introducing or updating markup at a later stage, away from authoring.
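
A minimal sketch of the defaulting idea above, in Python, may help when discussing it. It assumes that a lowercase localize attribute is inherited by child elements in the same way as LANG, that a missing value means "yes", and that any value other than "no" leaves the content translatable. The function names and the sample document are hypothetical; none of this is defined in the working draft or in the comments.

```python
import xml.etree.ElementTree as ET

DEFAULT_LOCALIZE = "yes"  # assumption: the value used when no attribute is present


def effective_localize(root):
    """Map each element to its effective localize value.

    Assumption: the value is inherited from the nearest ancestor that
    sets it, much like the LANG attribute.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get("localize", inherited).lower()
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, DEFAULT_LOCALIZE)
    return result


def is_blocked(value):
    """Only the explicit value "no" blocks localization."""
    return value == "no"


# Hypothetical sample document for illustration only.
doc = ET.fromstring(
    "<doc>"
    '<p>Press <code localize="no">Ctrl+C</code> to copy.</p>'
    '<p localize="term"><span>Index sorting</span></p>'
    "</doc>"
)

values = effective_localize(doc)
for elem, value in values.items():
    status = "blocked" if is_blocked(value) else "translatable"
    print(elem.tag, value, status)
```

A localization tool following this sketch would lock the code element, leave everything else open for translation, and show the informative value "term" to the translator for the second paragraph and its span.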
3.9 Cultural aspects
Regarding orthography, I would make use of a SCRIPT attribute (possibly defaulting to the most widely used script if missing). Regarding other cultural, dialectal or stylistic variations, I would recommend making use of the LOCALIZE attribute at document or paragraph level.

3.11 Bidirectional text support
This is fairly standard already. Maybe a SCRIPT attribute could interfere with it, or it may be complementary. I should think more about it.

3.12 Translatability
I think this would be covered by a LOCALIZE attribute. Rather than allow other tags to carry implicit information on translatability, I would prefer to postprocess already authored documents at the I18N or L10N stage, adding the appropriate LOCALIZE attributes where needed. This also applies to 3.14 Limited impact.

I hope some of these suggestions are of help. I would appreciate your comments.

Best regards.

Andrés Vega Muñoz
Localisation Engineer
Tek Translation International
Tel: + 34 91 414 4434
Fax: + 34 91 414 4444
OneWorld Localization Center
www.tektrans.com

-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: 26 September 2005 05:24
To: Andres Vega
Cc: Richard Ishida
Subject: Comments on the I18N ITS Requirements Working Draft

Dear Andres,

This is Felix Sasaki from the i18n activity of W3C [1]. We met at the Unicode conference in Florida. I hope you had a safe trip back and are doing well.

At the conference you showed some interest in the work of the ITS Working Group, after the presentation from Richard and me. I was wondering if you had time to take a look at the working draft on the topic [2], which our working group published in August. Every comment or suggestion from you would be very welcome.

Looking forward to hearing from you & with best regards,
Felix Sasaki

[1] http://www.w3.org/International/
[2] http://www.w3.org/TR/2005/WD-itsreq-20050805/
Received on Friday, 7 October 2005 02:29:27 UTC