RE: [all] Request for Last Call Review, Internationalization Tag Set (ITS) from Carter, Jerry on 2006-05-22 (public-i18n-its@w3.org from April to June 2006)

From: Carter, Jerry <jerry.carter@nuance.com>
Date: Mon, 22 May 2006 11:34:50 -0400
To: Yves Savourel <yves@opentag.com>
Cc: public-i18n-its@w3.org
Message-ID: <F8940C21CD563F49BC884A274C4653DF044A1BE0@bn-exch1.speechworks.com>

Yves:

Thank you for taking the time to put together such a detailed and
informative response.  In case a declaration is needed/helpful, I accept
each answer that has been given without objection.

I did make a few minor comments below.

-=- Jerry 

> -----Original Message-----
> From: Yves Savourel [mailto:yves@opentag.com]
> Sent: Monday, May 22, 2006 3:40 AM
> To: Carter, Jerry
> Cc: public-i18n-its@w3.org
> Subject: RE: [all] Request for Last Call Review, Internationalization Tag
> Set (ITS)
> 
> Hi Jerry,
> 
> Thank you for your feedback. We've discussed your notes a bit in the past
> weeks, and I was tasked to summarize our answers.
> 
> If some of them still do not address your concerns, please let us know and
> we will enter the specific points in our issue list, so
> they are treated along with other comments.
> 
> 
>> 6.2 Translatability.
>>
>> Translatability is presented as a binary decision and in section 1.1.2
>> the phrase 'Model-T' is given as an example of an item that would be
>> invariant.
>> However, a French translation would likely replace Model-T with 'Ford
>> T'.
>> Terms of art or product names do vary more than one might expect.
>> Consider the recent 'Buick LaCrosse' sedan whose name has required
>> translation for certain markets [2].
>>
>> Likewise, is '12' translatable?  One might wish to express the integer
>> in Chinese ideographs or leave it in Arabic numerals depending on
>> content or based on the understanding of the likely reader.
>>
>> I do not see translatability as a straightforward decision.
>> Annotation data describing the term could very well be useful to the
>> translator, but eventually one wishes for word or phrase level
>> descriptions a la WordNet [3] to guide the translator, a capability
>> that does not appear to be supported by ITS.
> 
> I think you have a point for Example 1. We'll try to find a better example
> of 'not-translatable' text.
> 
> As for the note about seeing translatability as a binary decision: while
> we agree in general with your comments, we think
> we are trying to achieve something a bit different. In practice it is
> unlikely that one is going to (or can) set translatability at
> a very fine level, especially when working from the source viewpoint: the
> decision to translate or not is really made for each
> target language when the translator goes through the text.
> 
> To some degree this is related to a discussion the group had early on about
> naming the attribute. 'translate' was the choice we made.
> And by this we mean: From the viewpoint of the XML content, this text is
> translatable text, make it accessible to the translators.
> How this is going to be translated (all of it, or part of it, or none of
> it) is to be decided by the translator and the decision may
> be different for each language.

This makes perfect sense as a higher-level annotation.  I'm not surprised
that the group has already considered these issues and I thank you for
sharing your thoughts with me.

> > 6.3 Localization Information
> >
> > Here general annotation information is provided.  We considered RDF
> > while working on PLS and I wonder if RDF might be more appropriate here.
> 
> We thought RDF was a bit too much for the purpose locInfo tries to
> achieve, which is quite simple: associate a simple note to a part
> of content. In addition, we needed to provide an association mechanism to
> allow the re-use of existing localization notes.

We're still exploring use of RDF within the Voice Browser WG Pronunciation
Lexicon and have had some similar concerns.

> > 6.4 Terminology
> >
> > This could be a subclass of 6.3.
> 
> We thought identifying terms and associating localization notes to text
> were different enough to require two distinct data
> categories.

I trust your instincts.

> > 6.5 Directionality
> >
> > The Unicode character set already supports embedding of directionality
> > marks and overrides (e.g. 0x200E, 0x200F, 0x202D, 0x202E, 0x202C) when
> > specifications do not make provisions for explicit elements such as
> > the XHTML bidirectional text module [4].
> >
> > Is this necessary?
> 
> We followed Unicode's own recommendation
> (http://www.w3.org/TR/unicode-xml/#Charlist), and the guidelines provided
> by the GEO WG
> (http://www.w3.org/International/questions/qa-bidi-controls).

Thank you very much for these links.

As I understand the recommended best practices, explicit directionality via
attributes is the preferred mechanism where possible.

"There are, however, places in an (X)HTML file where markup cannot be used,
and the Unicode formatting code characters are therefore appropriate. These
include attribute text and the <title> element (which support only character
content)."  <http://www.w3.org/International/questions/qa-bidi-controls>

This thinking should now inform several working drafts within the VBWG and
MMI groups.
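
For readers who want to see what the code points quoted above actually are,
here is a minimal Python sketch. It only looks the five characters up in the
standard unicodedata module; strip_controls is a hypothetical helper added
for illustration and is not something ITS or the GEO guidelines define.

```python
import unicodedata

# The five code points cited in the quoted comment on section 6.5.
BIDI_CONTROLS = ["\u200E", "\u200F", "\u202D", "\u202E", "\u202C"]


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def strip_controls(text):
    """Hypothetical helper: drop the listed controls, e.g. once markup carries direction."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


for ch in BIDI_CONTROLS:
    print(describe(ch))
```

The list covers two marks (LRM, RLM), two overrides (LRO, RLO) and the
character that closes an override (PDF).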

> > 6.6 Ruby
> >
> > Again, this is back to the general annotation issue.  Is Ruby best
> > applied for this purpose?
> 
> Here again, we tried to integrate existing recommendations (Ruby Module).
> 
> 
> > 6.7 Language Information
> >
> > Specifications that do not support language tagging might be broken.
> > Is this really the best way to fix them?  Document markup that does
> > support language tagging in a non-traditional manner is presumably
> > okay since readers are expected to understand the semantics of that
> > specification.  I don't see the 'langPointer="@mylangattribute"' case
> > as justifying this capability.
> 
> We don't think langRule tries to fix broken specifications, just to allow
> ITS-aware applications to know something they cannot know
> because they do not have a semantic knowledge of the given formats. We
> still recommend using xml:lang if possible. This will be made
> clearer in the Best Practices document (still a First WD).
> 
> In short, we say: you should use xml:lang. But if you already have
> something with the same semantics and value set, and can't
> change it to xml:lang, then you can use langRule to indicate that to
> ITS-enabled applications.

Makes sense.  I will look forward to the Best Practices document.

> > 6.8 Elements within Text
> >
> > Here again, I would expect the reader to understand the semantics of
> > the document markup.
> 
> Interesting comment: It shows we may not have done a good enough job of
> explaining that much of ITS is made mainly for applications
> working at a 'generic' level, rather than applications knowing the
> specific semantics of each XML vocabulary.
> 
> For example, a generic XML spell-checker would not know the semantics
> associated with the elements of each document type. But it
> would just need to understand ITS to know which parts of the document to
> check.

Ah, that makes sense!  Yes, a sentence or two to better motivate the section
would be welcome.
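
To make the generic-tool scenario concrete, here is a rough Python sketch of
how such a checker might pick out the text to process without knowing the
vocabulary. It assumes the ITS 1.0 namespace URI and a local its:translate
attribute with inheritance; the element names (doc, p, code, b) and the
function checkable_text are invented for the example, and global rules are
not handled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # assumed: ITS 1.0 namespace URI
TRANSLATE = f"{{{ITS_NS}}}translate"


def checkable_text(elem, inherited=True):
    """Yield the text runs a generic tool should process (hypothetical helper)."""
    flag = elem.get(TRANSLATE)
    # A local its:translate value overrides the inherited one.
    active = inherited if flag is None else (flag == "yes")
    if active and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from checkable_text(child, active)
        # Tail text sits after the child but belongs to this element.
        if active and child.tail and child.tail.strip():
            yield child.tail.strip()


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<p>Press the <code its:translate="no">ENTER</code> key.</p>'
    '<p its:translate="no">build-id-0042 <b its:translate="yes">draft</b></p>'
    '</doc>'
)
print(list(checkable_text(doc)))  # ['Press the', 'key.', 'draft']
```

The tool never needs to know what p or code mean; the ITS attribute alone
tells it which runs to skip.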

> Thanks again for your help,
> -yves
Received on Monday, 22 May 2006 15:35:15 UTC