RE: Translation control in HTML5

Process note: Please could you copy www-international on threads that are so clearly i18n-related (and keep them copied).  Otherwise you are likely to miss valuable contributions from folks who have a lot of experience in the things you are discussing. 

For the benefit of the www-international folks: the HTML5 WG has been discussing how to indicate that text should or should not be translated, e.g. so that automatic translation tools such as Google's can do a better job.  See a summary of the early part of the thread at http://lists.w3.org/Archives/Public/public-html/2008Aug/0069.html.
 


> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> Behalf Of David Muschiol
> Sent: 03 August 2008 13:02

> Hmm, that is true, indeed.  But we should not forget that ITS is an
> XML technology, and mixing XML and HTML seems to end up in a

Actually not.  The ITS spec is expressed in terms of concepts that can also be applied to HTML.

I would also like to strongly discourage overloading the lang attribute (and please especially avoid messing with xml:lang in XHTML5).  The lang attribute also doesn't lend itself to specifying global rules or labeling attribute text; see below.

The ITS WG was chartered to meet pressing real world requirements - there is a substantial industry out there that struggles at the moment to translate or localize large quantities of material.  The translation scenario that introduced this thread is only the tip of the iceberg (though a very useful thing to discuss in its own right).

I think the accessibility folks have already raised requests for markup support for their community.  The ITS spec has done similar work in exploring and reporting requirements for enabling international deployment and localization of markup.  That work doesn't require specific markup implementations, though of course a huge amount of discussion and work went into the formulation of the conceptual framework that should lead to a successful markup implementation. (I would recommend not reinventing the wheel there.)

For translate flags, the ITS spec recognized early on that authors wouldn't use the markup consistently if they had to mark up every <code> element or <span class="companyname"> or perhaps <blockquote lang="de"> in their document for translation, but would want to create some global rules that identified those constructs and applied translate information to them throughout the document.  This mechanism is missing from the HTML spec, but it's also something that I don't think should be implemented in an ad hoc manner.  ITS specifies an extensible mechanism that can be used for translate, but also for a number of other things.

ITS provides a proposal for achieving this, and the ITS folks have taken the time to think through most of the issues associated with such an approach.

The ITS approach uses XPath for identification of the target fragments in a document, because it is very powerful as a selection tool.  I don't think HTML would *have* to use XPath, as long as there was a way to suitably select target fragments.  Note, however, that it may also be that most authors would use very simple constructs in XPath anyway.  For example, to prevent translation of all <code> elements, you'd say <translateRule selector="//code" translate="no"/>.
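
To make the global-rule idea concrete, here is a rough Python sketch of how a tool might apply a rule like the one above to a document.  The RULES list, the function name and the sample markup are invented for illustration, and Python's ElementTree supports only a small XPath subset, so this is a sketch of the concept rather than an ITS implementation.

```python
import xml.etree.ElementTree as ET

# Sample document, made up for this sketch.
DOC = """<html><body>
<p>Run <code>make install</code> to finish.</p>
<p>Then restart the server.</p>
</body></html>"""

# Hypothetical in-memory form of global rules: (selector, translate) pairs.
RULES = [("//code", "no")]

def untranslatable(root, rules):
    """Return the elements that some global rule marks translate="no"."""
    flagged = []
    for selector, translate in rules:
        # ElementTree only accepts relative paths on an element,
        # so "//code" is evaluated as ".//code" (a simplification).
        for el in root.findall("." + selector):
            if translate == "no" and el not in flagged:
                flagged.append(el)
    return flagged

root = ET.fromstring(DOC)
flagged = untranslatable(root, RULES)
print([el.text for el in flagged])
```

The point is only that one rule covers every <code> element in the document, so the author does not have to touch each one.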

The thing is, though, that you'd also need a way of overriding the global rules occasionally.  That's where a @translate attribute would come in. It can be attached to any <code> or other element (including a parent) to say *do* translate this one, or *don't* translate that one.
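
A rough Python sketch of how such overrides might be resolved follows.  The precedence used here (the element's own translate attribute, then a matching global rule, then the value inherited from the parent, then a default of "yes") is an assumption made for the sake of the example, as are the RULES list, the function names and the sample markup.

```python
import xml.etree.ElementTree as ET

# Sample document, made up for this sketch.
DOC = """<body>
<p>Type <code>ls</code> or <code translate="yes">your name</code>.</p>
<div translate="no"><p>ACME Corp.</p></div>
</body>"""

RULES = [("//code", "no")]  # hypothetical global rules

def resolve(root, rules):
    """Map every element to "yes" or "no" under the assumed precedence."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    by_rule = {}
    for selector, value in rules:
        for el in root.findall("." + selector):  # "//x" evaluated as ".//x"
            by_rule[el] = value  # a later rule replaces an earlier one

    def translate(el):
        if "translate" in el.attrib:   # 1. local override on the element
            return el.get("translate")
        if el in by_rule:              # 2. global rule matching the element
            return by_rule[el]
        if el in parent:               # 3. value inherited from the parent
            return translate(parent[el])
        return "yes"                   # 4. assumed default

    return {el: translate(el) for el in root.iter()}

root = ET.fromstring(DOC)
flags = resolve(root, RULES)
result = [(el.tag, (el.text or "").strip(), flags[el])
          for el in root.iter() if el.tag in ("code", "p")]
print(result)
```

Here the second <code> element is translated despite the global rule, and the paragraph inside the <div> is left alone because of the attribute on its parent.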

Note also that global rules can be used to indicate whether *attributes* should be translated or not, eg. title and alt, independently of the element content. Doing that by using lang values is particularly cumbersome for an author.
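
As a rough Python sketch of rules that target attributes: ElementTree cannot select attributes with XPath, so each hypothetical rule below carries the attribute name separately from the element path.  That rule shape, the sample markup and the default of not translating attributes unless a rule says so are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document, made up for this sketch.
DOC = """<body>
<img src="logo.png" alt="Company logo" title="ACME"/>
<p title="Opening hours">We open at nine.</p>
</body>"""

# Hypothetical rules: (element path, attribute name, translate).
ATTR_RULES = [
    ("//*", "title", "yes"),
    ("//*", "alt", "yes"),
    ("//img", "title", "no"),  # product name in the image title stays as-is
]

def translatable_attributes(root, rules):
    """Return sorted (tag, attribute, text) triples that should be translated."""
    decided = {}
    for path, attr, value in rules:
        for el in root.findall("." + path):  # "//x" evaluated as ".//x"
            if attr in el.attrib:
                decided[(el, attr)] = value  # a later rule replaces an earlier one
    # Attributes that no rule mentions are left untranslated (assumed default).
    return sorted((el.tag, attr, el.get(attr))
                  for (el, attr), value in decided.items() if value == "yes")

root = ET.fromstring(DOC)
attrs = translatable_attributes(root, ATTR_RULES)
print(attrs)
```

The element content is not consulted at all here, which is the independence between attribute text and element content described above.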

Note also that the ITS folks are having success in persuading localization tool developers and people working with major document formats to adopt approaches based on ITS.  It will be a big help to them if HTML's approach fits with that work.

RI
 

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/



> -----Original Message-----
> To: Doug Schepers
> Cc: public-html@w3.org
> Subject: Re: Translation control in HTML5
> 
> Doug-
> 
> On Sat, Aug 2, 2008 at 10:19 PM, Doug Schepers <schepers@w3.org> wrote:
> > ITS already addresses almost all these requirements:
> > * discrete 'translate' attributes for per-element control of translation
> > * default rules for a language, such that <code>, <kbd>, etc. are
> > automatically not translated
> 
> never-ending story of compatibility problems every time – I am just
> thinking of SVG in HTML.  I do not want to spread FUD, but that was my
> first thought…
> 
> But, way more important:  Some of the proposals suggested here in
> public-html are incredibly simple.  I am sure that if you just tell
> developers to put a <meta name="notranslate" content=".asdf, .foo">
> into their <head> or to add a translate="no" to specific tags, many of
> them would make use of these techniques and the quality of translation
> results could improve significantly in a short time, provided the
> implementors are diligent; whereas – if we are realistic – learning
> ITS plus XPath is not an option for the masses.
> 
> Yes, I know, introducing yet another technique for i18n control
> somehow means reinventing the wheel.  But I really think it is worth
> the trouble in this case.
> 
> > * the ability to link to a per-document set of rules that establishes custom
> > rules (which could easily be used site-wide, so an organization would only
> > have to write these rules once); this means that the <meta> tag isn't
> > needed.
> 
> Indeed, that sounds interesting…
> 
> > The 'lang' attribute proposal overrides the existing functionality in a way
> > that bears to much risk to breaking content, and doesn't offer any advantage
> > I can see over the 'translate' attribute.
> 
> Concerning this issue, I think I go along with you :-)
> 
> > I do note that ITS [1] uses XPath selectors, for which Ian Hickson has a
> > stated dislike.  Ian, is it possible that that is the reason you are
> > reluctant to merely adopt the ITS syntax, despite its obvious suitability to
> > the purpose?  If it were to use CSS selectors, would you have any other
> > objections to that proposal?  It's important to be clear what we're really
> > arguing about.
> 
> I am not Ian – may I put my two cents in anyway?  As mentioned above,
> I do not consider XPath an option for the masses of developers either.
>  If I understand aright, what you are thinking of now is a flavor of
> ITS with CSS selectors instead of XPath?  Well, that seems way more
> practical to me than regular ITS with XPath, but I still think the
> notation is pretty verbose, compared to the concise <meta
> name="notranslate">.  And if we do not want our plans to fail,
> simplicity has to be our primary goal.
> 
> In conclusion, I would still prefer simply introducing @translate and
> <meta name="notranslate"> as this seems to be the solution that would
> be accepted by most developers.
> 
> -david

Received on Tuesday, 5 August 2008 17:56:53 UTC