
RE: For review: Tagging text with no language

From: Yves Savourel <ysavourel@translate.com>
Date: Fri, 13 Apr 2007 16:21:06 -0600
To: <www-international@w3.org>
Cc: <ltru@lists.ietf.org>
Message-ID: <00b901c77e1a$07d382a0$8f05a8c0@BREIZH>

>> With respect to computer language snippets, isn't that what the <code> 
>> tag is for -- at least in XHTML?
> Yes, typically interpreted as switch to a monospaced font.  But maybe 
> not good enough to convince spell-checkers that they should skip this part,
> or to convince screenreaders that what follows might be not in the inherited 
> xml:lang.


Maybe we are trying to do a bit too much with xml:lang?

For example, here are two <codesample> elements. The first one contains some JavaScript code with strings and comments in English:

<codesample>
alert("This is English");//This comment is English too.
</codesample>

The second one contains the source code for a program in Glagol (a programming language that uses Russian keywords) [1]:

<codesample>
 +;
     "...\\\";
  
   .("Hello World!")
   .
</codesample>

It seems clear that, no matter which language tag is used here, xml:lang simply cannot *at the same time*:
- flag whether the content is in a "natural" language or a "programming" language
- indicate, if it is a programming language, which one
- indicate which "natural" language is used for the keywords of the programming language
- specify the language of the "text" inside the code
- etc.

Maybe, for most practical purposes, xml:lang should be used to specify the main natural language of the textual content. So
here "en" in both cases.
Whether the "useful text" in the element is embedded within some kind of format, and what that format is, are separate problems.
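
To make the "main natural language" reading concrete, here is a minimal sketch in Python of how a consumer might resolve the effective xml:lang of each element, including inheritance from the parent. The <doc> and <p> elements, the French sample text, and the function name effective_langs are assumptions made up for this illustration; only <codesample> and the JavaScript line come from the examples above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: <doc> and <p> are invented for this sketch.
DOC = """<doc xml:lang="fr">
  <p>Texte en français.</p>
  <codesample xml:lang="en">alert("This is English");//This comment is English too.</codesample>
  <codesample>print("Bonjour")</codesample>
</doc>"""


def effective_langs(root):
    """Return (tag, language) pairs, applying xml:lang inheritance."""
    result = []

    def walk(elem, inherited):
        # An element without its own xml:lang takes the value of its parent.
        lang = elem.get(XML_LANG, inherited)
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


langs = effective_langs(ET.fromstring(DOC))
for tag, lang in langs:
    print(tag, lang)
```

The tag only says which natural language the text is mainly in; nothing in it tells the consumer that the content is code, or which programming language it is.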

Localization tools, spell-checkers, etc. need more than just a language indicator to deal correctly with such content anyway. Some
of that can now be addressed with ITS [2] (e.g. whether the content is translatable or not). But at some point it may be safer to
stop trying to define complex content with a single label at the container level. In addition, if the content is complex and not
XML, it should itself have the means to label its own parts.
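
As a rough illustration of the ITS point, the sketch below (Python) uses the local its:translate attribute to keep a <codesample> out of the text handed to a translation tool, while xml:lang stays free to describe the natural language. The document structure and the function name translatable_text are assumptions for this example, and only local markup is handled; ITS global rules are ignored.

```python
import xml.etree.ElementTree as ET

# Local ITS attribute, in the ITS namespace.
ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

# Hypothetical document: <doc> and <p> are invented for this sketch.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en">
  <p>Call the function like this:</p>
  <codesample its:translate="no">alert("This is English");</codesample>
</doc>"""


def translatable_text(root):
    """Collect text whose effective its:translate value is "yes"."""
    out = []

    def walk(elem, inherited):
        # The value is inherited by child elements unless overridden.
        translate = elem.get(ITS_TRANSLATE, inherited)
        if translate == "yes" and elem.text and elem.text.strip():
            out.append(elem.text.strip())
        for child in elem:
            walk(child, translate)
            # Text following a child belongs to the current element.
            if translate == "yes" and child.tail and child.tail.strip():
                out.append(child.tail.strip())

    # Element content is translatable by default.
    walk(root, "yes")
    return out


texts = translatable_text(ET.fromstring(DOC))
print(texts)
```

Even so, this is all-or-nothing at the container level: the string inside the alert() call might well need translating, which is why the content format would need its own way to label its parts.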

Cheers,
-yves

[1] <http://en.wikipedia.org/wiki/Glagol_(programming_language)>
[2] <http://www.w3.org/TR/its/>
Received on Friday, 13 April 2007 22:21:01 GMT
