RE: ISSUE-88 - Change proposal (new update)

From: CE Whitehead <cewcathar@hotmail.com>
Date: Fri, 14 May 2010 17:11:28 -0400
Message-ID: <SNT142-w65AD767994EF6120657556B3FD0@phx.gbl>
To: <mark@macchiato.com>, <xn--mlform-iua@xn--mlform-iua.no>, <fantasai.lists@inkedblade.net>, <public-html@w3.org>, <public-i18n-core@w3.org>, <www-international@w3.org>, <ian@hixie.ch>

Hi.
 

Date: Fri, 14 May 2010 11:13:26 -0700
From: mark@macchiato.com
> I hesitate to mix into this conversation, because I have only followed it intermittently, and the discussion seems overly
> complicated. But I have a couple of comments.



> Audience Languages. The distinction between the "audience" languages and the "document" languages seems tenuous and
> artificial. I'm guessing that the best characterization of "audience languages" is that someone who doesn't speak one of the "audience"
> languages would not find the document as a whole to be understandable.
Understandable, or maybe in some cases usable? For example, I can go to Russian lessons targeting an English speaker and not understand them, but if I work my way through them I can still use them, I guess.
> For example, I could have a document that is mostly
> English with some Hebrew phrases mentioned. While both English and Hebrew occur in the document, it would not be useful for a
> non-English speaker, while it could be useful for an English speaker who didn't know Hebrew.
In this case it's my understanding (from Richard Ishida's "Internationalization Best Practices" -- now a note; see http://www.w3.org/TR/i18n-html-tech-lang/ -- and the discussion that ensued around that draft) that the document-wide text processing language would be English, as would the target audience language; text in Hebrew would be set off with a span (or similar element) and identified as Hebrew.
If, however, a page mixes content equally (equally is the key) in two languages but is designed for speakers of one, then it is useful to be able to specify that language as the audience language while leaving the text processing language null.
 

In some cases text is apparently intended for audiences who speak two languages (all members speak both, maybe even both reasonably well). There may be blogs like this; I personally think that the Middle East -- Israel and the Arab world -- may tend to produce such blogs, effectively mixing two languages, but I am no expert on these bloggers (I thought I had data but cannot now find it).

 

In some cases people put up a single document with content translated into several languages (instead of putting up several separate documents), usually because the content is short. In this case I think that all audience languages should be listed, but that the document-wide text processing language should be left unspecified (that is, specified as the empty string, lang=""), while each element within the document that contains content in a distinct language should have its language specified.
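
To make that pattern concrete, here is a minimal sketch in Python (my own illustration, not taken from any spec or browser; the names LangResolver, resolve and SAMPLE are hypothetical). It assumes the usual inheritance rule, under which an element without a lang attribute takes the language of its nearest ancestor that has one, and it shows the root declared as lang="" with each translated block labelled separately:

```python
from html.parser import HTMLParser


class LangResolver(HTMLParser):
    """Record each text run together with the language it inherits."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # None = no language information at all
        self.runs = []       # (text, effective language) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element's own lang wins; otherwise it inherits from its parent.
        # lang="" is kept as "" and means "language explicitly unspecified".
        self.stack.append(attrs["lang"] if "lang" in attrs else self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self.stack[-1]))


def resolve(markup):
    """Return (text, language) pairs for every non-empty text run."""
    parser = LangResolver()
    parser.feed(markup)
    parser.close()
    return parser.runs


# Hypothetical short page carrying the same greeting in two languages.
SAMPLE = """
<html lang="">
  <body>
    <p lang="en">Welcome</p>
    <p lang="fr">Bienvenue</p>
    <p>2010-05-14</p>
  </body>
</html>
"""

for text, lang in resolve(SAMPLE):
    print(repr(lang), text)
```

The two translated paragraphs resolve to their own languages, while the unlabelled date falls back to the empty string from the root rather than being assigned to either language. The sketch ignores void elements such as br or meta, which a real implementation would have to handle.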

 

However, if the current HTML spec goes through unchanged, there will be no way to specify all the various languages that a document's audience might speak.

 

I agree with you that trying to specify language is sometimes a thorny issue, and when either a document or an audience language cannot be specified, the best solution is to use lang="" (that is, lang set to the empty string) -- although some browsers keep looking for a language tag to replace the empty string when they should be happy with a character set declaration.

 

Best,

 

C. E. Whitehead

cewcathar@hotmail.com 



> Language vs Languages. It is also odd to talk about "the" language of a document as if there can be only one. Even speaking of
> "the predominant language" is a misnomer: look at http://unicode.org/iso15924/standard/index.html, for example. While we
> can't make a syntactic change for compatibility reasons, there should at least be an explanation that it is just a syntactic
> pigeonhole that people have to deal with.


> Mark
Received on Friday, 14 May 2010 21:12:03 GMT