
Re: what's the language of a document ?

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 13 Nov 2008 21:39:43 +0000 (UTC)
To: Daniel Glazman <daniel.glazman@disruptive-innovations.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811132137080.1041@hixie.dreamhostps.com>

On Thu, 13 Nov 2008, Daniel Glazman wrote:
>
> Ian Hickson wrote:
> 
> > > What is the language of the document?
> > 
> > The unknown language.
> 
> Ok, so how's a page reader made for blind people supposed to read that?
> What language aural renderer should it use by default?

Similarly, what should a translation tool use as the source language?

Unfortunately, there's no good answer. No default is really going to work 
here. Heck, even the language information that authors do declare turns 
out not to be that accurate once you start examining real content.

It turns out that there are some pretty reliable heuristics one can use to 
distinguish languages amongst a set of known languages, so that's probably 
the best practical solution for speech renderers, especially when there's 
no content-level declarative answer.
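
To make that concrete, here is a minimal sketch of one such heuristic: 
count how many common function words from each candidate language show up 
in the text, and pick the clear winner. Everything in it is an assumption 
for illustration only: the name guess_language, the min_hits threshold, 
and the tiny word lists are made up, and a real speech renderer would 
presumably use much larger profiles (character n-gram statistics, say).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists for three known languages.
# A real tool would need far larger, properly sourced profiles.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "de", "est", "que", "un", "une", "des"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu"},
}


def guess_language(text, min_hits=2):
    """Return the best-matching language code, or None if undecidable."""
    # Split into lowercase runs of letters (Unicode-aware, no digits).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)

    # Score each language by how many of its stopwords occur in the text.
    scores = {
        lang: sum(counts[word] for word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }

    ranked = sorted(scores.values(), reverse=True)
    # Too little evidence, or a tie between the top two candidates:
    # report "unknown" rather than guessing.
    if ranked[0] < min_hits or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(guess_language("Le chat est sur la table et les enfants jouent"))
```

Note that it returns None when the evidence is thin or tied, which is the 
"unknown language" case again; the heuristic only helps when the text is 
long enough and is in one of the languages the tool already knows about.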

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 13 November 2008 21:40:21 GMT
