RE: what's the language of a document ?

On Sat, February 6, 2010 06:55, Ian Hickson wrote:
> I've tried to update the spec to what was discussed with I18N at TPAC, in
> particular regarding the way Content-Language is processed.
> I ended up not making lang="" required or trigger a warning when it's
> omitted, because it's quite plausible that a document will not have a
> language at all, and because in many cases in practice language-detection
> heuristics are actually more reliable than the lang="" attribute anyway.
> However, if this isn't satisfactory, I would recommend bringing it up on
> the public-html list for further discussion.

That sounds overly optimistic. In practice, language detection only supports
a small number of languages with any reliability. And I seriously doubt that
web browser developers would want to include language detection,
considering the overhead that an extensible language detection system
would require, and the amount of ongoing work needed to add new languages
to the detection support.

I think I could throw together 100 pages, each in a different language,
use language detection libraries on them, and get a 0% detection rate. ;)

Obviously I could also select a range of languages and get close to 100%
detection rate.
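
To make the point concrete, here is a minimal sketch of a closed-set detector in Python. Everything in it is an assumption for illustration: the stopword profiles, the `detect` and `detection_rate` helpers, and the sample sentences are made up, and real detection libraries use far richer models. The limitation it shows is the same, though: a detector can only ever answer with a language it has a profile for.

```python
# Toy closed-set language detector (illustrative only, not a real library).
# Each profile is a tiny, hand-picked set of common function words.
PROFILES = {
    "en": {"the", "and", "of", "is", "in", "to"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def detect(text, min_hits=2):
    """Return the best-matching profile's language code, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    # Below the threshold the detector gives up rather than guess.
    return best if scores[best] >= min_hits else None


def detection_rate(samples):
    """samples is a list of (expected_lang, text) pairs."""
    correct = sum(detect(text) == lang for lang, text in samples)
    return correct / len(samples)


# Languages the toy detector has profiles for.
supported = [
    ("en", "the cat is in the garden and the dog is not"),
    ("fr", "le chat est dans le jardin et les chiens aussi"),
    ("de", "der Hund ist nicht im Garten und die Katze schläft"),
]

# Languages it has no profile for (Finnish, Swahili, Turkish).
unsupported = [
    ("fi", "kissa on puutarhassa ja koira nukkuu"),
    ("sw", "paka yuko bustanini na mbwa amelala"),
    ("tr", "kedi bahçede ve köpek uyuyor"),
]

print(detection_rate(supported))
print(detection_rate(unsupported))
```

The measured rate depends entirely on which languages end up in the test set, which is why a headline accuracy figure says little about how the detector fares on the web at large.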

Also, I'd suggest there are instances where lang is very useful, in
particular with CJK data, where web browsers tend to select fonts based on
the language declaration in the absence of appropriate styling.
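
A rough sketch of that kind of font selection, in Python. This is not any browser's actual logic: the `pick_font` helper, the lookup table, the choice of fallback, and the font names are assumptions chosen only to show the idea of matching a language tag against progressively shorter prefixes.

```python
# Hypothetical mapping from language tag to a default CJK font.
# Font names are examples only; real browsers use platform-specific lists.
DEFAULT_CJK_FONTS = {
    "ja": "MS Mincho",
    "ko": "Batang",
    "zh-hans": "SimSun",
    "zh-hant": "MingLiU",
}

# Assumed fallback when the language is unknown or undeclared.
FALLBACK_FONT = "SimSun"


def pick_font(lang):
    """Choose a default font for Han text from a lang value (may be None)."""
    tag = (lang or "").lower()
    # Try progressively shorter prefixes: "zh-hant-tw" -> "zh-hant" -> "zh".
    while tag:
        if tag in DEFAULT_CJK_FONTS:
            return DEFAULT_CJK_FONTS[tag]
        tag = tag.rpartition("-")[0]
    return FALLBACK_FONT


print(pick_font("ja-JP"))       # matched via the "ja" prefix
print(pick_font("zh-Hant-TW"))  # matched via the "zh-hant" prefix
print(pick_font(None))          # no declaration, so the fallback is used
```

With no lang declared, Japanese text falls through to whatever the fallback happens to be, so shared Han characters may be drawn with glyph shapes meant for another language.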

The CSS3 people are currently discussing more advanced OpenType support
within the CSS3 Fonts module. If this eventuates, then language tagging
could be used to trigger the language-specific rendering available in an
OpenType font.

lang="" could be required or not required, but language detection is a
poor reason for deciding.


Andrew Cunningham
Research and Development Coordinator
State Library of Victoria

Received on Sunday, 7 February 2010 01:55:45 UTC