Language Identification in XML/XHTML/CSS from Bjoern Hoehrmann on 2001-10-20 (xml-editor@w3.org from October to December 2001)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 20 Oct 2001 22:24:00 +0200
To: www-style@w3.org, www-html@w3.org
Message-ID: <bml3tt4in8ft8qh8v8ks2sc7459706ce47@4ax.com>

Hi,

   HTTP/1.1 offers a Content-Language entity header to identify the
language of the relevant entity. HTML4 uses this header for language
identification (see section 8.1.2). XML uses higher-level protocol
information only for character encoding determination, not for language,
thus you can only use the xml:lang attribute in XML documents to
identify language information. While I think it was a bad idea in
general to add metadata features to core XML (even worse as an
attribute, leaving no opportunity to define a language for attribute
values), we are stuck with it. Proper language identification is vital
for the :lang() pseudo-class in CSS. The current definition [1] requires
the document language to define the means of identifying the language of
elements.

Consider an XHTML document without any xml:lang attribute: would the
root element inherit the language from higher-level protocol
information, so that e.g.

  :lang(...)

would select anything? Are the HTML4 semantics inherited by XHTML 1.0
and by XHTML family document types based on XHTML M12N? If so, would it
be possible to note this somewhere? Or is it an error in XML 1.0 that it
does not consider higher-level protocol information in this case? Should
the W3C Selectors module be fixed to consider higher-level protocol
information?
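
To make the question concrete, here is a minimal sketch in Python of the
two behaviours being contrasted. It is only an illustration, not what any
specification mandates: the function names element_language and
lang_matches, the protocol_language parameter, and the sample document are
all hypothetical. The fallback to a Content-Language value is exactly the
behaviour the questions above ask about, so it is modelled as an optional
argument, and it is assumed to carry a single language tag.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def element_language(element, parents, protocol_language=None):
    """Return the language of `element`, or None if it is unknown.

    The nearest xml:lang on the element or one of its ancestors wins.
    `protocol_language` is a hypothetical fallback standing in for an
    HTTP Content-Language value (assumed to be a single language tag).
    """
    node = element
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            # Design choice of this sketch: an empty value means "unknown".
            return lang or None
        node = parents.get(node)
    # No xml:lang anywhere up to the root: use the fallback, if any.
    return protocol_language


def lang_matches(language, wanted):
    """Mimic :lang(wanted): exact match or `wanted` followed by '-'."""
    if not language:
        return False
    language, wanted = language.lower(), wanted.lower()
    return language == wanted or language.startswith(wanted + "-")


XHTML = "{http://www.w3.org/1999/xhtml}"
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>Hallo</p><p xml:lang="en-GB">Hello</p></body></html>'
)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}
plain, english = doc.findall(".//" + XHTML + "p")

# Without a fallback the first paragraph has no known language, so
# :lang(de) cannot select it; with a fallback of "de" it would.
without_fallback = lang_matches(element_language(plain, parents), "de")
with_fallback = lang_matches(element_language(plain, parents, "de"), "de")
```

In this sketch an explicit xml:lang always takes precedence over the
protocol-level value, so the second paragraph stays "en-GB" whatever the
fallback is.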

[1] http://www.w3.org/TR/selectors#lang-pseudo
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/

Received on Saturday, 20 October 2001 16:25:05 UTC