Re: Specifying Language in XHTML and HTML

On Friday, August 4, 2006, 10:08:18 AM, Jon wrote:

JH> M.T. Carrasco Benitez wrote:
>> Best Practice 4 should also be recommended for XHTML:
>> 
>>  <html lang="en">
>> 
>> and *not* the double language labelling
>> 
>>  <html lang="en" xml:lang="en">
>> 
>> Having double language labelling is unnecessary.
>> 
>> If one goes down this path, one should do the same for all overlapping of
>> XML and HTML; e.g., 
>> 
>>  <p id="foo" xml:id="id">

JH> The analogy doesn't follow. An XML processor based on a validating XML 
JH> parser with knowledge of xml:lang is capable of recognising <p id="foo">
JH> as having an ID of "foo", matching #foo in URIRefs, etc. It is not 
JH> capable of recognising lang="en" as identifying the content as being in 
JH> English.

Well, I think that you both have a point and also that you both miss part
of the picture. Tomas is correct that double labelling is a pain and
that there are other circumstances (such as xml:id) where an older,
vocabulary-specific attribute and a newer, XML-generic attribute coexist.

Tomas is incorrect to suggest using just lang as a resolution to this.
Some HTML-specific processors will understand it, but all XML processors
will understand xml:lang. So if one is going to choose just one, and if
writing XHTML, then using just xml:lang is a reasonable and
defensible position.
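
As a rough illustration of that point, here is a minimal Python sketch
using the standard library's xml.etree.ElementTree. The function name
effective_languages and the sample markup are invented for this example.
It only shows that a generic XML tool can find xml:lang under the
reserved XML namespace without knowing anything about XHTML, whereas a
bare lang is just another attribute to it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(xml_text):
    """Return (tag, language) pairs, applying xml:lang inheritance."""
    result = []

    def walk(elem, inherited):
        # An element's own xml:lang overrides the inherited value.
        lang = elem.get(XML_LANG, inherited)
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


SAMPLE = (
    '<html xml:lang="en"><body>'
    '<p>Hello</p><p xml:lang="fr">Bonjour</p>'
    '</body></html>'
)

if __name__ == "__main__":
    for tag, lang in effective_languages(SAMPLE):
        print(tag, lang)
```

Feeding the same function a document that carries only lang="en" yields
None for every element, since this generic code has no reason to treat
that attribute as a language label.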

Jon is correct that a validating parser can automatically understand id.
A non-validating parser can too, if it fetches the external DTD subset
(which is optional) or if there are declarations in the internal DTD
subset (which is possible but uncommon).
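
The internal-subset case can be sketched with Python's expat binding,
which is a non-validating parser. The sample document and the helper
name declared_id_attributes are assumptions made for illustration; the
only library feature relied on is expat's AttlistDeclHandler callback,
which reports each attribute declaration it reads.

```python
import xml.parsers.expat

# Hypothetical document whose internal DTD subset declares p/@id as an ID.
DOC = """<!DOCTYPE html [
  <!ATTLIST p id ID #IMPLIED>
]>
<html><p id="foo">text</p></html>"""


def declared_id_attributes(document):
    """Collect (element, attribute) pairs declared with type ID."""
    declared = set()

    def on_attlist(elname, attname, attr_type, default, required):
        # expat passes the declared type as a string such as "CDATA" or "ID".
        if attr_type == "ID":
            declared.add((elname, attname))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return declared


if __name__ == "__main__":
    print(declared_id_attributes(DOC))
```

Without the ATTLIST line the set comes back empty, which is the
situation a non-validating parser is in when it skips the external
subset.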

Another difference between xml:lang and xml:id is that xml:lang was
there from the first XML Rec and is thus well established; while xml:id is
fairly new and just starting to gain traction.

So in terms of which attribute needs a belt-and-braces approach[1] and
which does not, it's arguable that using xml:lang by itself for XHTML, as
for all other XML formats, has always been okay; while using both xml:id
and id might be desirable in some cases. In those cases both should have
the same attribute value; in addition, id should be declared in a DTD,
and both id and xml:id can usefully be declared in a RelaxNG grammar such
that if an element has xml:id, then id is not of type ID.
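
For anyone who does carry both id and xml:id, a small consistency check
along the following lines might be useful. This is only a sketch with
Python's xml.etree.ElementTree; the function name mismatched_ids and the
sample markup are invented here, and it checks nothing beyond the two
values agreeing.

```python
import xml.etree.ElementTree as ET

# xml:id lives in the same reserved XML namespace as xml:lang.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def mismatched_ids(xml_text):
    """Return (tag, id, xml:id) for elements whose two values differ."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        plain = elem.get("id")
        generic = elem.get(XML_ID)
        # Only elements carrying both attributes can disagree.
        if plain is not None and generic is not None and plain != generic:
            problems.append((elem.tag, plain, generic))
    return problems


SAMPLE = '<div><p id="foo" xml:id="foo"/><p id="foo" xml:id="id"/></div>'

if __name__ == "__main__":
    print(mismatched_ids(SAMPLE))
```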



[1] Quaint en-GB-ism implying two ways to hold up one's trousers (en-US:
pants): using a belt and using braces (en-US: suspenders).

-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Friday, 4 August 2006 11:10:18 UTC