Re: review of "The root element" subsection from Robert Burns on 2007-07-10 (public-html@w3.org from July 2007)

From: Robert Burns <rob@robburns.com>
Date: Tue, 10 Jul 2007 04:51:32 -0500
To: Andrew Sidwell <takkaria@gmail.com>
Cc: HTML Working Group <public-html@w3.org>
Message-Id: <32709AC9-9C05-4C02-AEB9-D0DCAEB08F7A@robburns.com>

On Jul 9, 2007, at 6:40 AM, Andrew Sidwell wrote:

>
> Robert Burns wrote:
>>
>> Charset attribute
>>
>> Suggest adding charset attribute to root element rather than adding a
>> charset name to the <meta> element. This will be easier for authors to
>> use. It will also be easier for UAs to pre-parse. (Rob Burns)
>
> The rationale for the <meta charset=""> attribute combination is that
> UAs already implement it, because people tend to leave things unquoted,
> like:
>
> <META HTTP-EQUIV=Content-Type CONTENT=text/html; charset=ISO-8859-1>
>
> Thus, adding an element to the root element would add yet another place
> UAs have to check for charset data.

Thanks for that information. I had suspected that might be part of  
the motivation. My suggestion is not meant to overturn that practice.  
Certainly UAs should continue to use BOMs and:

<meta http-equiv="content-type" content="text/html; charset=utf-8" >

or even:

<meta charset="utf-8" >

if that's what they already do.

My suggestion arose from the concern that the meta element with the
charset attribute should be the first element in the head. I'm
curious: is that how many of the current UAs work? In other words, do
current UAs stop at the first meta when searching for encoding hints?
If that's the case, it's not something I've heard before.

In any event, my suggestion arose for several reasons. First, I think
text encoding is still a nagging problem after all of these years.
Second, until it is handled exclusively through BOMs (or some other
special character, if ever), it's going to require extra attention in
educating authors about something particularly esoteric that many do
not understand. It's perhaps one of the few places in the document
where you can change semantics to an incompatible state: i.e., setting
the charset to something that doesn't reflect the encoding of the
document. This is very different from the places where one might
incorrectly set the hinting for an hreflang or the like. It's also
different from the content type in that the MIME type is not as
integral to the actual bits of the document as the charset.
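
To illustrate the mismatch described above, here is a minimal Python
sketch (the sample string is an assumption chosen for illustration, not
taken from any real document). It decodes the same bytes once with a
label that matches the real encoding and once with a label that does not:

```python
# Sketch: the same bytes read under a correct and an incorrect charset label.
text = "caf\u00e9"                       # "café", the text the author wrote
raw = text.encode("utf-8")               # bytes actually stored in the file

decoded_ok = raw.decode("utf-8")         # declared charset matches the bytes
decoded_bad = raw.decode("iso-8859-1")   # declared charset disagrees with the bytes

print(decoded_ok)    # the original text
print(decoded_bad)   # mojibake: the two-byte UTF-8 sequence becomes two characters
```

No error is raised in the second case; the document is silently
misread, which is what sets a wrong charset apart from a wrong hreflang.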

Third, for an encoding value that needs to appear early in the
document and that fits in an attribute value, it makes a lot of sense
to include it as an attribute on the root element. Pre-parsers will be
able to find the value more easily, and documents will not face the
risk of the meta element falling further down in the head. Also, there
will be less author error from placing the meta element in the
incorrect order.
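
The pre-parsing argument can be sketched roughly as follows. This is
only an illustration under stated assumptions, not how any UA actually
works: the charset attribute on <html> is the hypothetical attribute
proposed in this message, and the 1024-byte limit, the regular
expressions, the lookup order, and the name prescan() are all made up
for the example. Only the UTF-8 BOM is checked.

```python
import re

# Hypothetical: a charset attribute on the root element (the proposal above).
ROOT_RE = re.compile(
    rb'<html\b[^>]*?\bcharset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)
# Existing practice: "charset=" anywhere inside a <meta> tag. This covers
# <meta charset=...> and the http-equiv form, quoted or unquoted.
META_RE = re.compile(
    rb'<meta\b[^>]*?\bcharset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)


def prescan(data, limit=1024):
    """Return (source, charset) found in the first `limit` bytes, or None."""
    head = data[:limit]
    if head.startswith(b"\xef\xbb\xbf"):      # UTF-8 byte order mark
        return ("bom", "utf-8")
    # Assumed order: root attribute first, then any <meta> within the limit.
    for source, pattern in (("root", ROOT_RE), ("meta", META_RE)):
        match = pattern.search(head)
        if match:
            return (source, match.group(1).decode("ascii").lower())
    return None


print(prescan(b'<html charset="utf-8"><head>'))
print(prescan(b'<html><head><meta charset="utf-8" >'))
# A meta element pushed past the scan limit is never seen:
print(prescan(b'<html><head>' + b' ' * 2000 + b'<meta charset="utf-8">'))
```

In this sketch the root attribute is always within the first few bytes
of the document, while a meta element is only found if the author has
kept it near the top of the head.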

This is therefore a suggestion for a long-term authoring conformance
criterion. Obviously it only applies to the text/html serialization.
If that's not expected to last in the long term, then I think it's
probably not worth promoting a solution like this.

Take care,
Rob

Received on Tuesday, 10 July 2007 09:51:43 UTC