Re: review of "The root element" subsection from Robert Burns on 2007-07-10 (public-html@w3.org from July 2007)

From: Robert Burns <rob@robburns.com>
Date: Tue, 10 Jul 2007 16:55:42 -0500
To: HTML Working Group <public-html@w3.org>
Message-Id: <15F76FD1-F1BF-45AD-A386-287BCE4C3A98@robburns.com>

On Jul 10, 2007, at 7:21 AM, Simon Pieters wrote:

>
>>> Perhaps, but it isn't compatible with existing UAs.
>>
>> Do we already have some tests on this?
>
> We do now... ;-)
>
>    http://simon.html5.org/test/html/parsing/encoding/001.htm

I'm not sure what that was supposed to test. It would be helpful if it  
said something like: "There should be a smiley face below if your  
browser is using <meta charset=""> to determine encoding over <html  
charset="">."

I tried a slight variation on your test:

<!doctype html>
<html charset=utf-8 >
<head>
<meta charset=iso-8859-1 >
</head>
<body>
<p>There should be a white smiley face below if your browser supports  
@charset in the root html element:
<p>☺
</body>
</html>

When saved as UTF-8 with no BOM, Safari displays it as UTF-8, even  
though my default encoding for Safari is Latin-1. So that's one  
browser that is compatible with this approach. It's only one browser  
tested, but it does mean we already have one forward-looking,  
HTML5-friendly UA.

In any event, the test needs to be done with non-UTF encodings.  
Otherwise I think browsers might be too smart in detecting UTF  
encodings.
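One way to build such a test is sketched below in Python. The choices are my own, for illustration: the two declarations name different single-byte encodings (ISO-8859-1 on <html>, windows-1251 on <meta>), and the body contains a single byte, 0xE9, which decodes to "é" in ISO-8859-1 but to the Cyrillic "й" in windows-1251. The glyph the browser shows then reveals which declaration it honoured. The helper names and file name are hypothetical.

```python
# Hypothetical test generator: <html charset> and <meta charset> name two
# different single-byte encodings, chosen here purely for illustration.
HTML_CHARSET = "iso-8859-1"
META_CHARSET = "windows-1251"


def build_test_page() -> bytes:
    """Return the raw bytes of the test page (everything ASCII except one byte)."""
    head = (
        "<!doctype html>\n"
        f"<html charset={HTML_CHARSET} >\n"
        "<head>\n"
        f"<meta charset={META_CHARSET} >\n"
        "</head>\n"
        "<body>\n"
        "<p>If your browser honours charset on the root html element, "
        "the character below is a Latin e with acute accent; if it honours "
        "the meta element, it is a Cyrillic short i:\n"
        "<p>"
    ).encode("ascii")
    # 0xE9 is "é" in ISO-8859-1 and "й" in windows-1251.
    return head + b"\xe9\n</body>\n</html>\n"


def expected_glyphs(page: bytes) -> dict:
    """Map each declaration to the character the test byte decodes to under it."""
    i = page.index(b"\xe9")  # the only non-ASCII byte in the page
    test_byte = page[i:i + 1]
    return {
        "html": test_byte.decode(HTML_CHARSET),
        "meta": test_byte.decode(META_CHARSET),
    }
```

The page should be written out as raw bytes, e.g. `open("charset-root-vs-meta.htm", "wb").write(build_test_page())`, since saving it through a text editor could silently re-encode the 0xE9 byte.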

> Even if it didn't complicate implementation, it still isn't  
> compatible with current UAs, which is the main drawback.

I'm here because I'm mostly interested in the forward-looking portions  
of HTML5. If others are not so interested in that, then I understand.  
However, there are portions of this draft that also are not  
"compatible with current UAs". So pointing out that drawback is  
simply pointing out the obvious. Some portions of HTML5 are  
compatible with existing UAs. Some portions of HTML5 are not. This  
proposal falls in the latter category. That is also its advantage as  
I see it. It still follows the criteria of other portions of the  
draft in that it does not break things. For some period of time,  
authors would need to make sure the character encodings declared on  
the <meta> and <html> elements were consistent, but there would come  
a time in the future (a time when it was interoperable to use  
<video>, <audio>, <canvas>, etc.) when an author could simply place a  
charset attribute on the root element and be done with it.

Ideally, we should tell authors to use a BOM-compatible encoding (I'm  
curious how well that's supported now) and to use a different  
encoding only if those encodings don't meet their needs (I'm not sure  
what those needs would be). However, as long as authors feel the need  
to use other encodings, we should probably try to make declaring them  
as simple as possible.
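To make the BOM suggestion concrete, here is a minimal Python sketch of how a UA might check for a byte-order mark before looking at any charset declaration. The function names and the windows-1252 fallback are assumptions for illustration, and UTF-32 is deliberately left out. In this sketch a BOM simply takes precedence over everything else, mirroring the BOM-first step of the WHATWG Encoding Standard's decode algorithm.

```python
import codecs

# Byte-order marks and the encodings they identify (UTF-32 omitted).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_document(data: bytes, fallback: str = "windows-1252") -> str:
    """Decode a document, letting a BOM override any in-document declaration."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        # Hypothetical: a real UA would consult transport headers,
        # <meta charset>, and heuristics here instead of a fixed fallback.
        encoding = fallback
    # Strip the BOM itself so it does not show up as U+FEFF in the text.
    return data[skip:].decode(encoding)
```

With a BOM present, the author never has to declare the encoding anywhere in the markup, which is what makes BOM-compatible encodings the simplest option.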

Take care,
Rob

Received on Tuesday, 10 July 2007 21:55:53 UTC