Re: Cleaning House from Philip Taylor on 2007-05-03 (public-html@w3.org from May 2007)

From: Philip Taylor <excors@gmail.com>
Date: Thu, 03 May 2007 01:43:14 +0100
To: public-html@w3.org
Message-ID: <46393022.90807@gmail.com>

Patrick H. Lauke wrote:
> I'm missing how subjecting B, I etc to the same processing rules and 
> definitions as any other unknown/removed/deprecated/obsoleted element 
> would break interoperability.

At least for parsing, they need different rules from unrecognised 
elements in order to stay compatible with existing HTML content.

A document like:

     <!DOCTYPE HTML>
     <b> One <p> Two </b> Three

is parsed to

|      <b>
|        " One "
|      <p>
|        <b>
|          " Two "
|        " Three "

in Firefox 3 and HTML5 (specifically html5lib), with the first two words 
being bold. IE6 parses it to a non-tree structure but renders it the 
same way. Opera 9 parses it differently again, but does magic things 
outside the DOM to render it the same as everyone else.

A document with unknown elements like:

     <!DOCTYPE HTML>
     <x> One <p> Two </x> Three

is parsed to

|      <x>
|        " One "
|        <p>
|          " Two  Three "

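in HTML5, exactly as a <span> would be. Here " Three " stays inside <x>, 
so a browser couldn't just parse it like that and then add 
"x { font-weight: bold }" to get the same effect as <b>: all three words 
would come out bold instead of only the first two.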
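
To make the difference concrete, here is a rough sketch of a toy tree 
builder in Python. It is not html5lib and not the HTML5 algorithm: it 
only handles bare tags and text, and the names in it (FORMATTING, 
SPECIAL, parse, dump, styled_text) are made up for illustration. It 
assumes a mis-nested formatting end tag gets a much-simplified version 
of HTML5's "adoption" handling, while an end tag for any other element 
is ignored once a block-like element such as <p> is in the way.

```python
import re

FORMATTING = {"b", "i"}     # assumption: tiny subset of formatting elements
SPECIAL = {"body", "p"}     # assumption: tiny subset of block-like elements

class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []

def add_text(node, text):
    # Merge adjacent text so " Two " + " Three" becomes one text node.
    if node.children and isinstance(node.children[-1], str):
        node.children[-1] += text
    else:
        node.children.append(text)

def close_formatting(stack, name):
    # Much-simplified stand-in for the handling of </b>, </i>.
    idx = max((i for i, n in enumerate(stack) if n.name == name), default=None)
    if idx is None:
        return                              # nothing open to close
    block_idx = next((i for i in range(idx + 1, len(stack))
                      if stack[i].name in SPECIAL), None)
    if block_idx is None:
        del stack[idx:]                     # properly nested: just close it
        return
    block, ancestor = stack[block_idx], stack[idx - 1]
    block.parent.children.remove(block)     # lift the block out of <b> ...
    ancestor.children.append(block)         # ... so it follows <b> as a sibling
    block.parent = ancestor
    clone = Node(name, block)               # a fresh <b> wraps the block's content
    clone.children, block.children = block.children, [clone]
    for child in clone.children:
        if isinstance(child, Node):
            child.parent = clone
    del stack[idx:block_idx]                # the old <b> is closed, the block stays open

def close_other(stack, name):
    # End tag for any other element: ignored if a special element is in the way.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i].name == name:
            del stack[i:]
            return
        if stack[i].name in SPECIAL:
            return

def parse(markup):
    body = Node("body")
    stack = [body]
    for slash, name, text in re.findall(r"<(/?)(\w+)>|([^<]+)", markup):
        if text:
            add_text(stack[-1], text)
        elif not slash:
            node = Node(name, stack[-1])
            stack[-1].children.append(node)
            stack.append(node)
        elif name in FORMATTING:
            close_formatting(stack, name)
        else:
            close_other(stack, name)
    return body

def dump(node):
    # Nested (name, children) tuples, with plain strings for text.
    return (node.name, [c if isinstance(c, str) else dump(c)
                        for c in node.children])

def styled_text(node, tag, inside=False):
    # Text that a rule like "tag { font-weight: bold }" would affect.
    inside = inside or node.name == tag
    out = []
    for c in node.children:
        if isinstance(c, str):
            if inside:
                out.append(c)
        else:
            out.extend(styled_text(c, tag, inside))
    return out

print(styled_text(parse("<b> One <p> Two </b> Three"), "b"))
# [' One ', ' Two ']
print(styled_text(parse("<x> One <p> Two </x> Three"), "x"))
# [' One ', ' Two  Three']
```

With the <b> rules only "One" and "Two" are affected; with the 
unknown-element rules the styled <x> swallows "Three" as well.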

I'm not actually sure why HTML5 parses unknown elements and <span> in 
that way, since it doesn't match IE6, though I assume there's a reason 
somewhere... But in any case, other deprecated/removed elements like 
<xmp> do have to be explicitly specified with special parsing rules and 
not treated like unknown elements, otherwise some sites will be parsed 
very wrongly.
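
As a rough illustration of the <xmp> case, the sketch below compares 
the text a toy tokenizer would produce when <xmp> content is treated as 
raw text with what it produces when <xmp> is treated like any other 
element. The function name text_chunks and its raw_text_elements 
parameter are hypothetical, and the tokenizer only understands bare 
tags; it is not taken from any real parser.

```python
import re

TAG = re.compile(r"<(/?)(\w+)>")

def text_chunks(markup, raw_text_elements=("xmp",)):
    """Return the text runs a toy tokenizer would emit for `markup`."""
    out, pos = [], 0
    while pos < len(markup):
        m = TAG.search(markup, pos)
        if m is None:
            out.append(markup[pos:])        # trailing text after the last tag
            break
        if m.start() > pos:
            out.append(markup[pos:m.start()])
        pos = m.end()
        name = m.group(2).lower()
        if not m.group(1) and name in raw_text_elements:
            # Raw text: everything up to the matching end tag is text,
            # even if it looks like markup.
            end = markup.lower().find("</%s>" % name, pos)
            if end == -1:
                end = len(markup)
            if end > pos:
                out.append(markup[pos:end])
            pos = end                       # the end tag is consumed next time round
    return out

sample = "See <xmp><b>bold</b></xmp> here"
print(text_chunks(sample))
# ['See ', '<b>bold</b>', ' here']
print(text_chunks(sample, raw_text_elements=()))
# ['See ', 'bold', ' here']
```

In the first case the reader sees the literal "<b>bold</b>", which is 
what a page using <xmp> intends; in the second the tags are swallowed 
as markup.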

So the specification has to 'support' all these elements in the sense of 
defining how they're processed by UAs (regardless of whether authors may 
or must not use them), rather than falling back on the behaviour for 
unrecognised elements, else it would be useless to anyone trying to 
parse the web (especially those who aren't already browser vendors with 
existing code that can handle these cases).

(Whether <b>, <i> etc should be conforming for authors is a completely 
separate issue from whether the specification should tell UAs what 
processing rules to use for them, so it's best to avoid mixing the two. 
I'm only talking about the latter, since I believe that was your 
question in the quote.)

-- 
Philip Taylor
philip@zaynar.demon.co.uk

Received on Thursday, 3 May 2007 00:44:50 UTC