[whatwg] Tag Soup: Blocks-in-inlines from Lachlan Hunt on 2006-01-25 (public-whatwg-archive@w3.org from January 2006)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Thu, 26 Jan 2006 00:02:12 +1100
Message-ID: <43D776D4.1070904@lachy.id.au>
Billy Wong wrote:
> On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
>> I'm not saying it won't break anything, but every single change we make
>> to the parsing could possibly break any number of the billions of pages
>> on the web in any number of browsers.
> 
> But using your method (swapping inline node and block node) would
> break presently valid and correct webpages.

Such pages are invalid because inline-level elements are not allowed to 
contain block-level elements.  HTML pages containing the following:

<span>
   <div>...</div>
</span>

could be considered well-formed (if you apply the concept of 
well-formedness to HTML, even though it's not formally defined for it), 
but it's certainly not valid according to any official DTD.

> If breaking things is unavoidable, I prefer breaking things which are written incorrectly.

No-one is intending to break anything that is written correctly.

> My idea is very extreme but simple and efficient:
>     Parse the page regardless of what between "</" & ">".  See what's
> written inside the close-tag merely a visual clue.
> 
> Example: <span><div>X</span>Y</div>
> + span
>   + div
>     + #text: X
>   + #text: Y

I'm kind of confused by what you're trying to do there.  You seem to be 
implicitly closing the div immediately before the span.  But in the 
markup, the Y doesn't seem to be a child of the span at all; it looks 
like it should be a child of the div.  Yet in your DOM, it's a child of 
the span, not of the div.
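
For what it's worth, here's a minimal sketch of how I read the proposal, 
assuming Python's html.parser as the tokenizer.  The names Node, 
IgnoreCloseNames, parse and serialize are hypothetical and only for 
illustration; this is not how any real browser works.  Every end tag 
simply closes whichever element is currently open, whatever name it 
carries.

```python
from html.parser import HTMLParser


class Node:
    """Tiny stand-in for a DOM node (hypothetical, illustration only)."""

    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


class IgnoreCloseNames(HTMLParser):
    """Tree builder that ignores the name written inside every end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped to keep the sketch short.
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # The tag name is deliberately unused: any end tag closes the
        # innermost open element.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(Node("#text", data))


def parse(markup):
    parser = IgnoreCloseNames()
    parser.feed(markup)
    parser.close()
    return parser.root


def serialize(node):
    """Write the tree back out as properly nested markup."""
    if node.name == "#text":
        return node.text
    inner = "".join(serialize(child) for child in node.children)
    if node.name == "#root":
        return inner
    return f"<{node.name}>{inner}</{node.name}>"


print(serialize(parse("<span><div>X</span>Y</div>")))
```

Note that the sketch treats every start tag as opening a container, so 
void elements such as br would need special handling that is left out 
here.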

The DOM looks equivalent to this markup:

   <span><div>X</div>Y</span>

which is insane.  It would make a little more sense if it were like this:

   + span
     + div
       + #text: X
   + #text: Y

In other words, it would be equivalent to this markup:

   <span><div>X</div></span>Y

That is actually quite sane and is what OpenSP does with invalid HTML, 
regardless of which elements are used (presumably according to some SGML 
rules), but it would not be compatible with the current state of the web 
at all, and so is not a real option.
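
A rough sketch of that second reading, under a rule I'm assuming for 
illustration (I'm not claiming this is what OpenSP actually implements): 
an end tag closes the named element plus anything still open inside it, 
and an end tag with no matching open element is ignored.  The names 
CloseNamedAncestor and normalize are hypothetical, and Python's 
html.parser is again only used as a convenient tokenizer.

```python
from html.parser import HTMLParser


class CloseNamedAncestor(HTMLParser):
    """Re-serializes markup so that an end tag closes its named element
    and every element still open inside it (assumed rule, see above)."""

    def __init__(self):
        super().__init__()
        self.pieces = []      # output fragments, joined at the end
        self.open_names = []  # names of open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped to keep the sketch short.
        self.open_names.append(tag)
        self.pieces.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_names:
            return  # stray end tag with nothing to close: ignore it
        # Close open elements from the inside out until the named one
        # has been closed.
        while self.open_names:
            name = self.open_names.pop()
            self.pieces.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.pieces.append(data)

    def close(self):
        super().close()
        # Close whatever is still open at end of input.
        while self.open_names:
            self.pieces.append(f"</{self.open_names.pop()}>")


def normalize(markup):
    parser = CloseNamedAncestor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.pieces)


print(normalize("<span><div>X</span>Y</div>"))
```

With the example input, the stray </div> at the end has nothing left to 
close, so the Y ends up outside the span.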

> To correctly written webpages, this should pose no problems.  To
> incorrect webpages, they deserve it since the point they ask the UA to
> use "standard mode".

In theory, that sounds nice, but you have to remember:

   "to a rough approximation, all the content on the Web is erroneous,
    invalid, or non-conformant." -- Hixie

So, to say "they deserve it" to 100% of the web (roughly speaking) isn't 
really an option, unfortunately.  It's ok to say it to the most 
pathological of cases, those that depend on one particular browser's 
insane and undefined error recovery techniques yet already break in 
everything else, but not to the whole web.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Wednesday, 25 January 2006 05:02:12 UTC