[whatwg] Tag Soup: Blocks-in-inlines from Billy Wong on 2006-01-25 (public-whatwg-archive@w3.org from January 2006)

From: Billy Wong <billyswong@gmail.com>
Date: Wed, 25 Jan 2006 20:21:22 +0800
Message-ID: <500602e30601250421k27180e83mcb5d71d155652ce8@mail.gmail.com>
On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> I'm not saying it won't break anything, but every single change we make
> to the parsing could possibly break any number of the billions of pages
> on the web in any number of browsers.

But using your method (swapping the inline node and the block node) would
break webpages that are presently valid and correct.  If breaking things is
unavoidable, I prefer breaking things that are written incorrectly.
My idea is very extreme but simple and efficient:
    Parse the page regardless of what is between "</" & ">".  Treat what's
written inside the close tag as merely a visual clue.

Example: <span><div>X</span>Y</div>
+ span
  + div
    + #text: X
  + #text: Y
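
A minimal sketch of this idea in Python, only to make the example above
concrete.  It assumes a toy tokenizer (no attributes, comments, void
elements or implied tags), and the names parse_ignoring_close_names and
dump are made up for illustration; it is not how any UA actually parses.

```python
import re

# Toy tokenizer: an open tag, a close tag, or a run of text.
# Assumption: tags carry no attributes and there are no comments.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")

def parse_ignoring_close_names(markup):
    """Build a tree where every close tag closes the innermost open element."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for slash, name, text in TOKEN.findall(markup):
        if text:
            stack[-1]["children"].append("#text: " + text)
        elif slash:
            # The name written inside "</...>" is deliberately ignored.
            if len(stack) > 1:
                stack.pop()
        else:
            node = {"name": name.lower(), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
    return root

def dump(node, depth=0):
    """Return the tree as indented "+ name" lines, like the example above."""
    lines = []
    for child in node["children"]:
        if isinstance(child, str):
            lines.append("  " * depth + "+ " + child)
        else:
            lines.append("  " * depth + "+ " + child["name"])
            lines.extend(dump(child, depth + 1))
    return lines

print("\n".join(dump(parse_ignoring_close_names("<span><div>X</span>Y</div>"))))
```

With properly nested input such as <em><p>XY</p></em>, ignoring the
close-tag names changes nothing, because each close tag already matches
the innermost open element.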

For correctly written webpages, this should pose no problems.  As for
incorrect webpages, they deserve it from the point they ask the UA to
use "standards mode".

On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> Anne van Kesteren wrote:
> > Quoting Lachlan Hunt <lachlan.hunt at lachy.id.au>:
> >> 1.
> >> <em><p>X</em>Y</p>
> >>
> >> BODY
> >>   + P
> >>     + EM
> >>       + #text: X
> >>     + #text: Y
> >>
> >> The theory is that any inline elements
> >
> > This gives problems for new elements I assume... We already have a
> > problem with
> > <header><h1>test</h1></header>...
>
> I don't see how this affects new elements, it should only affect known
> inline elements.
>
> >> 2.
> >> <em><p>XY</p></em>
> >>
> >> BODY
> >>   + P
> >>     + EM
> >>       + #text: X
> >>       + #text: Y
> >
> > And this likely breaks existing content. Perhaps not for EM, but
> > certainly for
> > other inline elements, like <span>.
>
> I'm not saying it won't break anything, but every single change we make
> to the parsing could possibly break any number of the billions of pages
> on the web in any number of browsers.  However, the chances are that
> such pages are already broken in several browsers (probably
> built for IE only, whose quirks we are definitely not keeping), so I
> don't see this as a huge problem.
>
> There's nothing wrong with saner parsing at the expense of a few broken
> pages which I'm sure will still remain readable (even if they don't look
> perfect) and/or be easily fixed by their authors.  Trying to remain 100%
> compatible with 100% of the web is physically impossible.
>
> However, span does show some interesting behaviour which should be made
> more consistent with other inline elements.
>
> <!DOCTYPE html><span><p>X</span>Y</p>
>
> Firefox:
> HTML
>    + HEAD
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>      + #text: Y
>
> Opera 9/Win:
> HTML
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>          + #text: Y
>
> IE6:
> HTML
>    + HEAD
>      + TITLE
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>          + #text: Y
>        + #text: Y (Highlighted in red in the DOM view)
>
> --
> Lachlan Hunt
> http://lachy.id.au/
>
>
Received on Wednesday, 25 January 2006 04:21:22 UTC