[whatwg] Tag Soup: Blocks-in-inlines

On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> Billy Wong wrote:
> > On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> >> I'm not saying it won't break anything, but every single change we make
> >> to the parsing could possibly break any number of the billions of pages
> >> on the web in any number of browsers.
> >
> > But using your method (swapping inline node and block node) would
> > break presently valid and correct webpages.
>
> Such pages are invalid because inline-level elements are not allowed to
> contain block-level elements.  HTML pages containing the following:
>
> <span>
>    <div>...</div>
> </span>
>
> could be considered well-formed (if you apply the concept of
> well-formedness to HTML, even though it's not formally defined for it),
> but it's certainly not valid according to any official DTD.
>
Sorry, I didn't notice that this is invalid; I am new here.  Why can't
an inline-level element contain block-level elements?  I am confused.

> > If breaking things is unavoidable, I prefer breaking things which are written incorrectly.
>
> No-one is intending to break anything that is written correctly.

I should rephrase that as: "break things that are not well-formed
instead of things that are well-formed."

>
> > My idea is very extreme but simple and efficient:
> >     Parse the page regardless of what is between "</" and ">".  Treat
> > what's written inside the close tag as merely a visual clue.
> >
> > Example: <span><div>X</span>Y</div>
> > + span
> >   + div
> >     + #text: X
> >   + #text: Y
>
> I'm kind of confused by what you're trying to do there.  You seem to be
> implicitly closing the div immediately before the </span>.  But then the
> Y doesn't seem to be a child of the span at all in the markup; it looks
> like it should be a child of the div, yet in your DOM it's not a child
> of the div, but of the span.
>
> The DOM looks equivalent to this markup:
>
>    <span><div>X</div>Y</span>
>
It's my fault for not explaining it more clearly.  Here I treat a close
tag as if what is written inside it doesn't matter to the parser, so
your observation is correct.  I don't read the tag name and guess what
to do when </span> is given instead of </div>; I treat any </xyz> after
<div> as </div>.  If somebody writes a webpage that is not well-formed,
the error will be displayed in such a disturbing way that no one can
ignore it.  If the error is a mistake (which I presume is the only
reason a page is not well-formed), web developers can find the source
of the problem more easily: the error will be visible *from* the point
where it starts *to* the point where it ends.  If this seems too insane
to everyone, well, as I said before, this idea is "very extreme".  I am
not suggesting that it is the best choice.
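A minimal sketch in Python of the rule described above: every end tag
closes whichever element is currently open, whatever name it carries.
The tokenizer, the Node class and the function names are hypothetical,
for illustration only, and not taken from any browser or spec; the
tokenizer only understands bare start tags, end tags and text (no
attributes, comments or void elements).

```python
import re

# Hypothetical toy tokenizer: bare start tags, end tags and text only.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


class Node:
    def __init__(self, name, text=None):
        self.name = name          # tag name, or "#text" / "#root"
        self.text = text          # text content for "#text" nodes
        self.children = []


def tokenize(markup):
    """Yield ("start"|"end"|"text", value) tuples from the markup."""
    for m in TOKEN_RE.finditer(markup):
        slash, tag, text = m.groups()
        if text is not None:
            yield ("text", text)
        elif slash:
            yield ("end", tag.lower())
        else:
            yield ("start", tag.lower())


def parse_ignore_end_tag_names(markup):
    """Proposed rule: an end tag pops the current element; its name is ignored."""
    root = Node("#root")
    stack = [root]
    for kind, value in tokenize(markup):
        if kind == "start":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "text":
            stack[-1].children.append(Node("#text", value))
        elif len(stack) > 1:      # end tag: never pop the root
            stack.pop()
    return root


def dump(node, depth=0):
    """Render the tree in the '+ name' outline style used in this thread."""
    lines = []
    for child in node.children:
        label = f"#text: {child.text}" if child.name == "#text" else child.name
        lines.append("  " * depth + "+ " + label)
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse_ignore_end_tag_names("<span><div>X</span>Y</div>"))))
```

Run on the example markup, this reproduces the tree given earlier: the
</span> closes the div, so Y becomes a child of the span, and the final
</div> closes the span.  Well-formed markup such as <p>a<b>c</b></p>
parses exactly as it would under a name-checking parser.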

> which is insane.  It would make a little more sense if it were like this:
>
>    + span
>      + div
>        + #text: X
>    + #text: Y
>
> In other words, it would be equivalent to this markup:
>
> <span><div>X</div></span>Y
>
> That is actually quite sane and is what OpenSP does with invalid HTML,
> regardless of which elements are used (presumably according to some SGML
> rules), but it would not be compatible with the current state of the web
> at all, and so is not a real option.
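For comparison, a rough Python sketch of the recovery attributed to
OpenSP above, approximated (as an assumption, for illustration only) as:
an end tag closes the nearest open element with the same name together
with everything opened inside it, and an end tag with no matching open
element is ignored.  Real SGML parsing is DTD-driven and considerably
more involved.  The helper names are hypothetical, and the toy tokenizer
is repeated so the block runs on its own.

```python
import re

# Hypothetical toy tokenizer: bare start tags, end tags and text only.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


class Node:
    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


def tokenize(markup):
    for m in TOKEN_RE.finditer(markup):
        slash, tag, text = m.groups()
        if text is not None:
            yield ("text", text)
        elif slash:
            yield ("end", tag.lower())
        else:
            yield ("start", tag.lower())


def parse_close_to_matching(markup):
    """Approximated rule: an end tag closes its nearest matching open element."""
    root = Node("#root")
    stack = [root]
    for kind, value in tokenize(markup):
        if kind == "start":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "text":
            stack[-1].children.append(Node("#text", value))
        else:
            # Search open elements from innermost outward (skipping the root).
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == value:
                    del stack[i:]  # close it and everything opened inside it
                    break
            # No matching open element: the stray end tag is ignored.
    return root


def dump(node, depth=0):
    lines = []
    for child in node.children:
        label = f"#text: {child.text}" if child.name == "#text" else child.name
        lines.append("  " * depth + "+ " + label)
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse_close_to_matching("<span><div>X</span>Y</div>"))))
```

Here </span> implicitly closes the div and then the span, so Y lands
after the span, and the trailing </div> has nothing to close and is
dropped, giving the same tree as <span><div>X</div></span>Y.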
>
> > For correctly written webpages, this should pose no problems.
> > Incorrect webpages deserve it, from the point at which they ask the
> > UA to use "standards mode".
>
> In theory, that sounds nice, but you have to remember:
>
>    "to a rough approximation, all the content on the Web is erroneous,
>     invalid, or non-conformant." -- Hixie
>
> So, to say "they deserve it" to 100% of the web (roughly speaking) isn't
> really an option, unfortunately.  It's ok to say it to the most
> pathological of cases that depend on one particular browser's insane and
> undefined error recovery techniques, yet already break in everything
> else, but not to the whole web.
>
First, my idea would not, and should not, break the whole web.  If it
were actually deployed, it would only break webpages that are not
well-formed in this particular way.
Second, this discussion is really about error handling in HTML5.  I
believe in the motto "Make the wrong look wrong".  Since the
introduction of CSS and its ability to do "div span { blahblahblah;
}", we can't go back to IE's insectual approach.  If the error-handling
mechanism makes people feel that mixing open and close tags is "okay",
and then the mechanism occasionally fails to do what they expect, they
will blame the browser and never notice their own fault.  Unless we can
find a perfect mechanism that never "breaks" their expectations, the
problem will persist.  And I suppose the mechanism we are discussing
here would apply only to HTML5 onward, which the web as a whole is not
using these days.
Of course, if someone can suggest a mechanism that does not "break"
things, I will love it.

> --
> Lachlan Hunt
> http://lachy.id.au/
>
>

Received on Wednesday, 25 January 2006 07:17:51 UTC