
[whatwg] Configure Apache to send the right MIME type for XHTML

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Wed, 07 Mar 2007 14:04:08 -0500
Message-ID: <45EF0CA8.5080508@metalab.unc.edu>
Henri Sivonen wrote:

> TagSoup exists today.

Yes, and I use it. However, the markup it generates constantly surprises
people, as hanging out for a day or two on the tagsoup-friends mailing
list will show. That's not its fault. There's just no one obvious way to
fix all the broken markup that's out there. TagSoup picks one approach.
HTML 5 picks another. Both will surprise people a lot of the time. At the
parser level that can't be helped.

However, at the document level it can be helped. When the document author
takes care to generate a well-formed document, they are rarely surprised
by the tree the parser builds. The tree is explicit in the markup.
Explicit markup is more obvious and less surprising than the implicit
fill-in that both TagSoup and HTML 5 perform.

Hmm, that brings up another question. Does the HTML 5 fixup algorithm
ever change the *tree* for a well-formed (but invalid) document? For
instance, if it finds an li element that is a child of a p, what would
it do? Ignoring the <li></li> tags, skipping the li element completely,
or filling in a ul element would each change the tree.

I suspect it does one of these three things (or something similar like 
filling in an ol element) but without opening the spec or writing a 
sample program, I can't tell you which.
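To make the comparison concrete, here is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree (the tree_shape helper is hypothetical, written only for this illustration), of what an XML parser builds from that fragment. The li stays a child of the p, exactly as written, so any of the HTML fixups listed above would yield a different tree.

```python
import xml.etree.ElementTree as ET


def tree_shape(elem, depth=0):
    """Return an indented outline of element names, one line per element."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(tree_shape(child, depth + 1))
    return lines


# Well-formed XML, but invalid HTML: an li element as a direct child of a p.
doc = "<p><li>item</li></p>"

root = ET.fromstring(doc)
print("\n".join(tree_shape(root)))
# p
#   li
```

The XML parser does no repair at all here; the outline simply mirrors the nesting in the source text.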

By contrast, with a real XML parser I can tell you what's going to
happen without cracking open the spec. HTML 5, TagSoup, and XML parse
trees are all deterministic and thus predictable, but only the XML tree
is *obvious*.
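A small sketch of that predictability, again assuming Python's standard xml.etree.ElementTree (parse_or_report is a hypothetical helper for illustration only): a well-formed document yields exactly the tree its markup spells out, and a document that is not well-formed is rejected outright with a ParseError rather than silently repaired, so there is no fix-up step to second-guess.

```python
import xml.etree.ElementTree as ET


def parse_or_report(markup):
    """Parse markup as XML; return the root tag, or an error string if not well-formed."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError as err:
        return f"not well-formed: {err}"


# Well-formed: the parser returns the tree the markup describes.
print(parse_or_report("<p><li>item</li></p>"))  # p

# Not well-formed (the li is never closed): the parser stops instead of guessing.
print(parse_or_report("<p><li>item</p>"))
```

The exact wording of the error message comes from the underlying expat parser, so only its presence, not its text, should be relied on.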

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Received on Wednesday, 7 March 2007 11:04:08 UTC
