- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Wed, 07 Mar 2007 14:04:08 -0500
Henri Sivonen wrote:

> TagSoup exists today.

Yes, and I use it. However, it constantly surprises people with the markup it generates, as hanging out for a day or two on the tagsoup-friends mailing list will show. That's not its fault. There's just no one obvious way to fix all the broken markup that's out there. TagSoup picks one approach. HTML 5 picks another. Both will surprise people a lot of the time.

At the parser level that can't be helped. At the document level, however, it can. When document authors take the care to generate a well-formed document, they are rarely surprised by the tree the parser builds. The tree is explicit in the markup. Explicit markup is more obvious and less surprising than the implicit fill-in that both TagSoup and HTML 5 do.

Hmm, that brings up another question. Does the HTML 5 fixup algorithm ever change the *tree* for a well-formed (but invalid) document? For instance, if it finds an li element that is a child of a p, what would it do? Ignoring the <li></li> tags, skipping the li element completely, or filling in a ul element would each change the tree. I suspect it does one of these three things (or something similar, like filling in an ol element), but without opening the spec or writing a sample program, I can't tell you which.

By contrast, with a real XML parser I can tell you what's going to happen without cracking open the spec. HTML 5, TagSoup, and XML parse trees are all deterministic and thus predictable, but only the XML tree is *obvious*.

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
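As an illustration of the XML half of the comparison above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample markup and the outline helper are hypothetical and are not taken from the message. The sketch only shows that an XML parser leaves an li inside a p exactly where the markup put it. It says nothing about what the HTML 5 algorithm or TagSoup would do with the same input.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, mirroring the parsed tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Well-formed XML, but invalid as HTML: an li directly inside a p.
# (Hypothetical sample markup, chosen to match the question in the message.)
markup = "<body><p>Intro <li>item</li></p></body>"

# An XML parser performs no fixup, so the nesting in the tree is the
# nesting written in the markup.
tree = outline(ET.fromstring(markup))
print("\n".join(tree))
```

The printed outline nests li under p under body, the same structure that can be read straight off the start and end tags.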
Received on Wednesday, 7 March 2007 11:04:08 UTC