- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Sat, 02 Dec 2006 07:02:04 -0500
Lachlan Hunt wrote:
> HTML and XML have significantly different parsing requirements and they
> absolutely must be treated as significantly different file formats. Any
> attempt to treat them as the same format is an extremely bad idea.

That's only true to the extent that some people seem to insist on making them needlessly different. HTML is tantalizingly close to well-formed XML. They both derive from SGML. They both use angle-bracketed tags. They both define a tree structure. Indeed, in many cases an HTML document is an XML document. This enables the use of the very powerful XML toolchain for processing HTML. In fact, prior to the widespread adoption of XML there were, near as I could tell, no reliable open means of parsing HTML documents. There were a few proprietary, incompatible, buggy engines locked up in various browsers, and that was about it.

What I don't understand is why some members of this working group are so dead set on actively preventing HTML from being XML. The non-draconian error handling I understand. But why are you disappointed that <!DOCTYPE html> is well-formed XML? Why the active hostility to well-formedness?

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Received on Saturday, 2 December 2006 04:02:04 UTC