- From: John Cowan <cowan@ccil.org>
- Date: Tue, 24 Nov 2009 15:14:54 -0500
- To: Ian Hickson <ian@hixie.ch>
- Cc: John Cowan <cowan@ccil.org>, www-archive@w3.org
Ian Hickson scripsit:

> TagSoup could be made more compatible with existing deployed content,
> then. It might be compatible enough for most purposes already, but there
> are pages on the Web that depend on the <head> element being always
> present. Also, the <ul> element should certainly not be implied.

TagSoup is not intended for deployment in browsers. Rather, it generates SAX events based on HTML input, so that fairly arbitrary HTML can be processed using XML tools such as XSLT. It therefore guarantees that the output is well-formed XML (except for encoding issues), not that it conforms to any specific schema. If you don't like what TagSoup outputs, you can always transform the output further until the result is more like what you expect. In particular, there are absolutely no guarantees that CSS paths or JavaScript DOM references that work on the HTML will continue to work on the XML; they probably won't.

In principle it would be possible to use an implementation of the HTML5 algorithm to construct a DOM and then use a simple DOM walker to read out SAX events, but this would be much more heavyweight in time and space than TagSoup is, so I imagine TagSoup will continue to be used.

> Also, on another note, TagSoup is not compliant with HTML4 if it doesn't
> output a HEAD element without an explicit <HEAD> tag, since <HEAD> is an
> optional tag in HTML4. :-)

True; see above.

--
John Cowan   http://www.ccil.org/~cowan   cowan@ccil.org
Híggledy-pìggledy / XML programmers
Try to escape those / I-eighteen-N woes;
Incontrovertibly / What we need more of is
Unicode weenies and / François Yergeaus.
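The DOM-walker alternative mentioned in the message (build a complete tree first, then read SAX events back out of it) can be sketched in Java, the language TagSoup itself is written in. This is a hypothetical illustration, not code from TagSoup or from any HTML5 parser: the class `DomToSax` and its methods are invented names, and the JDK's own XML `DocumentBuilder` stands in for an HTML5 tree builder, so the input in this sketch must already be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: replays an in-memory DOM as a stream of SAX events.
public class DomToSax {

    /** Walks a DOM node and its descendants depth-first, emitting SAX events. */
    public static void walk(Node node, ContentHandler handler) throws SAXException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                handler.startDocument();
                walkChildren(node, handler);
                handler.endDocument();
                break;
            case Node.ELEMENT_NODE: {
                // Copy DOM attributes into a SAX Attributes object (no namespace handling).
                AttributesImpl atts = new AttributesImpl();
                NamedNodeMap map = node.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    Node a = map.item(i);
                    atts.addAttribute("", a.getNodeName(), a.getNodeName(), "CDATA", a.getNodeValue());
                }
                String name = node.getNodeName();
                handler.startElement("", name, name, atts);
                walkChildren(node, handler);
                handler.endElement("", name, name);
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                char[] text = node.getNodeValue().toCharArray();
                handler.characters(text, 0, text.length);
                break;
            }
            default:
                // Comments, processing instructions and doctypes are skipped in this sketch.
                break;
        }
    }

    private static void walkChildren(Node parent, ContentHandler handler) throws SAXException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, handler);
        }
    }

    /** A trivial ContentHandler that records element and text events as a string. */
    static class EventLog extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            log.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                log.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
            }
            log.append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            log.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }
    }

    /** Builds the whole DOM first (the costly step), then replays it as SAX events. */
    public static String replay(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            EventLog events = new EventLog();
            walk(doc, events);
            return events.log.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or replay input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replay("<html><head/><body class=\"x\">hi</body></html>"));
    }
}
```

The sketch also makes the cost visible: the entire tree has to be built and held in memory before the first event reaches the handler, whereas a streaming parser can emit events as it reads. The JDK can perform the same DOM-to-SAX replay with an identity `Transformer` from a `DOMSource` to a `SAXResult`; the explicit walker is shown only to make the mechanism concrete.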
Received on Tuesday, 24 November 2009 20:15:27 UTC