- From: Simon Pieters <simonp@opera.com>
- Date: Wed, 25 Nov 2009 13:37:00 +0100
- To: "John Cowan" <cowan@ccil.org>, "Ian Hickson" <ian@hixie.ch>
- Cc: www-archive@w3.org
On Tue, 24 Nov 2009 21:14:54 +0100, John Cowan <cowan@ccil.org> wrote:

> Ian Hickson scripsit:
>
>> TagSoup could be made more compatible with existing deployed content,
>> then. It might be compatible enough for most purposes already, but there
>> are pages on the Web that depend on the <head> element being always
>> present. Also, the <ul> element should certainly not be implied.
>
> TagSoup is not intended for deployment in browsers. Rather, it generates
> SAX events based on HTML input, permitting fairly arbitrary HTML to
> be processed using XML tools such as XSLT. It guarantees, therefore,
> that the output is well-formed XML (except for encoding issues) rather
> than that it conforms to any specific schema. If you don't like what
> TagSoup outputs, you can always transform the output further until the
> result is more like what you expect.
>
> In particular, there are absolutely no guarantees that CSS paths or
> JavaScript DOM references that work on the HTML will continue to work
> on the XML; they probably won't.
>
> In principle it would be possible to use an implementation of the HTML5
> algorithm to construct a DOM and then use a simple DOM walker to read
> out SAX events, but this would be much more heavyweight in time and
> space than TagSoup is, so I imagine it will continue to be used.

The Validator.nu HTML parser can be run in SAX streaming mode, which
doesn't construct a DOM in between. Because of things like attributes on
stray <html> tags affecting attributes on the root element, a streaming
parser sometimes has to either abort, emit non-SAX events, or violate
HTML5.

>> Also, on another note, TagSoup is not compliant with HTML4 if it doesn't
>> output a HEAD element without an explicit <HEAD> tag, since <HEAD> is an
>> optional tag in HTML4. :-)
>
> True; see above.

-- 
Simon Pieters
Opera Software
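
The streaming limitation mentioned in the reply (a stray <html> tag adding attributes to a root element whose start event has already been sent) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the Validator.nu API, the input is a simplified list of start tags rather than real HTML, and the names Policy, StreamabilityError, stream_events and the "addAttributesToRoot" event are all hypothetical.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical names for the three options a streaming parser has.
    ABORT = "abort"                  # stop parsing
    NON_SAX_EVENT = "non-sax-event"  # emit an event SAX does not define
    IGNORE = "ignore"                # drop the attributes (violates HTML5)


class StreamabilityError(Exception):
    """Raised when the input cannot be streamed faithfully (hypothetical)."""


def stream_events(start_tags, policy):
    """Turn a simplified list of (name, attrs) start tags into events.

    The root startElement is emitted as soon as the first tag is seen,
    so it cannot be amended later; that is the crux of the problem.
    """
    events = []
    root_attrs = None  # attributes already sent with the root element
    for name, attrs in start_tags:
        if root_attrs is None:
            # First tag: emit the root, implied if the tag is not <html>.
            root_attrs = dict(attrs) if name == "html" else {}
            events.append(("startElement", "html", dict(root_attrs)))
            if name == "html":
                continue
        if name == "html":
            # Stray <html>: HTML5 adds any attributes the root lacks.
            missing = {k: v for k, v in attrs.items() if k not in root_attrs}
            if missing:
                if policy is Policy.ABORT:
                    raise StreamabilityError("stray <html> adds attributes")
                if policy is Policy.NON_SAX_EVENT:
                    events.append(("addAttributesToRoot", missing))
                    root_attrs.update(missing)
                # Policy.IGNORE: silently drop the attributes.
            continue
        events.append(("startElement", name, dict(attrs)))
    return events
```

A DOM-building parser avoids the choice entirely because it can still mutate the root node when the stray tag arrives; the streaming variant has already handed the root's start event to the consumer.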
Received on Wednesday, 25 November 2009 12:37:58 UTC