Re: XHTML character entity support

Simon Pieters scripsit:

> Because of things like attributes on stray <html> tags affecting  
> attributes on the root element, a streaming parser sometimes either has to  
> abort, emit non-SAX events or violate HTML5.

TagSoup never aborts (except on I/O errors) and it would be useless
if it produced SAX events that didn't conform to XML.  So, as I say,
it doesn't guarantee adherence to any particular schema.
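
To make the streaming problem in the quoted text concrete, here is a
minimal sketch in Python. It is not TagSoup's code; the token format and
the function names are made up for illustration, and dropping the late
attributes is just one possible choice. Once the root's start event has
been delivered, attributes on a later stray <html> tag can no longer be
merged into it, whereas a parser that buffers can apply the HTML5 rule
(add each attribute the root does not already have).

```python
# Hypothetical token format: (kind, name, attrs). Not a real parser API.
TOKENS = [
    ("start", "html", {}),
    ("start", "body", {}),
    ("start", "html", {"lang": "en"}),  # stray <html> tag inside <body>
    ("end", "body", {}),
    ("end", "html", {}),
]


def stream_events(tokens):
    """Emit SAX-like events immediately; a stray <html> start tag is dropped."""
    events = []
    seen_root = False
    for kind, name, attrs in tokens:
        if kind == "start" and name == "html":
            if seen_root:
                # The root's start event is already out, so these attributes
                # cannot be merged into it. This sketch simply discards them.
                continue
            seen_root = True
        events.append((kind, name, dict(attrs)))
    return events


def buffered_root_attrs(tokens):
    """Merge attributes from every <html> start tag, as a buffering parser could."""
    merged = {}
    for kind, name, attrs in tokens:
        if kind == "start" and name == "html":
            for key, value in attrs.items():
                merged.setdefault(key, value)  # an existing attribute wins
    return merged


if __name__ == "__main__":
    print(stream_events(TOKENS)[0])     # root event as the streaming consumer saw it
    print(buffered_root_attrs(TOKENS))  # root attributes after merging
```

The streaming version stays well-formed and never aborts, but its root
element ends up without the attribute that the buffered version recovers.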

There is also the fourth "option" of going into an infinite loop.
HTML Tidy used to choose this option quite frequently, apparently because
a pair of fix-up rules were applied repeatedly, changing the tree from
A to B to A to B ....  TagSoup's design makes this particular flavor of
bug impossible.  (Of course there have been, and are, other bugs.)
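
The A-to-B-to-A failure mode can be sketched as follows. This is a toy
in Python, not code from HTML Tidy or TagSoup: the "tree" is just a
string, and the rule and function names are hypothetical. Two rules
that undo each other would never reach a fixed point, so the driver
below remembers the states it has seen and gives up when one recurs.
The guard is only an illustration of the bug; it is not how TagSoup
avoids the problem.

```python
def rule_a_to_b(tree):
    """Hypothetical fix-up rule: rewrites shape A into shape B."""
    return "B" if tree == "A" else tree


def rule_b_to_a(tree):
    """Hypothetical fix-up rule: rewrites shape B back into shape A."""
    return "A" if tree == "B" else tree


def fix_up(tree, rules, max_passes=100):
    """Apply rules until nothing changes; raise if a state repeats."""
    seen = {tree}
    for _ in range(max_passes):
        changed = False
        for rule in rules:
            new_tree = rule(tree)
            if new_tree != tree:
                if new_tree in seen:
                    # Without this check the loop would bounce between
                    # the same states until max_passes ran out.
                    raise RuntimeError(f"rules oscillate at {new_tree!r}")
                seen.add(new_tree)
                tree = new_tree
                changed = True
        if not changed:
            return tree  # fixed point reached
    raise RuntimeError("no fixed point within max_passes")


if __name__ == "__main__":
    print(fix_up("A", [rule_a_to_b]))  # a single rule converges
    try:
        fix_up("A", [rule_a_to_b, rule_b_to_a])
    except RuntimeError as exc:
        print("detected:", exc)
```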

-- 
John Cowan                                cowan@ccil.org
At times of peril or dubitation,          http://www.ccil.org/~cowan
Perform swift circular ambulation,
With loud and high-pitched ululation.

Received on Wednesday, 25 November 2009 15:10:56 UTC