Re: XHTML character entity support

On Nov 25, 2009, at 17:10, John Cowan wrote:

> Simon Pieters scripsit:
> 
>> Because of things like attributes on stray <html> tags affecting  
>> attributes on the root element, a streaming parser sometimes either has to  
>> abort, emit non-SAX events or violate HTML5.
> 
> TagSoup never aborts (except on I/O errors) and it would be useless
> if it produced SAX events that didn't conform to XML.  So, as I say,
> it doesn't guarantee adherence to any particular schema.

The Validator.nu HTML Parser can be configured to apply the Infoset Coercion section of HTML5, in which case the SAX events conform to XML.

This is orthogonal to buffering vs. aborting on non-streamable errors, which is also configurable.
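
To give a rough idea of what the coercion does to names and comments, here is a minimal sketch. It is not the Validator.nu code: the function names are made up for illustration, and the name check is an ASCII-only simplification of the XML Name production (real XML names also allow many non-ASCII characters). It follows the HTML5 rules that a character not allowed in an XML name becomes "U" plus six uppercase hex digits, and that "--" inside a comment gets a space inserted.

```python
import re

# ASSUMPTION: ASCII-only approximation of XML NameStartChar / NameChar.
# The colon is left out on purpose, as for a namespace-aware consumer.
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def _escape(ch):
    """Return 'U' followed by the code point as six uppercase hex digits."""
    return "U%06X" % ord(ch)


def coerce_name(name):
    """Coerce an element or attribute local name into an XML-safe name.

    Hypothetical helper: characters that this simplified check rejects
    are replaced by their UXXXXXX escape.
    """
    out = []
    for i, ch in enumerate(name):
        pattern = _NAME_START if i == 0 else _NAME_CHAR
        out.append(ch if pattern.fullmatch(ch) else _escape(ch))
    return "".join(out)


def coerce_comment(text):
    """Make comment text legal for XML.

    Inserts a space between adjacent hyphens and appends a space
    if the text ends with a hyphen.
    """
    while "--" in text:
        text = text.replace("--", "- -")
    if text.endswith("-"):
        text += " "
    return text


if __name__ == "__main__":
    print(coerce_name("foo<bar"))   # fooU00003Cbar
    print(coerce_comment("a--b"))   # a- -b
```

The "foo<bar" case is the example the HTML5 text itself uses. A SAX consumer downstream then only ever sees names and comments that an XML serializer can write back out.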

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 25 November 2009 16:01:43 UTC