- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 6 Feb 2007 13:59:26 +0200
On Feb 6, 2007, at 13:23, Elliotte Harold wrote:

> It would probably have to be done in two parts. First make the
> document well-formed (possibly with a TagSoup fork). Then run the
> stylesheet. The problem with TagSoup is that it treats bogons
> (unknown elements as empty). It also doesn't quite follow Web Apps
> 1.0's error recovery algorithm. Possibly I could base the initial
> step on html5lib instead.

My parser[1] doesn't follow the WA10 parsing algorithm, either, *yet*.
However, as a tentative Pythonless Java solution, you could use it
together with a RELAX NG validator in the pipeline (using the
whattf.org schemas[2]) to implement Draconian failure in cases where
the error recovery would kick in as per the WA10 parsing algorithm.

Basically, the parser would report to a ContentHandler splitter. The
splitter would show each SAX event to Jing/oNVDL first. The validator
would use DraconianErrorHandler (Jing/oNVDL is fail-fast). Second, each
SAX event would be shown to a TrAX TransformerHandler.

[1] http://hsivonen.iki.fi/validator-about/htmlparser.jar
[2] http://syntax.whattf.org/

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 6 February 2007 03:59:26 UTC
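
The ContentHandler splitter described in the message could look roughly like the sketch below. It uses only JDK classes and is not the author's code. The class name SaxTee and the helper run() are made up for illustration. The JDK XML parser stands in for the HTML parser, and a tiny handler that rejects a <bogon> element stands in for the Jing/oNVDL validator with a Draconian error handler. The TransformerHandler is an identity transform; a real pipeline would presumably build it from a stylesheet instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical splitter: shows each SAX event to `first`, then to `second`. */
class SaxTee extends DefaultHandler {
    private final ContentHandler first;   // validator stand-in, sees events first
    private final ContentHandler second;  // TrAX TransformerHandler

    SaxTee(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    // If `first` throws, the exception propagates and `second` never sees the event.
    @Override public void setDocumentLocator(Locator l) {
        first.setDocumentLocator(l); second.setDocumentLocator(l);
    }
    @Override public void startDocument() throws SAXException {
        first.startDocument(); second.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        first.endDocument(); second.endDocument();
    }
    @Override public void startPrefixMapping(String p, String uri) throws SAXException {
        first.startPrefixMapping(p, uri); second.startPrefixMapping(p, uri);
    }
    @Override public void endPrefixMapping(String p) throws SAXException {
        first.endPrefixMapping(p); second.endPrefixMapping(p);
    }
    @Override public void startElement(String uri, String local, String qName, Attributes a)
            throws SAXException {
        first.startElement(uri, local, qName, a); second.startElement(uri, local, qName, a);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        first.endElement(uri, local, qName); second.endElement(uri, local, qName);
    }
    @Override public void characters(char[] ch, int start, int len) throws SAXException {
        first.characters(ch, start, len); second.characters(ch, start, len);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        first.ignorableWhitespace(ch, start, len); second.ignorableWhitespace(ch, start, len);
    }
    @Override public void processingInstruction(String target, String data) throws SAXException {
        first.processingInstruction(target, data); second.processingInstruction(target, data);
    }
    @Override public void skippedEntity(String name) throws SAXException {
        first.skippedEntity(name); second.skippedEntity(name);
    }

    /** Illustrative helper: parse `xml`, check it, and return the identity-transformed output. */
    static String run(String xml) throws Exception {
        // Stand-in for the fail-fast validator: throws on the first <bogon> element.
        ContentHandler strict = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a)
                    throws SAXException {
                if ("bogon".equals(local)) {
                    throw new SAXException("unknown element: " + qName);
                }
            }
        };
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler out = tf.newTransformerHandler(); // identity transform
        out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        out.setResult(new StreamResult(writer));

        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader(); // stand-in for the HTML parser
        reader.setContentHandler(new SaxTee(strict, out));
        reader.parse(new InputSource(new StringReader(xml)));
        return writer.toString();
    }
}
```

Calling SaxTee.run("<p>hi</p>") returns the serialized document, while SaxTee.run("<p><bogon/></p>") aborts with a SAXException before the transformer sees the offending element. The ordering inside each forwarding method is what gives the fail-fast behaviour: the checker is always consulted before the event is passed on.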