- From: olivier Thereaux <ot@w3.org>
- Date: Mon, 24 Oct 2005 14:35:39 +0900
- To: QA-dev Dev <public-qa-dev@w3.org>
In http://lists.w3.org/Archives/Public/www-archive/2005Sep/0001 I demoed a (rough) Perl SAX filter to produce the outline of a document. Prior discussions and reading made me think that this could be a good way to create the outline with the 0.8+ version of the markup validator, running S::P::O.

However, unless I missed some option or misunderstood how to use the SAX filter, this method seems to choke very easily on tag soup. That is rather problematic, since the input of the validator is seldom even well-formed. It apparently also chokes (way too) easily on comments, although this may well be a mistake in how I coded the filter. The content also needs to be transcoded to UTF-8 before being sent through the SAX pipe.

Could it be that we will have to give up on the idea of using SAX filters, since our input is so loose? (I suppose we could use HTML::Parser, or subclasses thereof, instead.) Or are you aware of ideas or ways to reconcile our input with the strictness of SAX processing?

-- 
olivier
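To make the contrast concrete, here is a minimal sketch in Python rather than Perl. It is only an analogue: Python's standard `xml.sax` stands in for the strict SAX pipeline and `html.parser` stands in for a lenient parser in the spirit of HTML::Parser. It is not the filter from the message linked above, and the names `OutlineParser`, `lenient_outline`, `strict_sax_accepts` and the sample `SOUP` string are all hypothetical, made up for illustration.

```python
import xml.sax
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Lenient parser that collects (level, text) pairs for h1..h6."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text chunks seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            # Collapse runs of whitespace in the heading text.
            text = " ".join("".join(self._text).split())
            self.outline.append((self._level, text))
            self._level = None


def lenient_outline(markup):
    """Return the heading outline, tolerating malformed markup."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline


def strict_sax_accepts(markup):
    """Return True only if a strict SAX parse of the markup succeeds."""
    try:
        # The markup is encoded to UTF-8 bytes before the SAX parse.
        xml.sax.parseString(markup.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


# Hypothetical tag soup: unclosed <p> and <br>, unquoted attribute,
# missing </html>.
SOUP = (
    "<html><body><h1>Title</h1><p>Unclosed paragraph<br>"
    "<h2 class=sub>Section &amp; more</h2><p>text</body>"
)

if __name__ == "__main__":
    print("strict SAX accepts soup:", strict_sax_accepts(SOUP))
    print("lenient outline:", lenient_outline(SOUP))
```

Under these assumptions the strict parse rejects the sample while the lenient parser still recovers both headings. The sketch deliberately ignores headings that are never closed, which a real outliner for tag soup would also have to handle.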
Received on Monday, 24 October 2005 05:35:45 UTC