
Re: using sax filters within the markup validator

From: olivier Thereaux <ot@w3.org>
Date: Tue, 25 Oct 2005 11:07:16 +0900
Message-Id: <1DB0FD77-6598-4C38-9620-DD8322B2C83C@w3.org>
Cc: QA-dev Dev <public-qa-dev@w3.org>
To: Nick Kew <nick@webthing.com>

Hi Nick, thanks for your notes.

On 24 Oct 2005, at 18:53, Nick Kew wrote:
> On Monday 24 October 2005 06:35, you wrote:
>> in http://lists.w3.org/Archives/Public/www-archive/2005Sep/0001
> Hmmm, I don't recollect that.

I think I only mentioned it on IRC one day; I had just done it to get
familiar with writing SAX filters.

> Hmmm.  OpenSP is a SAX parser; libxml2 provides a SAX filter used in
> many of my tools (including AccessValet).  Both work fairly well to
> generate document outlines.  Or am I missing something?

Most likely I am the one missing something. But I realize I should  
probably have given more details in my previous mail, sorry about  
that. Here goes:

The current development version of check uses SGML::Parser::OpenSP
instead of onsgmls, and as a result some features (including
"raw errors display", outline and parse tree [1]) are gone.

[1] http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fwww.w3.org% 

Unless I am mistaken, it was mentioned in prior discussions that
these could be re-enabled by writing and using a few SAX filters
(see [2]).

[2] http://esw.w3.org/topic/SoftwareProjects

I tried that, and on documents with well-formedness issues, my
quick-and-dirty SAX filter-writer choked and gave up. Hence my questions.
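
As a rough illustration of that failure mode (a sketch in Python using the
standard xml.sax module, not the validator's actual Perl code; the helper
name try_parse and the two sample documents are made up for this example),
a strict XML SAX parser stops with a fatal error as soon as it meets an
unclosed tag:

```python
import xml.sax


def try_parse(markup: str) -> str:
    """Parse markup with a strict XML SAX parser; return 'ok' or the error."""
    try:
        # A bare ContentHandler ignores every event; we only care whether
        # the parser reaches the end of the document without a fatal error.
        xml.sax.parseString(markup.encode("utf-8"), xml.sax.ContentHandler())
        return "ok"
    except xml.sax.SAXParseException as exc:
        return f"fatal: {exc.getMessage()}"


# Hypothetical sample inputs: the second uses an unclosed <br>, which is
# common in HTML but not well formed as XML.
well_formed = "<html><body><p>one<br/>two</p></body></html>"
tag_soup = "<html><body><p>one<br>two</p></body></html>"

print(try_parse(well_formed))
print(try_parse(tag_soup))
```

The first call should report success, while the second should report a
fatal parse error, which matches what my filter-writer ran into.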

> If you tried it with pure-XML SAX then of course it'll fall over on
> most of the web.  I find libxml2's HTMLparser the easiest to use for
> HTML.  Except in the context of _validating_ SGML/HTML, where of
> course OpenSP is the only show in town.

I guess this is where the answer to my questions lies. Do you mean
that the SAX filters we would need to create a view of the outline or
parse tree would not take the actual document as their source, but
rather an event sequence from SPO, which would, unlike the source
document, be well formed?
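
To make the question concrete, here is a minimal sketch of what I have in
mind, again in Python rather than Perl. Everything in it is an assumption
for illustration: the OutlineHandler class, the replay helper, and
especially the event list, which is a hand-written stand-in for whatever
the parser would really emit. The point is only that the handler never
sees the source markup, just a balanced stream of start/end events:

```python
from xml.sax.handler import ContentHandler

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineHandler(ContentHandler):
    """Collect (level, text) pairs for heading elements from SAX events."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def startElement(self, name, attrs):
        name = name.lower()
        if name in HEADINGS:
            self._level = int(name[1])
            self._text = []

    def characters(self, content):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._text.append(content)

    def endElement(self, name):
        if name.lower() in HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def replay(events, handler):
    """Feed a list of (kind, value) tuples to a SAX content handler."""
    handler.startDocument()
    for kind, value in events:
        if kind == "start":
            handler.startElement(value, {})
        elif kind == "text":
            handler.characters(value)
        else:
            handler.endElement(value)
    handler.endDocument()
    return handler


# Hypothetical event sequence: the end of P is present here even if the
# source document omitted the </p> tag, because the parser would infer it.
events = [
    ("start", "HTML"), ("start", "BODY"),
    ("start", "H1"), ("text", "Title"), ("end", "H1"),
    ("start", "P"), ("text", "one"), ("end", "P"),
    ("start", "H2"), ("text", "Section"), ("end", "H2"),
    ("end", "BODY"), ("end", "HTML"),
]

outline = replay(events, OutlineHandler()).outline
print(outline)
```

If that is the right model, the outline filter itself never has to cope
with broken markup; only the parser in front of it does.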

Thank you,
Received on Tuesday, 25 October 2005 02:07:24 UTC
