
quick notes on XHTML appendix C checking directly as SAX events in markup validator

From: olivier Thereaux <ot@w3.org>
Date: Fri, 28 Dec 2007 17:46:44 +0900
Message-Id: <B554A6D6-D6AC-434D-BD5D-9B583C128A58@w3.org>
To: Tools dev list <public-qa-dev@w3.org>

This is mostly for myself, as a way to remember what I tried today,  
but just in case anyone's interested…

I was looking at how to include the appendix C checks (currently a  
standalone app [1], mostly based on SAX events over XML::Parser)  
within the markup validator [2], which currently uses two parsers:  
opensp (with Bjoern's module as interface to its API [3]) and the perl  
libxml2 wrapper (XML::LibXML).

[1] http://dev.w3.org/cvsweb/perl/modules/W3C/XHTML/HTMLCompatChecker/bin/appCcheck.pl
[2] http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/check
[3] http://openjade.sourceforge.net/doc/generic.htm

Today's rummaging...

* opensp API events don't give enough info about the matched string to  
check for constructs such as <foo/> versus <foo />
   ... so they are useless for appendix C checking
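
A rough sketch of the kind of raw-source check this calls for, assuming the document text is available as a string alongside the parser events. The helper name find_unspaced_empty_tags is hypothetical (it is not part of the validator or of appCcheck.pl), and the pattern is deliberately naive: it ignores comments, CDATA sections and attribute values containing ">".

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: return the 1-based line
# numbers of empty-element tags written as <foo/>, i.e. without a
# whitespace character before the closing "/>".
sub find_unspaced_empty_tags {
    my ($source) = @_;
    my @lines;

    # A tag name, optional attributes, then "/>" not preceded by whitespace.
    while ($source =~ m{<[A-Za-z][^<>]*?(?<!\s)/>}g) {
        # Count the newlines before the start of the match.
        my $before = substr($source, 0, $-[0]);
        my $line   = 1 + ($before =~ tr/\n//);
        push @lines, $line;
    }
    return @lines;
}

my $xhtml = "<p>one<br/>\ntwo<br />\nthree<img src=\"a.png\" alt=\"\"/></p>\n";
print join(",", find_unspaced_empty_tags($xhtml)), "\n";
```

Working on the raw string sidesteps the missing match information in the parser events, at the cost of re-scanning text the parser has already read.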

* we already check XML well-formedness with XML::LibXML, so could we  
use its SAX parser instead?
With XML::LibXML we need these options:
   $xmlparser->line_numbers(1);
   $xmlparser->validation(0);
   $xmlparser->load_ext_dtd(0);
but calling them on the SAX parser fails with:
Can't locate object method "line_numbers" via package  
"XML::LibXML::SAX::Parser"
(etc)
... weird, I'd expect XML::LibXML::SAX::Parser to know the same  
methods as XML::LibXML.
The docs at http://search.cpan.org/dist/XML-LibXML/ don't cover the  
SAX parser. Pity.

* with those three calls commented out, running
my $saxhandler = W3C::Validator::SAXHandler->new($File);
my $xmlparser = XML::LibXML::SAX::Parser->new(Handler => $saxhandler);
...
$xmlparser->parse_string($xml_string);
doesn't seem to pass any event to the handler.

Hmm, I'm probably doing it wrong, or maybe the SAX side of XML::LibXML  
doesn't grok parse_string, only parse_uri?
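
For comparison, a minimal self-contained setup to test whether SAX events reach a handler at all. It is only a sketch under assumptions: it uses XML::LibXML::SAX (the direct SAX driver shipped in the XML-LibXML distribution) rather than XML::LibXML::SAX::Parser, it needs XML::SAX installed, and EventLogger is a hypothetical stand-in for W3C::Validator::SAXHandler. It does not explain the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical handler: records the name of every start tag it is sent.
package EventLogger;
use parent 'XML::SAX::Base';

sub start_element {
    my ($self, $element) = @_;
    # Perl SAX2 passes a hash reference; Name holds the qualified name.
    push @{ $self->{seen} }, $element->{Name};
}

package main;
use XML::LibXML::SAX;

my $handler = EventLogger->new;
my $parser  = XML::LibXML::SAX->new(Handler => $handler);

# parse_string is inherited from XML::SAX::Base.
$parser->parse_string('<html><body><br/></body></html>');

print scalar(@{ $handler->{seen} || [] }), " start_element events\n";
```

If the handler's list stays empty with this setup too, the problem is more likely in the handler than in the choice of parse method.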

More on that next week.
-- 
olivier
Received on Friday, 28 December 2007 08:46:53 GMT
