Re: quick notes on XHTML appendix C checking directly as SAX events in markup validator

Continuing to document my findings as they happen. I hope this is not
too verbose.

On Dec 28, 2007, at 17:46 , olivier Thereaux wrote:
> * we parse XML-WF with XML::LibXML so could we use its sax parser  
> instead
> for XML::LibXML we need
>  $xmlparser->line_numbers(1);
>  $xmlparser->validation(0);
>  $xmlparser->load_ext_dtd(0);
> but
> Can't locate object method "line_numbers" via package  
> "XML::LibXML::SAX::Parser"
> (etc)
> ... weird, I'd expect XML::LibXML::SAX::Parser to know the same  
> methods as XML::LibXML

First of all, it looks like XML::LibXML::SAX::Parser is obsolete and
that XML::LibXML::SAX should be used instead. The latter has terse
documentation, mostly because it just implements the Perl SAX2 API.

XML::LibXML::SAX does not have the line_numbers(1) etc. methods
because, as the source shows, it does not extend XML::LibXML but
rather XML::SAX::Base, so the options should be set through SAX
features.

http://perl-xml.sourceforge.net/perl-sax/sax-2.1-adv.html#Features
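
As a rough sketch of what "setting options through SAX features" means
in practice, the snippet below asks the parser which features it
advertises before trying to set one. It only uses the
supported_features and set_feature methods inherited from
XML::SAX::Base; the helper name feature_supported is made up for this
example.

```perl
use strict;
use warnings;
use XML::LibXML::SAX;

# Hypothetical helper: true if the parser lists $uri among the SAX
# features it says it supports.
sub feature_supported {
    my ($parser, $uri) = @_;
    return scalar grep { $_ eq $uri } $parser->supported_features;
}

my $parser = XML::LibXML::SAX->new;
my $uri    = 'http://xml.org/sax/features/namespaces';

if (feature_supported($parser, $uri)) {
    # set_feature throws an exception for features the parser does not
    # recognise, hence the check above.
    $parser->set_feature($uri, 1);
    print "set $uri\n";
} else {
    print "$uri is not supported by ", ref($parser), "\n";
}
```

The same check can be pointed at any other feature URI to see whether
the parser advertises it.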

That seems to be an issue if we want the best of both worlds, that is,  
the relatively complete context on well-formedness errors, and SAX  
events.
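
For comparison, here is a minimal sketch of the non-SAX side of that
trade-off, using the three options from the quoted message on a plain
XML::LibXML parser. The helper name wf_error is made up for this
example, and how much context the error text carries depends on the
installed XML::LibXML and libxml2 versions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return undef if $xml is well-formed, otherwise
# the parser's error message as a string.
sub wf_error {
    my ($xml) = @_;
    my $parser = XML::LibXML->new;
    $parser->line_numbers(1);   # keep line information on nodes
    $parser->validation(0);     # well-formedness only, no DTD validation
    $parser->load_ext_dtd(0);   # do not fetch the external DTD
    my $ok = eval { $parser->parse_string($xml); 1 };
    return if $ok;
    return "$@";                # stringify the error (object or string)
}

my $err = wf_error("<html><body><p>unclosed</body></html>");
print defined $err ? "not well-formed:\n$err\n" : "well-formed\n";
```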

Another issue: the feature 'http://xml.org/sax/properties/xml-string'
is not supported by XML::LibXML::SAX.
In other words, the one feature which I think would be necessary to do
regexp matching on specific XML strings (e.g. to test for a space in
empty elements) is not present. This appears to kill any hope of using
XML::LibXML::SAX for HTML compatibility checking.
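
To illustrate the kind of regexp matching meant here, the sketch below
runs the "space before />" test from XHTML 1.0 Appendix C over raw
source text rather than SAX events, since the literal tag string is
exactly what is missing on the SAX side. The helper name
tags_missing_space is made up for this example, and the pattern is
deliberately naive: it does not skip comments or CDATA sections, and a
literal ">" inside an attribute value will confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: find empty-element tags written without
# whitespace before "/>" (e.g. "<br/>") and return one hash per hit
# with the 1-based line number and the tag text.
sub tags_missing_space {
    my ($source) = @_;
    my @hits;
    while ($source =~ m{(<[A-Za-z][^<>]*?(?<!\s)/>)}g) {
        # Count the newlines before the match to get its line number.
        my $prefix = substr($source, 0, $-[1]);
        my $line   = 1 + ($prefix =~ tr/\n//);
        push @hits, { line => $line, tag => $1 };
    }
    return @hits;
}

my $sample = qq{<p>a<br/>b</p>\n<hr />\n<img src="x.png" alt=""/>\n};
printf "line %d: %s\n", $_->{line}, $_->{tag}
    for tags_missing_space($sample);
```

On the sample string this reports the <br/> on line 1 and the <img .../>
on line 3, and leaves <hr /> alone.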

What other option is there? Use XML::Parser, just as the code which
Bjoern wrote did. However, XML::Parser has weaker reporting of XML
well-formedness errors (it seems to stop at the first WF error) and
would not be usable for XML Schema or RELAX NG validation (unlike
XML::LibXML). That said, tools to validate against these types of
schemas seem to be much better in Java than in Perl at this point in
time, so we may have to use a different toolset anyway...

-- 
olivier

Received on Wednesday, 2 January 2008 06:23:32 UTC