Re: MTValidate plugin and the whether and how of XML parser

On Mon, 22 Jan 2007 16:20:55 +0900
Olivier Thereaux <ot@w3.org> wrote:

> http://golem.ph.utexas.edu/~distler/blog/archives/001054.html

Took a quick look but didn't follow it through.  This looks more
suitable for people's private installations than for a busy public
service to me.

> * All XML validating parsers I know (SAX ones at least) seem to
> die() miserably on well-formedness errors.

By default, yes (that's what the XML spec tells them to do).
But this behaviour can be overridden using a parser flag with
at least some such parsers: Valet uses Xerces this way, and
I believe it works fine with libxml2 these days (though libxml2
didn't support validation when I wrote mod_validator).

> While this is good for
> most purposes, it does go against the goal of usability set by the
> markup validator. Also, no xml parser has the collection of error
> message explanations that we have for opensp, or localized messages,
> etc.

It would be nice if the explanations were sufficiently abstracted
to hook in to some other parser's error messages.

> * while fatal errors on XML well-formedness errors are maybe OK for  
> "real" XML applications, they're a bit harsh for the gray area that  
> is XHTML, especially when served as text/html.

See above.

> One solution based on pre-parsing and finding document type
> * OpenSP as sole parser for HTML <= 4.01
> * OpenSP as parser, plus XML::LibXML as wf-check for XHTML1
> * XML::LibXML or XML::LibXML::RelaxNG for SVG, MathML, etc

Erk!  That's getting ever heavier!

> Another solution based on mime types alone
> * text/html -> OpenSP
> * application/xhtml+xml -> XML::LibXML,
>    then OpenSP if the well-formedness check passed

Ugh.  Two parsers when it's declared xhtml!

> * others -> XML::LibXML or XML::LibXML::RelaxNG
> 
> Or a mix of the above two
> * text/html -> OpenSP
>    + XML::LibXML as wf-check for XHTML1 mime types
> * application/xhtml+xml -> XML::LibXML wellformed check,
>    + then OpenSP for userfriendly messages
> * others -> XML::LibXML or XML::LibXML::RelaxNG
> 
> 
> Your thoughts? Nick, I know you've been using opensp xor xerces for
> a while, any opinion on the validity of combining them?

I prefer what Valet does now: it dispatches to one parser or the
other based on sniffing rules, but enables manual override of
parser choice where that could make sense.
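
As a rough sketch of that kind of dispatch (this is not Valet's
actual code: the sniffing rules, the function names and the parser
labels below are all made up for illustration, and no parser is
actually invoked):

```python
import re

# Labels only; a real service would map these to actual parser back ends.
SGML_PARSER = "opensp"
XML_PARSER = "xerces"

# Hypothetical rule set: media types treated as XML without further sniffing.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# Hypothetical rule: an XHTML public identifier in the DOCTYPE means XML.
XHTML_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+[\"']-//W3C//DTD XHTML", re.IGNORECASE
)


def sniff_parser(media_type, head):
    """Guess a parser from the media type and the start of the document."""
    base_type = media_type.split(";", 1)[0].strip().lower()
    if base_type in XML_MEDIA_TYPES or base_type.endswith("+xml"):
        return XML_PARSER
    text = head.lstrip("\ufeff \t\r\n")  # skip a BOM and leading whitespace
    if text.startswith("<?xml"):
        return XML_PARSER
    if XHTML_DOCTYPE.search(text):
        return XML_PARSER
    return SGML_PARSER


def choose_parser(media_type, head, override=None):
    """Pick exactly one parser; a manual override beats the sniffing rules."""
    if override is not None:
        if override not in (SGML_PARSER, XML_PARSER):
            raise ValueError("unknown parser: %r" % (override,))
        return override
    return sniff_parser(media_type, head)
```

The point is that each document goes through exactly one parser,
and the user can still force the other one where the sniffing
guesses wrong, e.g. choose_parser("text/html", head, override="xerces").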

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Received on Monday, 22 January 2007 14:01:41 UTC