MTValidate plugin and the whether and how of XML parser

From: olivier Thereaux <ot@w3.org>
Date: Mon, 22 Jan 2007 16:20:55 +0900
Message-Id: <6C5C4645-38A8-4E51-BAC0-D98C2DDD5451@w3.org>
Cc: QA Dev <public-qa-dev@w3.org>
To: www-validator Community <www-validator@w3.org>

A short while ago I came across a blog post by Jacques Distler
explaining how he was adding XML well-formedness checking to his
validation script (based on the W3C markup validator): not replacing
OpenSP with an XML parser, but combining the two.


Jacques's idea is very easy to adapt to the code of the markup
validator.

I find that interesting, because:

* OpenSP is great software, but its XML support has been lacking for  
a long time, and I don't know if it's even on its radar to improve  
the situation. Others, closer to the OpenSP project, may know. Terje?

* All XML validating parsers I know (SAX ones at least) seem to die()  
miserably on well-formedness errors. While this is good for most  
purposes, it does go against the goal of usability set by the markup  
validator. Also, no XML parser has the collection of error message
explanations that we have for OpenSP, or localized messages, etc.

* While treating XML well-formedness errors as fatal is maybe OK for
"real" XML applications, it's a bit harsh for the gray area that
is XHTML, especially when served as text/html.
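
To make the second point concrete: the parser may stop at the first
well-formedness error, but the calling code does not have to die with
it. Below is a minimal sketch of that idea, written in Python rather
than the validator's Perl, with the standard library's expat-based
parser standing in for XML::LibXML. The function name
check_well_formed and the shape of its return value are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(document: str):
    """Return (True, None) if the text parses as XML, else (False, details).

    Hypothetical helper: the parser still gives up at the first error,
    but the caller receives a report instead of an uncaught exception.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        # position is (line, column) as reported by the underlying parser.
        line, column = err.position
        return False, {"line": line, "column": column, "message": str(err)}
    return True, None
```

Called on "<p>unclosed <br></p>" this returns False together with the
position of the first error, which a front end could then pair with a
friendlier explanation.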

One solution, based on pre-parsing and finding the document type:
* OpenSP as sole parser for HTML <= 4.01
* OpenSP as parser, plus XML::LibXML as wf-check for XHTML1
* XML::LibXML or XML::LibXML::RelaxNG for SVG, MathML, etc
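
As a rough illustration of the pre-parsing step, the sketch below (in
Python, not the validator's Perl) sniffs the DOCTYPE public identifier
and returns the list of parsing steps from the list above. The step
names, the function plan_from_doctype, the 2048-character window and
the substring tests are all assumptions made for the example, as is
the fallback to a plain XML parser when no known DOCTYPE is found.

```python
import re

# Captures the public identifier of a DOCTYPE declaration.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def plan_from_doctype(document: str):
    """Pick parsing steps from the DOCTYPE public identifier (hypothetical)."""
    # Only look near the top of the document (assumed window size).
    match = DOCTYPE_RE.search(document[:2048])
    public_id = match.group(2) if match else ""
    if "XHTML 1" in public_id:
        # OpenSP as parser, plus an XML well-formedness check.
        return ["opensp", "xml-wf-check"]
    if "DTD HTML" in public_id:
        # HTML up to 4.01: OpenSP alone.
        return ["opensp"]
    # SVG, MathML, etc. (and, by assumption, anything unrecognized).
    return ["xml-parser"]
```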

Another solution, based on MIME types alone:
* text/html -> OpenSP
* application/xhtml+xml -> XML::LibXML,
   then OpenSP if the well-formedness check passed
* others -> XML::LibXML or XML::LibXML::RelaxNG

Or a mix of the above two:
* text/html -> OpenSP
   + XML::LibXML as wf-check for XHTML1 mime types
* application/xhtml+xml -> XML::LibXML well-formedness check,
   + then OpenSP for user-friendly messages
* others -> XML::LibXML or XML::LibXML::RelaxNG
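
The mixed approach can be sketched as a small dispatch function. This
is an illustration in Python, not code from the validator: plan_mixed
and the step names are hypothetical. It also assumes that the extra
check under text/html applies to XHTML1 documents served with that
type, and that the caller has already worked out whether the document
is XHTML1.

```python
def plan_mixed(mime_type: str, is_xhtml1: bool):
    """Combine MIME type and document type into an ordered list of steps."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = mime_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        steps = ["opensp"]
        if is_xhtml1:
            # Assumed reading: add the well-formedness check for XHTML1.
            steps.append("xml-wf-check")
        return steps
    if mime == "application/xhtml+xml":
        # Well-formedness first, then OpenSP for its friendlier messages.
        return ["xml-wf-check", "opensp"]
    # Everything else goes to the XML parser (or a RelaxNG validator).
    return ["xml-parser"]
```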

Your thoughts? Nick, I know you've been using OpenSP xor Xerces for a
while, any opinion on the validity of combining them?

olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status
Received on Monday, 22 January 2007 07:22:20 UTC
