html/xml parser for the css validator from olivier Thereaux on 2006-12-12 (public-qa-dev@w3.org from December 2006)

From: olivier Thereaux <ot@w3.org>
Date: Tue, 12 Dec 2006 22:48:08 +0900
To: QA Dev <public-qa-dev@w3.org>
Cc: Philippe Le Hegaret <plh@w3.org>
Message-Id: <D89603E6-61D5-40A8-AE74-4B8981A98000@w3.org>

I recall Philippe telling me recently that the parser used in the CSS  
validator was probably not really bleeding edge any more. And I  
suppose it's the culprit in our utf8-and-bom issue.

I haven't looked at it in detail, but its interface seems to be that
of a fairly standard SAX parser, so we could replace it with
something like http://mercury.ccil.org/~cowan/XML/tagsoup/

The rationale for using TagSoup rather than a strict parser is that
it can still parse documents whose markup is not quite well-formed.
Right now the CSS validator just trips on those. That's kind of cool
for advanced users who want the XML well-formedness check that OpenSP
(and thus the markup validator) doesn't provide, but it's bad
usability for a tool that only needs to get what's in <style/> and
<link/> anyway.
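
To illustrate the idea, here is a minimal sketch (not the validator's
actual code) of a SAX handler that only collects the contents of
<style> and the href of <link rel="stylesheet">. The names
StyleLinkCollector, collect and strictReader are hypothetical. The
handler depends only on the standard org.xml.sax interfaces, so the
strict JDK reader used here could in principle be swapped for
TagSoup's org.ccil.cowan.tagsoup.Parser, which also implements
XMLReader. That swap is an assumption and is not exercised below.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects style sheet sources from a document.
class StyleLinkCollector extends DefaultHandler {
    final List<String> styleSheets = new ArrayList<>();   // text of <style> elements
    final List<String> linkedSheets = new ArrayList<>();  // href of <link rel="stylesheet">
    private StringBuilder currentStyle;                    // non-null while inside <style>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = elementName(localName, qName);
        if (name.equalsIgnoreCase("style")) {
            currentStyle = new StringBuilder();
        } else if (name.equalsIgnoreCase("link")) {
            String rel = atts.getValue("rel");
            String href = atts.getValue("href");
            // Keep only links that declare themselves as style sheets.
            if (rel != null && href != null && rel.toLowerCase().contains("stylesheet")) {
                linkedSheets.add(href);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so accumulate them.
        if (currentStyle != null) {
            currentStyle.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (currentStyle != null && elementName(localName, qName).equalsIgnoreCase("style")) {
            styleSheets.add(currentStyle.toString().trim());
            currentStyle = null;
        }
    }

    // Namespace-aware readers fill in localName; others only provide qName.
    private static String elementName(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    // Runs any SAX XMLReader over the markup and returns the filled-in handler.
    static StyleLinkCollector collect(XMLReader reader, String markup) throws Exception {
        StyleLinkCollector handler = new StyleLinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler;
    }

    // The strict XML reader bundled with the JDK; it rejects ill-formed markup.
    static XMLReader strictReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><head><link rel=\"stylesheet\" href=\"a.css\"/>"
                + "<style>p { color: red }</style></head><body/></html>";
        StyleLinkCollector result = collect(strictReader(), markup);
        System.out.println(result.linkedSheets);
        System.out.println(result.styleSheets);
    }
}
```

With the strict reader, an unclosed tag makes parse() throw a
SAXParseException before anything useful is returned, which is the
"trips on those" behaviour described above. A lenient XMLReader
passed to collect() would be the only thing that needs to change.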

Just food for thought. I don't plan to tackle this just now. I'll add  
an item in bugzilla in case someone feels like taking it on.

-- 
olivier

Received on Tuesday, 12 December 2006 13:48:31 UTC