Random thoughts on modularization of validator(s)

... thoughts to be taken as little more than pure thinking out loud. I 
even wrote "mudularization" at first, which probably describes the 
clarity of my ideas on the matter rather well. Do comment, however, or 
add your ideas if you want.


Now for the obvious: all our tools (including the link checker) are 
built around three basic operations: retrieval of content, parsing and 
checking, and presentation of results. All of this is wrapped in an 
interface, usually exposing access to the first operation and providing 
some output for the last.
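
To make this concrete, here is a minimal sketch of what splitting the 
three operations into separate blocks could look like. Python is used 
purely for illustration, and all the names (Retriever, Checker, 
Presenter, etc.) are mine, not taken from any existing validator code:

    from dataclasses import dataclass, field

    @dataclass
    class Document:
        uri: str
        content: bytes
        content_type: str = "application/octet-stream"

    @dataclass
    class Report:
        # each message: (severity, line, text)
        messages: list = field(default_factory=list)

    class Retriever:
        def fetch(self, uri):
            """Retrieval of content."""
            raise NotImplementedError

    class Checker:
        def check(self, doc):
            """Parsing and checking; returns a Report."""
            raise NotImplementedError

    class Presenter:
        def render(self, report):
            """Presentation of results."""
            raise NotImplementedError

    def run(uri, retriever, checkers, presenter):
        """The wrapping 'interface': fetch once, run every checker,
        render the merged report."""
        doc = retriever.fetch(uri)
        report = Report()
        for checker in checkers:
            report.messages.extend(checker.check(doc).messages)
        return presenter.render(report)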

Finer details of the architecture of our tools include:
- some pre-parsing of the content (e.g., figuring out the doctype and 
charset for the Markup Validator, a well-formedness check for the CSS 
validator)
- access to a catalogue of grammars, rules, schemas, etc.
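
As an illustration of that pre-parsing step, here is a deliberately 
naive sketch of charset and doctype sniffing. A real tool has to follow 
the full precedence rules (HTTP header, BOM, XML declaration, meta 
element), which this glosses over; it is not how the validator actually 
does it:

    import re
    from typing import Optional

    def sniff(content: bytes, http_charset: Optional[str] = None):
        """Naive pre-parse: guess charset and doctype before the real
        parsing starts."""
        charset = http_charset
        head = content[:1024]
        if charset is None:
            # fall back to an encoding pseudo-attribute in the XML
            # declaration, if any
            m = re.search(
                rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)', head)
            if m:
                charset = m.group(1).decode("ascii")
        text = head.decode(charset or "utf-8", errors="replace")
        dt = re.search(r"<!DOCTYPE\s+\w+[^>]*>", text, re.IGNORECASE)
        doctype = dt.group(0) if dt else None
        return charset or "utf-8", doctype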


Is it fair to introduce the idea of a "global" multi-format validation 
service when discussing the modularization and future architecture of 
the Markup Validator?
The objection that folding requirements for a "generic conformance 
checking tool" into the modularization of one specific tool will slow 
the latter down is valid, but probably only in the mid term.

Why not separate the two discussions, as we have done in the past?
- The addition of new languages, currently quite awkward, would make a 
multi-parser architecture necessary for the Markup Validator anyway.
- We are already redirecting from the Markup Validator to the CSS 
validator when necessary.
- The CSS validator performs well-formedness/validity checks of (X)HTML 
with embedded style.
- There is the irony of having a better XML parser in the CSS validator 
than in the Markup Validator.

I recently ruled out the possibility of using the CSS validator as a 
basis for a generic tool: however tempting it may be to start from a 
tool that already uses multiple parsers, its code is apparently not 
something we really want to build on.

I am not saying the modularization of the Markup Validator should be 
put on hold until we have figured out a grand world-domination plan: it 
is, in any case, very unlikely that the effort towards splitting 
parsing from presentation, etc., will be harmful.

In a rather simplified view of things, I see three main questions:
1- how to tie the blocks together?
2- why only one way to tie the blocks together?
3- should the process of parsing/validation be purely iterative?

1- I am relatively programming-language agnostic, and I see that we are 
likely to want to use different technologies at the parser level. Tying 
it all together with SOAP seems more attractive than sticking with a 
single API, performance concerns notwithstanding. But I might be 
completely wrong on this.
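
To show the shape of what I mean, a sketch: SOAP toolkits differ, so I 
am standing in Python's built-in XML-RPC support for the actual SOAP 
plumbing, and check_markup with its result format is entirely 
hypothetical:

    from xmlrpc.server import SimpleXMLRPCServer

    def check_markup(content, content_type):
        """Hypothetical core entry point of one validation block
        (stub result)."""
        return {"valid": True, "messages": []}

    # The block is exposed over the wire exactly as it would be called
    # locally; only the marshalling layer changes between protocols.
    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(check_markup, "check_markup")
    server.serve_forever()

A client written in any language then calls check_markup remotely (with 
Python, through xmlrpc.client.ServerProxy) as if the block were local, 
which is the point: the parser technology behind the call stops 
mattering.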

2- Would it make sense to have the blocks accessible through different 
kinds of interfaces/protocols? Would it be a waste of time to do that? 
Then there is the question of which one(s) we would favour for the main 
service(s).
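
For instance (hypothetical names again), once the core block is split 
out, each extra interface is only a thin adapter over it, so supporting 
several protocols becomes mostly a question of maintenance cost rather 
than of architecture:

    def check_markup(content, content_type):
        return {"valid": True, "messages": []}   # shared core (stub)

    def html_binding(form):
        """Human-oriented binding, as the main web service would use."""
        report = check_markup(form["fragment"],
                              form.get("type", "text/html"))
        return "<p>Valid!</p>" if report["valid"] else "<p>Not valid.</p>"

    def rpc_binding(params):
        """Machine-oriented binding; a SOAP/XML-RPC layer would sit
        above this."""
        return check_markup(params["content"], params["content_type"])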

3- I see the point of an iterative parsing and validation process, such 
as well-formedness -> validity -> attributes -> appC, yet the trend 
seems to point towards something more complicated [insert vague 
ramblings on dividing documents and processing validation in parallel, 
using trendy acronyms, e.g. NRL, DSDL - I have yet to investigate these 
fields further]. Should we aim at doing that too?
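
For the record, here is how I picture the difference, as a sketch. The 
stage names come from the chain above (appC being the XHTML 1.0 
Appendix C checks); everything else, including the namespace splitting, 
which is only a very loose nod to what NRL does, is made up:

    def well_formedness(doc): return []   # each stage returns messages
    def validity(doc):        return []
    def attributes(doc):      return []
    def app_c(doc):           return []

    STAGES = [well_formedness, validity, attributes, app_c]

    def run_iterative(doc):
        """Purely iterative: stop at the first failing stage, since
        later stages are mostly meaningless on, say, an ill-formed
        document."""
        for stage in STAGES:
            errors = stage(doc)
            if errors:
                return errors
        return []

    def split_by_namespace(doc):
        """Stub stand-in for real namespace-based document splitting."""
        return []

    def run_divided(doc, checkers_by_namespace):
        """The more complicated model: dispatch parts of the document
        to independent checkers (possibly in parallel) and merge all
        their results."""
        errors = []
        for namespace, part in split_by_namespace(doc):
            checker = checkers_by_namespace.get(namespace)
            if checker is not None:
                errors.extend(checker(part))
        return errors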

All for now. I hope we can discuss this at our meeting today.
	http://esw.w3.org/topic/QaDev


Thanks,
-- 
olivier
