Re: Looking for help. Implementing a XML validation engine in JavaScript. from Michael Kay on 2010-08-10 (www-dom@w3.org from July to September 2010)

From: Michael Kay <mike@saxonica.com>
Date: Tue, 10 Aug 2010 10:23:12 +0100
To: "Cheney, Edward A SSG RES USAR USARC" <austin.cheney@us.army.mil>
CC: Casey Jordan <casey.jordan@jorsek.com>, xmlschema-dev@w3.org, www-dom@w3.org
Message-ID: <4C611A80.7090209@saxonica.com>

On 10/08/2010 02:53, Cheney, Edward A SSG RES USAR USARC wrote:
> Casey,
>
> I have attempted to write a Lint engine for HTML to imply conformance to an XHTML type syntax with rigid structure requirements to throw errors for violations that conflict with either semantics or accessibility.  The problems I ran into are:
>
> 1)  JavaScript is too slow for any sort of validation engine where structure is stressed as well as syntax.  JavaScript is a high level interpreted language, and so it is about as slow as it gets with regards to programming.
>    

I would have said this a couple of years ago, but for client-side 
processing, I don't think this is true nowadays. The typical desktop 
machine (not mobile devices, perhaps) has enough processor cycles going 
spare to tolerate a great deal of inefficiency. Delays from the server 
and the network tend to overshadow any delays from client-side 
processing. Also, I think JavaScript engines are getting steadily better.

And it does seem that if you want to implement stuff in the browser, 
writing it in JavaScript is often the only choice (writing in XSLT 1.0 
is sometimes an alternative, but probably not here!). So writing an XSD 
validator in JavaScript certainly doesn't seem an unreasonable idea.

I'm not sure what you mean by "structure is stressed as well as syntax". 
Validating the structure of an instance document using a finite-state 
automaton (FSA) is actually extremely fast. What takes time is building 
the FSA (so it would be a good idea to try to avoid doing that every 
time). The other significant cost in validation is validating 
non-string fields against simple type definitions - for example, dates 
and times - the code for that is thoroughly uninteresting, but because 
it's looking at each character, the cost can be quite high.
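
To make the point about building the FSA once and reusing it concrete, 
here is a rough sketch, not taken from any real validator. It assumes a 
deliberately simplified content model: an ordered sequence of particles 
with distinct element names, each with min of 0 or 1 and max of 1 or 
unbounded. The names buildDfa, getDfa, validate, buildCount and 
bookModel are all made up for illustration; real XSD content models 
(choices, nested groups, wildcards) need considerably more machinery.

```javascript
// Hypothetical, simplified content model: a sequence of particles
// { name, min: 0 | 1, max: 1 | Infinity } with distinct element names.
const dfaCache = new Map(); // type name -> compiled automaton
let buildCount = 0;         // counts how often we pay the build cost

// Compile the particle list into a transition table.
// State s means "the last matched particle was particles[s - 1]".
function buildDfa(particles) {
  buildCount++;
  const n = particles.length;
  const transitions = [];
  const accepting = [];
  for (let s = 0; s <= n; s++) {
    const row = new Map();
    // An unbounded particle may repeat, staying in the same state.
    if (s > 0 && particles[s - 1].max === Infinity) {
      row.set(particles[s - 1].name, s);
    }
    // Later particles are reachable while everything skipped is optional.
    let j = s;
    for (; j < n; j++) {
      row.set(particles[j].name, j + 1);
      if (particles[j].min > 0) break;
    }
    transitions.push(row);
    accepting.push(j === n); // true if no required particle remains
  }
  return { transitions, accepting };
}

// Build the automaton at most once per type name.
function getDfa(typeName, particles) {
  let dfa = dfaCache.get(typeName);
  if (dfa === undefined) {
    dfa = buildDfa(particles);
    dfaCache.set(typeName, dfa);
  }
  return dfa;
}

// Validation itself is one table lookup per child element.
function validate(typeName, particles, childNames) {
  const { transitions, accepting } = getDfa(typeName, particles);
  let state = 0;
  for (const name of childNames) {
    const next = transitions[state].get(name);
    if (next === undefined) return false;
    state = next;
  }
  return accepting[state];
}

// Illustrative model: title, author+, note?
const bookModel = [
  { name: "title", min: 1, max: 1 },
  { name: "author", min: 1, max: Infinity },
  { name: "note", min: 0, max: 1 },
];

console.log(validate("book", bookModel, ["title", "author", "author"])); // true
console.log(validate("book", bookModel, ["author", "title"]));           // false
console.log(buildCount);                                                 // 1
```

The second call to validate reuses the cached table, so the build cost 
is paid once per type rather than once per element validated.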

> 2) Logically, the natural inclination of JavaScript towards Lambda closures in theory should be helpful enough to solve the prior problem, but that is not so in practice.  This line of thought is further enforced by the mere fact that XML namespace inheritance appears to likewise follow a Lambda model.  Unfortunately, this is not a benefit enough to overcome the processing inefficiencies described in the prior point.  Additionally, Lambda models exist because they are helpful, however they do appear to directly represent strong potential towards the covert channels better described in systems architectures.  The covert channels are not so much a concern here for security violations, but rather for logic collision with regard to reuse of logic components where closure is manifest in that reuse, but scoped above the parts being reused.
>
>    
This sounds fascinating, but I have absolutely no idea what you are 
talking about. "Namespace inheritance follows a lambda model"? You seem 
to be seeing connections that I've never seen. Could you elucidate? And 
I don't understand your references to covert channels either. You seem 
to be packing an argument that deserves 10 pages into one paragraph, and 
it's left me completely lost.

Michael Kay
Saxonica

Received on Tuesday, 10 August 2010 09:23:46 UTC