
Re: Looking for help. Implementing an XML validation engine in JavaScript.

From: Cheney, Edward A SSG RES USAR USARC <austin.cheney@us.army.mil>
Date: Tue, 10 Aug 2010 16:55:38 -0500
To: Michael Kay <mike@saxonica.com>
Cc: xmlschema-dev@w3.org,www-dom@w3.org
Message-ID: <f721fa7e189f1.4c61848a@us.army.mil>
Michael,

The time needed to process input and generate instructions depends on the length of the input. Processing is fast until it is not. What such a broad claim alludes to is that many smaller inputs may appear to produce output fast enough. Once the input becomes large enough, however, generating the output becomes too slow for users to accept, even if small test inputs returned output quickly. From that point on, each further increase in input makes the processing time more noticeable. I don't know the exact math, but as input grows, processing time grows, and it does not grow in direct proportion to the input: the curve is superlinear, bending upward rather than following a straight line.
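To illustrate how processing time can grow faster than the input, here is a hypothetical sketch (not the lint code discussed in this message) that counts the work done by two ways of finding duplicate id values, a typical validation check. The function names and the generated ids are invented for illustration. The pairwise version performs n(n-1)/2 comparisons, so ten times the input means roughly a hundred times the work; the Set version performs one lookup per id.

```javascript
// Hypothetical illustration: count operations rather than timing them,
// so the growth pattern is deterministic.

// Naive check: compare every id with every later id (quadratic).
function findDuplicatesPairwise(ids) {
  const duplicates = new Set();
  let comparisons = 0;
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      comparisons++;
      if (ids[i] === ids[j]) duplicates.add(ids[i]);
    }
  }
  return { duplicates: [...duplicates], operations: comparisons };
}

// Set-based check: one membership test per id (linear).
function findDuplicatesWithSet(ids) {
  const seen = new Set();
  const duplicates = new Set();
  let lookups = 0;
  for (const id of ids) {
    lookups++;
    if (seen.has(id)) duplicates.add(id);
    else seen.add(id);
  }
  return { duplicates: [...duplicates], operations: lookups };
}

// Generate n distinct ids: "id0", "id1", ...
function makeIds(n) {
  return Array.from({ length: n }, (_, i) => "id" + i);
}

for (const n of [100, 1000, 5000]) {
  const ids = makeIds(n);
  console.log(n, findDuplicatesWithSet(ids).operations,
              findDuplicatesPairwise(ids).operations);
}
```

For 100, 1,000 and 5,000 ids the pairwise check performs 4,950, 499,500 and 12,497,500 comparisons, while the Set-based check performs 100, 1,000 and 5,000 lookups.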

When I was attempting to write my HTML lint code last year, I was using the latest JavaScript interpreters available at the time. The delay, in my case, was most certainly due to hardware, as I was writing my code only on a netbook, which is perhaps 5 to 6 times slower than my current work computer. Even though the hardware was slow, it was still an acceptable test platform given the variety of hardware currently in use. My primary test input at the time was the Travelocity homepage, with only the minimal changes necessary for XHTML syntax compatibility. The sample was about 174 KB.

As for what I said about the namespace issue, here is an example:

var a = function () {
    var b = 5;
    // Declared inside "a", so both helpers close over "b" (lexical scope).
    // The bodies are placeholders that return a value derived from "b".
    function subfunction1() { return b + 1; }
    function subfunction2() { return b * 2; }
    return [subfunction1(), subfunction2()];
};

In the above example subfunction1 and subfunction2 are declared inside function "a", so variable "b" is a closure to both of them. (JavaScript scope is lexical: had the subfunctions been declared outside "a", merely calling them from "a" would not give them access to "b".) Presume each of those subfunctions exits using the "return" keyword with a value and that each consumes closure "b". In that case you can see the subfunctions in their relation to closure "b", so if you know what they do and how they consume "b", you know what to expect when they return and what to expect if the returned value is used in a statement or another function under "a". But what if "b" is not under "a" at all, but sits in a higher scope shared by subfunctions that are also executed by other pieces of code you do not see? At first this seems like no big deal, because testing "a" will still give you some idea of what to expect from its output. That is not entirely accurate, though: "b" can be changed dynamically before the subfunctions consume it, and if the subfunctions are likewise executed from multiple places, "b" becomes a link between those otherwise separate pieces of execution. An unintended link between separated pieces of logic is a covert channel.
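A minimal sketch of the higher-scope case, assuming "b" is an outer variable shared by two reusable helpers. The helper bodies and the callerA/callerB names are hypothetical; the point is only that a write to "b" by one caller changes what another caller gets back.

```javascript
// Hypothetical sketch: "b" lives in a scope shared by two reusable helpers,
// so a write to "b" by one caller changes what the helpers return for all.

let b = 5; // higher-scope, dynamic variable: the unintended link

function subfunction1() { return b + 1; } // placeholder body
function subfunction2() { return b * 2; } // placeholder body

// Caller A expects the helpers to work from its own notion of "b".
function callerA() {
  return [subfunction1(), subfunction2()];
}

// Caller B, written elsewhere, reuses the helpers and sets "b" first.
function callerB(value) {
  b = value;
  return subfunction1();
}

const before = callerA(); // "b" is still 5 here
callerB(100);             // caller B rewrites the shared "b"
const after = callerA();  // caller A now silently works from B's value
console.log(before, after);
```

Here `before` is [6, 10] and `after` is [101, 200]: nothing in callerA changed, yet its output did.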

The covert channel in this example can be interrupted by declaring variable "b" at a scope low enough that it is isolated to a single instance of execution. In practice this isolation usually arises unintentionally, in code with many layers of scope and many similarly named variables: a coder in one place may declare a new variable where a different variable with the same name already exists in a higher scope.
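A minimal sketch of that fix, assuming the helpers can be changed to receive "b" as a parameter; all names and bodies are hypothetical. The last part shows the accidental form, shadowing, where an inner declaration hides an outer variable with the same name.

```javascript
// Hypothetical sketch: give each execution its own "b" so no other caller
// can change it between the write and the use.

// Helpers take "b" as a parameter instead of reading a shared variable.
function subfunction1(b) { return b + 1; } // placeholder body
function subfunction2(b) { return b * 2; } // placeholder body

function a() {
  const b = 5; // local to this call of "a": invisible to other callers
  return [subfunction1(b), subfunction2(b)];
}

function otherCaller() {
  const b = 100; // a different "b"; it cannot affect the one inside "a"
  return subfunction1(b);
}

// Accidental isolation by shadowing: the inner "var b" hides the outer one,
// so the assignment inside shadowed() never reaches the outer "b".
var b = "outer";
function shadowed() {
  var b = "inner";
  b = b + "!";
  return b;
}

const first = a();
otherCaller();
const second = a();
console.log(first, second, shadowed(), b);
```

Both calls to `a()` return [6, 10] regardless of what otherCaller does, and the outer `b` is still "outer" after `shadowed()` runs.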

In my previous email I mentioned the possibility of collision. If variable "b" exists at a higher scope, is dynamic, and is consumed by interchangeable pieces of code that are reused in various places, a collision can occur when those consuming pieces return and the functions that called them process the returned data. In that case, data prepared for one instance of logic may be consumed by a separate instance of logic, because the direction of the link between them is blurred. This is not a failure in the logic of the language; it results from possible oversight by the coder when there are no controls on how multiple pieces of code work together in unexpected ways. That is how advertisements can, by accident rather than maliciously, halt JavaScript processing on a page. Inside a closed application it can cause unnecessary churn that adds to processing times that already grow disproportionately as code gets larger.
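A hypothetical sketch of the advertisement case: a plain object stands in for the browser's global `window`, and the script names and data are invented for illustration. The page script prepares data under the name "b", an unrelated script reuses that name, and the page script then consumes data that was never meant for it.

```javascript
// Hypothetical sketch: two independently written scripts share one global
// namespace (sharedGlobal stands in for "window") and both use the name "b".

const sharedGlobal = {};

// Page script, part 1: prepares data for its own later use.
function pageScriptSetup() {
  sharedGlobal.b = { items: ["home", "flights", "hotels"] };
}

// Third-party (e.g. advertisement) script, unaware of the page script,
// reuses the same name for a number.
function adScript() {
  sharedGlobal.b = 42;
}

// Page script, part 2: consumes what it believes is its own data.
function pageScriptRender() {
  return sharedGlobal.b.items.length;
}

pageScriptSetup();
adScript(); // runs in between, e.g. because the ad loaded late
let result;
try {
  result = pageScriptRender();
} catch (err) {
  // (42).items is undefined, so reading .length throws a TypeError;
  // left uncaught, an error like this stops the rest of that script.
  result = err;
}
console.log(result instanceof TypeError);
```

Without the `adScript()` call in between, `pageScriptRender()` simply returns 3; the failure comes only from the accidental overlap of names.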

If such a covert channel is not a prominent security risk in JavaScript, that is only because such conditions are difficult to identify in the wild without some sort of web spider capable of performing automated control-flow analysis of the logic, and because there are many other, easier ways to compromise targets on the web with a higher degree of assurance and at significantly lower cost. The primary harm of such covert channels is that instructions and data can move in memory without any I/O back to the user, so it is hard to determine that logic errors exist at all.
Received on Tuesday, 10 August 2010 21:56:17 GMT
