Re: performance testing of schemas from noah_mendelsohn@us.ibm.com on 2005-12-08 (xmlschema-dev@w3.org from December 2005)

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 8 Dec 2005 17:32:52 -0500
To: Bryan Rasmussen <brs@itst.dk>
Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Message-ID: <OFBB37570B.62970ADA-ON852570D1.007B4E0B-852570D1.007C5B52@lotus.com>

Bryan Rasmussen writes:

> I was wondering if anyone has done any comparative performance testing 
> of schema validation in various processors. 

No, but I can give you some theoretical answers based both on experience 
and intuition. I think the results are going to depend a lot on the 
particular processor.  Almost all the features you list can be very well 
optimized, but doing so is not always easy.  For example, substitution 
groups can turn into an ordinary choice once you find all the schema 
documents.  For namespaces and xsi:type, it's difficult to avoid some 
backtracking, but there's a lot you can do if you try hard.  The problem 
is that often the mere possibility that you might use these constructs is 
what makes things slow.  So, it's much easier to build a fast parser that 
doesn't know how to do namespaces.  A namespace-aware parser may be slower 
even if your particular instance doesn't use namespaces. 
Of course, there's no limit to the goofy ways people might code a 
particular processor, so you really have to test.
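As a rough illustration of the substitution-group point above, here is a 
minimal sketch of turning substitution groups into ordinary choices once all 
the schema documents are known.  The function name, the dictionary 
representation, and the element names are assumptions made for this example; 
it ignores abstract elements and block/final constraints, and it is not taken 
from any particular processor.

```python
from collections import defaultdict


def expand_substitution_groups(heads):
    """Map each head element to the list of elements allowed in its place.

    `heads` maps an element name to the head of its substitution group
    (a hypothetical, simplified view of the fully composed schema).
    Membership is transitive: a member of a member can also substitute.
    """
    direct = defaultdict(list)
    for member, head in heads.items():
        direct[head].append(member)

    def collect(head, seen):
        # Depth-first walk over direct members; `seen` guards against repeats.
        for member in direct.get(head, []):
            if member not in seen:
                seen.add(member)
                collect(member, seen)
        return seen

    # A reference to `head` becomes a choice over the head and all its members.
    return {
        head: [head] + sorted(collect(head, {head}) - {head})
        for head in list(direct)
    }


choices = expand_substitution_groups({
    "shirt": "product",
    "hat": "product",
    "baseballCap": "hat",
})
print(choices["product"])
```

Once the expansion is done up front, validating a reference to the head 
element costs no more than validating any other choice.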

Features like include/import/redefine are mostly handled as the schema 
documents are read in.  As Henry said, good processors will be capable of 
caching the result of such composition or compiling the resulting schema. 
In such cases, they shouldn't cost you anything on the second and 
subsequent validations.
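The caching idea can be sketched as follows.  This is only an illustration 
under stated assumptions: `compiled_schema` and `validate` are hypothetical 
stand-ins, not the API of any real processor, and the "compiled" result is a 
placeholder tuple.

```python
import functools

compile_count = 0  # counts how often the expensive step actually runs


@functools.lru_cache(maxsize=None)
def compiled_schema(schema_location):
    """Stand-in for reading and composing all schema documents (hypothetical)."""
    global compile_count
    compile_count += 1
    # A real processor would resolve include/import/redefine here and
    # build its internal validation structures.
    return ("compiled", schema_location)


def validate(instance, schema_location):
    schema = compiled_schema(schema_location)  # cache hit after the first call
    # Placeholder check; real validation would walk the instance here.
    return schema[0] == "compiled" and instance is not None


for doc in ["<a/>", "<b/>", "<c/>"]:
    validate(doc, "order.xsd")
print(compile_count)
```

The composition step runs once for the three validations; every later call 
reuses the cached result.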

> effect of size of schema

Really tough to say or to benchmark well.  Most of the algorithms are 
inherently independent of the size of the overall schema, but you can lose 
locality when things get big.  If your processor cache suddenly won't hold 
the code or data structures, performance can fall off in ways that are 
hard to predict.  Similarly, in a language like Java, there might be a 
question as to whether a given implementation is doing object creation 
dynamically or statically, or whether you're somehow getting extra garbage 
collection (e.g. because you created so many static objects for the schema 
that all the other dynamic stuff you're doing triggers GC more often).  So, 
you'd not only have to test different processors, you'd want to do it on 
lots of different hardware, vary the memory sizes, try different Java 
JITs, fiddle with GC and heap size parameters, etc.  I wouldn't expect a 
simple, stable curve relating performance to schema size or complexity 
that would apply across a large variety of cases.

As Henry says, compiling or composing the schema documents is in any case 
high overhead and should be considered separately.
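One way to keep the two costs apart in a test is sketched below.  The 
`benchmark` helper and the stand-in compile and validate callables are 
assumptions made for this example; a real test would plug in an actual 
processor and, as noted above, repeat the runs across hardware, memory sizes 
and runtime settings.

```python
import statistics
import time


def benchmark(compile_fn, validate_fn, instances, repeats=5):
    """Time schema compilation once, then time validation on its own."""
    start = time.perf_counter()
    schema = compile_fn()
    compile_seconds = time.perf_counter() - start

    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for instance in instances:
            validate_fn(schema, instance)
        runs.append(time.perf_counter() - start)

    return {
        "compile_seconds": compile_seconds,
        "validate_median_seconds": statistics.median(runs),
        "validate_runs": runs,
    }


# Stand-ins for a real processor's compile and validate calls (hypothetical).
result = benchmark(
    compile_fn=lambda: {"root": "order"},
    validate_fn=lambda schema, doc: doc.startswith("<" + schema["root"]),
    instances=["<order/>"] * 1000,
)
print(result["compile_seconds"], result["validate_median_seconds"])
```

Reporting the median of several validation runs, rather than a single run, 
reduces the influence of one-off effects such as a garbage collection pause.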

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Bryan Rasmussen <brs@itst.dk>
Sent by: xmlschema-dev-request@w3.org
12/08/05 04:41 AM

        To:     "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        performance testing of schemas

Hey
I was wondering if anyone has done any comparative performance testing of
schema validation in various processors. 

Off-hand the metrics that I suppose would be interesting are:
effect of multiple namespaces on performance,
effect of number of includes/imports/redefines
effect of using substitution groups
effect of xsi:type
effect of size of schema
effect of number of constructs - elements/complexTypes 

How much does reuse of types affect performance?

Enumeration lists. 

Test results for any of these items would be really good to know.

Received on Thursday, 8 December 2005 22:33:04 UTC