- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 8 Dec 2005 17:32:52 -0500
- To: Bryan Rasmussen <brs@itst.dk>
- Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Bryan Rasmussen writes:
> I was wondering if anyone has done any comparative performance testing
> of schema validation in various processors.
No, but I can give you some theoretical answers based both on experience
and intuition. I think the results are going to depend a lot on the
particular processor. Almost all the features you list can be very well
optimized, but doing so is not always easy. For example, substitution
groups can turn into an ordinary choice once you find all the schema
documents. For namespaces and xsi:type, it's difficult to avoid some
backtracking, but there's a lot you can do if you try hard. The problem
is that it's often the mere possibility that you might use these
constructs that makes things slow. It's much easier, for example, to
build a fast parser that doesn't know how to do namespaces at all; a
namespace-aware parser may be slower even if your particular instance
never uses namespaces.
Of course, there's no limit to the goofy ways people might code a
particular processor, so you really have to test.
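To make the namespace point concrete: with JAXP in Java, namespace
processing is a switch you flip on the factory before the instance is
ever seen, so the cost is baked in up front whether or not the document
uses namespaces. A minimal sketch (the file name is made up):

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class NamespaceCost {
        public static void main(String[] args) throws Exception {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Once this is on, the parser must track prefix-to-URI
            // bindings on every element, whether or not the instance
            // actually declares any namespaces.
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File("instance.xml"), new DefaultHandler());
        }
    }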
Features like include/import/redefine are mostly handled as the schema
documents are read in. As Henry said, good processors will be capable of
caching the result of such composition or compiling the resulting schema.
In such cases, these features shouldn't cost you anything on validations
2 through n.
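With JAXP, for instance, all of that composition work happens once when
the schema is compiled; the Validator you derive from the compiled
Schema can then be reused across instances. A sketch (file names are
invented):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class CompileOnce {
        public static void main(String[] args) throws Exception {
            SchemaFactory sf =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // include/import/redefine are resolved and the schema is
            // compiled here, once.
            Schema schema = sf.newSchema(new File("order.xsd"));
            Validator validator = schema.newValidator();
            // Validations 2 through n reuse the compiled schema and
            // pay only the per-instance cost.
            for (String f : new String[] {"a.xml", "b.xml", "c.xml"}) {
                validator.validate(new StreamSource(new File(f)));
            }
        }
    }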
> effect of size of schema
Really tough to say or to benchmark well. Most of the algorithms are
inherently independent of the size of the overall schema, but you can lose
locality when things get big. If your processor cache suddenly won't hold
the code or data structures, performance can fall off in ways that are
hard to predict. Similarly, in a language like Java, there might be a
question as to whether a given implementation does object creation
dynamically or statically, or whether you're somehow getting extra garbage
collection (e.g. because you created so many static objects for the schema
that all the other dynamic work you're doing triggers GC more often). So
you'd not only have to test different processors, you'd want to do it on
lots of different hardware, vary the memory sizes, try different Java
JITs, fiddle with GC and heap-size parameters, etc. I wouldn't expect a
simple, stable curve relating performance to schema size or complexity to
hold across a large variety of cases.
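If you do go benchmarking, at a minimum separate JIT warm-up from the
steady state, and repeat the whole run while varying the JVM's heap and
GC settings. A rough sketch of the shape of such a harness (iteration
counts and file names are arbitrary):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class ValidateBench {
        public static void main(String[] args) throws Exception {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new File("big.xsd"));
            Validator v = schema.newValidator();
            // Warm-up passes so the JIT has compiled the hot paths
            // before we start the clock.
            for (int i = 0; i < 50; i++) {
                v.validate(new StreamSource(new File("instance.xml")));
            }
            int runs = 500;
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                v.validate(new StreamSource(new File("instance.xml")));
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("%.3f ms per validation%n",
                              elapsed / (runs * 1.0e6));
            // Rerun under different -Xmx values and GC collectors to
            // see how much of the curve is really the validator itself.
        }
    }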
As Henry says, compiling or composing the schema documents is in any case
high overhead and should be considered separately.
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Bryan Rasmussen <brs@itst.dk>
Sent by: xmlschema-dev-request@w3.org
12/08/05 04:41 AM
To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: performance testing of schemas
Hey
I was wondering if anyone has done any comparative performance testing of
schema validation in various processors.
Off-hand, the metrics that I suppose would be interesting are:
effect of multiple namespaces on performance
effect of number of includes/imports/redefines
effect of using substitution groups
effect of xsi:type
effect of size of schema
effect of number of constructs - elements/complexTypes
how much reuse of types affects performance
enumeration lists
Results for any of these items would be really good to know.
Received on Thursday, 8 December 2005 22:33:04 UTC