- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 8 Dec 2005 17:32:52 -0500
- To: Bryan Rasmussen <brs@itst.dk>
- Cc: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Bryan Rasmussen writes:

> I was wondering if anyone has done any comparative performance testing
> of schema validation in various processors.

No, but I can give you some theoretical answers based both on experience and intuition.

I think the results are going to depend a lot on the particular processor. Almost all the features you list can be very well optimized, but doing so is not always easy. For example, substitution groups can turn into an ordinary choice once you find all the schema documents. For namespaces and xsi:type, it's difficult to avoid some backtracking, but there's a lot you can do if you try hard. The problem is that it's often the mere possibility that you'd use these constructs that makes things slow. So, it's much easier to build a fast parser that doesn't know how to do namespaces; a parser that's namespace-aware may be slower even if your particular instance doesn't use namespaces. Of course, there's no limit to the goofy ways people might code a particular processor, so you really have to test.

Features like include/import/redefine are mostly handled as the schema documents are read in. As Henry said, good processors will be capable of caching the result of such composition or compiling the resulting schema. In such cases, they shouldn't cost you anything on validations 2 through n.

> effect of size of schema

Really tough to say or to benchmark well. Most of the algorithms are inherently independent of the size of the overall schema, but you can lose locality when things get big. If your processor cache suddenly won't hold the code or data structures, performance can fall off in ways that are hard to predict. Similarly, in a language like Java, there might be a question as to whether a given implementation is doing object creation dynamically or statically, or whether somehow you're getting extra garbage collection (e.g. because you created so many static objects for the schema that all the other dynamic work you're doing triggers GC more often).

So, you'd not only have to test different processors, you'd want to do it on lots of different hardware, vary the memory sizes, try different Java JITs, fiddle with GC and heap-size parameters, etc. I wouldn't expect a simple, stable curve relating performance to schema size or complexity that would apply across a large variety of cases. As Henry says, compiling or composing the schema documents is in any case high overhead and should be considered separately.

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Bryan Rasmussen <brs@itst.dk>
Sent by: xmlschema-dev-request@w3.org
12/08/05 04:41 AM

To: "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: performance testing of schemas

Hey,

I was wondering if anyone has done any comparative performance testing of schema validation in various processors. Off-hand, the metrics that I suppose would be interesting are:

- effect of multiple namespaces on performance
- effect of number of includes/imports/redefines
- effect of using substitution groups
- effect of xsi:type
- effect of size of schema
- effect of number of constructs (elements/complexTypes)
- how much reuse of types affects performance
- enumeration lists

Any of these items under testing would be really good to know.
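As a concrete illustration of the compile-once pattern Noah describes (composition and compilation paid once, validations 2 through n reusing the result), here is a minimal Java sketch using the standard JAXP validation API (javax.xml.validation, part of Java 5). The file names are placeholders, and whether the composed grammar is actually cached behind this API is up to the individual processor, which is exactly Noah's point:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;

public class CompiledSchemaReuse {
    public static void main(String[] args) throws Exception {
        // Compose and compile the schema once: includes/imports/redefines
        // are resolved here, not on each validation.
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema =
            factory.newSchema(new StreamSource(new File("order.xsd"))); // placeholder file

        // Validations 2..n reuse the compiled grammar; only per-instance
        // work remains. (Schema is thread-safe; Validator is not.)
        for (String instance : new String[] {"a.xml", "b.xml", "c.xml"}) { // placeholders
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new File(instance)));
        }
    }
}
```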
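And a minimal harness for the kind of measurement Noah recommends, as a sketch rather than a rigorous benchmark: warm-up iterations keep JIT compilation and class loading out of the measured interval, and the whole program would be rerun under different heap and GC settings (and different processors and hardware) to see the effects he describes. File names and iteration counts are arbitrary placeholders:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import java.io.File;

public class ValidationBench {
    public static void main(String[] args) throws Exception {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new File("big.xsd")));   // placeholder file
        File instance = new File("sample.xml");                  // placeholder file

        // Warm-up runs so JIT compilation is not charged to the
        // measured interval.
        for (int i = 0; i < 200; i++) {
            schema.newValidator().validate(new StreamSource(instance));
        }

        // Measured runs; report per-validation time. Repeat under
        // different -Xmx and GC settings to expose memory effects.
        int runs = 1000;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            schema.newValidator().validate(new StreamSource(instance));
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.3f ms per validation%n", elapsed / 1e6 / runs);
    }
}
```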
Received on Thursday, 8 December 2005 22:33:04 UTC