Re: [ANN] XSDBench XML Schema Benchmark 1.0.0 released from Boris Kolpackov on 2006-10-19 (xmlschema-dev@w3.org from October 2006)

From: Boris Kolpackov <boris@codesynthesis.com>
Date: Thu, 19 Oct 2006 21:10:10 +0200
To: noah_mendelsohn@us.ibm.com
Cc: Boris Kolpackov <boris@codesynthesis.com>, Michael Kay <mike@saxonica.com>, xmlschema-dev@w3.org
Message-ID: <20061019191010.GA31837@karelia>

Hi Noah,

noah_mendelsohn@us.ibm.com <noah_mendelsohn@us.ibm.com> writes:

> This seems to me a slightly odd way of splitting things.

In the parser implementation that I used as an example (Xerces-C++),
XML parsing and validation against the schema are handled in separate
places, so in effect every 'data' character that is part of a value that
needs validation is traversed twice: first by the XML parser code, then by
the validation code. The whole point of this mental exercise was to show
that content validation must be a lot cheaper than structure validation.
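
To make the double traversal concrete, here is a minimal sketch. The
function names are hypothetical and this is not actual Xerces-C++ code;
it also assumes a simplified decimal syntax (optional sign, digits, at
most one '.') rather than the full xs:float lexical space, and ignores
entity references and CDATA sections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pass 1 (the "parser"): collect character data up to the next '<'.
// Each data character is read once here.
std::string scan_text (const std::string& in, std::size_t& pos)
{
  assert (pos <= in.size ());

  std::size_t start = pos;

  while (pos < in.size () && in[pos] != '<')
    ++pos;

  return in.substr (start, pos - start);
}

// Pass 2 (the "validator"): read the same characters a second time to
// check the simplified lexical form: optional sign, digits, at most
// one '.', and at least one digit overall.
bool is_simple_decimal (const std::string& s)
{
  std::size_t i = 0;
  std::size_t digits = 0;

  if (i < s.size () && (s[i] == '+' || s[i] == '-'))
    ++i;

  while (i < s.size () && s[i] >= '0' && s[i] <= '9')
  {
    ++i;
    ++digits;
  }

  if (i < s.size () && s[i] == '.')
  {
    ++i;

    while (i < s.size () && s[i] >= '0' && s[i] <= '9')
    {
      ++i;
      ++digits;
    }
  }

  return digits > 0 && i == s.size ();
}
```

Calling scan_text() and then is_simple_decimal() on its result touches
every character of the value twice, which is the overhead described
above.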

> Indeed, the whole
> point of our earlier-referenced XML Screamer work was to make sure you can
> come as close as possible to touching each such character no more than
> once.

That must have been some pretty tight integration of XML parsing and
schema-based validation. For example, when you validate, say, a float
as an element value, you have to look for legal float characters as
well as '<'. If this float is the value of an attribute, then you must
watch for '"' instead of '<'. Or maybe there is a better way (I haven't
gone through all the material you sent in your other email). Also, I tend
to believe that most existing parsers don't have this architecture.
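
As an illustration only, such a fused scanner could look something like
the sketch below. It is not taken from XML Screamer or Xerces-C++; the
names scan_result and scan_decimal are made up, and the accepted syntax
is again a simplified decimal (no exponent, INF, NaN, whitespace, or
entity references). The terminator is passed in, so the same loop serves
element content ('<') and quoted attribute values ('"' or '\'').

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for the sketch.
enum class scan_result { valid, invalid, unterminated };

// Validate a simplified decimal and find its end in a single pass.
// On return, pos indexes the terminator (valid), the offending
// character or the terminator of an empty/sign-only value (invalid),
// or in.size () (unterminated).
scan_result scan_decimal (const std::string& in,
                          std::size_t& pos,
                          char terminator)
{
  assert (terminator == '<' || terminator == '"' || terminator == '\'');
  assert (pos <= in.size ());

  bool first = true;       // Still at the first character of the value.
  bool seen_digit = false; // At least one digit so far.
  bool seen_dot = false;   // A '.' has already been consumed.

  for (; pos < in.size (); ++pos)
  {
    char c = in[pos];

    // The delimiter check and the lexical check share one loop, so
    // each character is examined once.
    if (c == terminator)
      return seen_digit ? scan_result::valid : scan_result::invalid;

    if (c >= '0' && c <= '9')
      seen_digit = true;
    else if (c == '.' && !seen_dot)
      seen_dot = true;
    else if (!(first && (c == '+' || c == '-')))
      return scan_result::invalid;

    first = false;
  }

  return scan_result::unterminated;
}
```

The price of the single pass is that the scanner has to know both the
expected type and the syntactic context (element or attribute) before
it starts reading the value.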

-boris

-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

Received on Thursday, 19 October 2006 19:17:55 UTC