Hi Noah,
noah_mendelsohn@us.ibm.com <noah_mendelsohn@us.ibm.com> writes:
> This seems to me a slightly odd way of splitting things.
In the parser implementation that I used as an example (Xerces-C++),
XML parsing and validation against the schema are handled in separate
places, so in effect every 'data' character that is part of a value that
needs validation is traversed twice: first by the XML parser code, then
by the validation code. The whole point of this mental exercise was to
show that content validation must be a lot cheaper than structure
validation.
> Indeed, the whole
> point of our earlier-referenced XML Screamer work was to make sure you can
> come as close as possible to touching each such character no more than
> once.
That must have been some pretty tight integration of XML parsing and
schema-based validation. For example, when you validate, say, a float
as an element value, you have to look for both legal float characters
and '<'. If this float is the value of an attribute, then you must
watch for '"' instead of '<'. Or maybe there is a better way (I haven't
gone through all the material you sent in your other email). Also, I
tend to believe that most existing parsers don't have this architecture.
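To make the float case concrete, here is a minimal C++ sketch of what
such a fused scan might look like. It is not taken from XML Screamer or
Xerces-C++: the names scan_float and ScanResult are made up for
illustration, and the float grammar is deliberately simplified (no
INF/NaN, no surrounding whitespace, no entity or character references).
The caller passes '<' as the terminator for element content and '"' for
an attribute value.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch.
struct ScanResult {
    bool valid;       // characters before the terminator form a legal float
    std::size_t end;  // index of the terminator, or npos if it was not found
};

// Validate a float and find the end of the value in one pass.
// Each character is examined exactly once: it is either the
// terminator or an input to the float state machine.
ScanResult scan_float(std::string_view in, char terminator) {
    enum State { Start, Sign, Int, Dot, Frac, Exp, ExpSign, ExpDigits, Bad };
    State s = Start;

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];

        if (c == terminator) {
            // Accepting states: at least one mantissa digit was seen and
            // any exponent that was started has at least one digit.
            const bool ok = (s == Int || s == Frac || s == ExpDigits);
            return {ok, i};
        }

        const bool digit = (c >= '0' && c <= '9');
        const bool sign = (c == '+' || c == '-');
        const bool exp = (c == 'e' || c == 'E');

        switch (s) {
        case Start:     s = digit ? Int : sign ? Sign : c == '.' ? Dot : Bad; break;
        case Sign:      s = digit ? Int : c == '.' ? Dot : Bad; break;
        case Int:       s = digit ? Int : c == '.' ? Frac : exp ? Exp : Bad; break;
        case Dot:       s = digit ? Frac : Bad; break;
        case Frac:      s = digit ? Frac : exp ? Exp : Bad; break;
        case Exp:       s = digit ? ExpDigits : sign ? ExpSign : Bad; break;
        case ExpSign:   s = digit ? ExpDigits : Bad; break;
        case ExpDigits: s = digit ? ExpDigits : Bad; break;
        case Bad:       break;  // keep scanning only to locate the terminator
        }
    }

    // Ran out of input before seeing the terminator.
    return {false, std::string_view::npos};
}
```

With this, scan_float("3.14</x>", '<') accepts the value and stops at
index 4, while the same call on "1.2.3<" still finds the '<' but reports
the value as invalid. A two-stage design would instead have the parser
scan to '<' first and the validator walk the same characters again.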
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding