RE: performance testing of schemas

Michael Kay asks:

> Is it conformant to use a schema for validating instances 
> without reporting the errors that appear in unused parts of the schema?

First, let's separate schema documents from schemas. The Rec has nothing 
to say about documents that look almost like schema documents but contain 
one or more errors, just as the XML Rec has little to say about documents 
with missing end tags, except to say that they are not XML. From that 
point of view, if you aren't checking, you're on your own.

The schema Rec also makes clear that you can be minimally conforming by 
putting together a schema any way you like. The pertinent Rec text (note 
that this is talking about components in general and not specifically 
about schema documents) is [1]:

Processors have the option to assemble (and perhaps to optimize or 
pre-compile) the entire schema prior to the start of an ·assessment· 
episode, or to gather the schema lazily as individual components are 
required. In all cases it is required that:

1. The processor succeed in locating the ·schema components· transitively 
required to complete an ·assessment· (note that components derived from 
·schema documents· can be integrated with components obtained through 
other means);

2. no definition or declaration changes once it has been established;

3. if the processor chooses to acquire declarations and definitions 
dynamically, that there be no side effects of such dynamic acquisition 
that would cause the results of ·assessment· to differ from that which 
would have been obtained from the same schema components acquired in bulk.

Note: the ·assessment· core is defined in terms of schema components at 
the abstract level, and no mention is made of the schema definition syntax 
(i.e. <schema>). Although many processors will acquire schemas in this 
format, others may operate on compiled representations, on a programmatic 
representation as exposed in some programming language, etc.

The obligation of a schema-aware processor as far as the ·assessment· core 
is concerned is to implement one or more of the options for ·assessment· 
given below in Assessing Schema-Validity (§5.2). Neither the choice of 
element information item for that ·assessment·, nor which of the means of 
initiating ·assessment· are used, is within the scope of this 
specification.

Although ·assessment· is defined recursively, it is also intended to be 
implementable in streaming processors. Such processors may choose to 
incrementally assemble the schema during processing in response, for 
example, to encountering new namespaces. The implication of the invariants 
expressed above is that such incremental assembly must result in an 
·assessment· outcome that is the same as would be given if ·assessment· 
was undertaken again with the final, fully assembled schema.

The way to think about this is:  at the end of your validation episode, 
you must have used some components.  Whatever they are, taken together:

* They must meet the constraints on components, i.e., they must be a 
legal schema.  Nothing says that they need to be the same schema you 
would have used for validating some other instance.

* None of the components may have changed during validation, in the 
following sense:  if you were to take that final schema that you knew 
about at the end, and revalidate the same instance, you must get the same 
PSVI.  So, the results must be the same as if you had not streamed and as 
if the component properties had been established from the start.
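
To make that invariant concrete, here is a tiny sketch (purely 
illustrative; the ComponentCache class and its establish method are 
invented names, not any real processor's API) of what "no definition or 
declaration changes once established" might look like inside a lazy 
processor:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only (invented API): freeze each component
    // the first time it is established, so nothing acquired lazily can
    // change in the middle of an assessment episode.
    final class ComponentCache {
        private final Map<String, Object> established =
            new HashMap<String, Object>();

        Object establish(String qname, Object component) {
            Object existing = established.get(qname);
            if (existing == null) {
                established.put(qname, component); // first sighting: freeze
                return component;
            }
            if (!existing.equals(component)) {
                // The Rec forbids a definition or declaration changing
                // once it has been established.
                throw new IllegalStateException(
                    "component " + qname + " changed during assessment");
            }
            return existing;
        }

        // At the end of the episode, this map *is* the schema used.
        Map<String, Object> finalSchema() {
            return established;
        }
    }

At the end of the episode the frozen set is the schema you used; if 
revalidating the same instance against it in bulk would give a different 
PSVI, the lazy run wasn't conformant.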

That's the story for minimal conformance [2].  If you additionally wish to 
claim "conformance to the XML Representation of Schemas" [2] then you must 
indeed read your schema documents for correctness and for proper mappings 
to components.  There has been some disagreement in the WG as to whether 
you need to check for errors in mappings of components you don't use, but 
surely you need to check the constraints on XML representations.

In any case, my reading is that you can surely do what you want and claim 
minimal conformance.  Whether you can skip certain error checking on 
"unused" components from schema documents and claim "conformance to the 
XML Representation of Schemas" is a bit less clear, but I don't see why 
that should be a barrier to implementation.  This is my personal reading. 
YMMV.

Noah

[1] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#layer1
[2] 
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#concepts-conformance

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Michael Kay" <mike@saxonica.com>
Sent by: xmlschema-dev-request@w3.org
12/08/05 11:26 AM
 
        To:     "'Henry S. Thompson'" <ht@inf.ed.ac.uk>,
                "'Bryan Rasmussen'" <brs@itst.dk>
        cc:     <xmlschema-dev@w3.org>,
                (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: performance testing of schemas



> My experience is that schema 'compilation' completely swamps
> schema-based 'validation', so the first thing to do in any performance
> testing is to separate these two phases.

With this in mind I've been thinking about "lazy" or incremental
compilation of schemas, to avoid compiling the parts that aren't used in
a particular validation episode.

Is it conformant to use a schema for validating instances without
reporting the errors that appear in unused parts of the schema?

Michael Kay
http://www.saxonica.com/
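
As an aside on Henry's phase-separation point: with the standard 
javax.xml.validation API (JAXP 1.3), the split falls out naturally, since 
SchemaFactory.newSchema does the schema compilation and Validator.validate 
does the per-instance work. A minimal timing sketch (the file names are 
placeholders; a serious benchmark would repeat each phase and discard 
warm-up runs):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class SchemaTiming {
        public static void main(String[] args) throws Exception {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

            // Phase 1: schema "compilation" -- parse the schema documents
            // and assemble the component model.
            long t0 = System.nanoTime();
            Schema schema = factory.newSchema(new File("schema.xsd"));
            long compileNs = System.nanoTime() - t0;

            // Phase 2: validation proper, reusing the compiled Schema.
            Validator validator = schema.newValidator();
            long t1 = System.nanoTime();
            validator.validate(new StreamSource(new File("instance.xml")));
            long validateNs = System.nanoTime() - t1;

            System.out.printf("compile: %.1f ms, validate: %.1f ms%n",
                    compileNs / 1e6, validateNs / 1e6);
        }
    }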

Received on Thursday, 8 December 2005 17:08:40 UTC