- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 8 Dec 2005 12:08:05 -0500
- To: "Michael Kay" <mike@saxonica.com>
- Cc: "'Bryan Rasmussen'" <brs@itst.dk>, "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, xmlschema-dev@w3.org
- Message-ID: <OFED8D0103.F554A008-ON852570D1.005D5005-852570D1.005E9F21@lotus.com>
Michael Kay asks:
> Is it conformant to use a schema for validating instances
> without reporting the errors that appear in unused parts of the schema?
First, let's separate schema documents from schemas. The Rec has nothing
to say about documents that look almost like schema documents but have
one or more errors, just as the XML Rec has little to say about documents
with missing end tags, except to say that they are not XML. From that
point of view, if you aren't checking, you're on your own.
The schema Rec also makes clear that you can be minimally conforming by
putting together a schema any way you like. The pertinent Rec text is
(note that this is talking about components in general and not specifically
about schema documents) [1]:
Processors have the option to assemble (and perhaps to optimize or
pre-compile) the entire schema prior to the start of an ·assessment·
episode, or to gather the schema lazily as individual components are
required. In all cases it is required that:
1 The processor succeed in locating the ·schema components· transitively
required to complete an ·assessment· (note that components derived from
·schema documents· can be integrated with components obtained through
other means);
2 no definition or declaration changes once it has been established;
3 if the processor chooses to acquire declarations and definitions
dynamically, that there be no side effects of such dynamic acquisition
that would cause the results of ·assessment· to differ from that which
would have been obtained from the same schema components acquired in bulk.
Note: the ·assessment· core is defined in terms of schema components at
the abstract level, and no mention is made of the schema definition syntax
(i.e. <schema>). Although many processors will acquire schemas in this
format, others may operate on compiled representations, on a programmatic
representation as exposed in some programming language, etc.
The obligation of a schema-aware processor as far as the ·assessment· core
is concerned is to implement one or more of the options for ·assessment·
given below in Assessing Schema-Validity (§5.2). Neither the choice of
element information item for that ·assessment·, nor which of the means of
initiating ·assessment· are used, is within the scope of this
specification.
Although ·assessment· is defined recursively, it is also intended to be
implementable in streaming processors. Such processors may choose to
incrementally assemble the schema during processing in response, for
example, to encountering new namespaces. The implication of the invariants
expressed above is that such incremental assembly must result in an
·assessment· outcome that is the same as would be given if ·assessment·
was undertaken again with the final, fully assembled schema.
The way to think about this is: at the end of your validation episode,
you must have used some components. Whatever they are, taken together:
* They must meet the constraints on components, i.e. they must be a legal
schema. Nothing says that they need to be the same schema you would have
used for validating some other instance.
* None of the components may have changed during validation, in the
following sense: if you were to take that final schema that you knew
about at the end, and revalidate the same instance, you must get the same
PSVI. So, the results must be the same as if you had not streamed and as
if the component properties had been established from the start.
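To make that invariant concrete, here is a toy sketch in Python of lazy
component assembly (every name in it is hypothetical — this is not a real
XML Schema processor, just an illustration of the rule that components are
acquired on first use, never change once established, and must yield the
same outcome as bulk acquisition):

```python
# Toy sketch of lazy schema-component assembly (hypothetical API, not a
# real XML Schema processor). Components are gathered only when the
# instance first requires them, and a definition may never change once
# it has been established.

class LazySchema:
    def __init__(self, component_source):
        self.source = component_source   # name -> allowed text values
        self.established = {}            # components acquired so far

    def component(self, name):
        # Acquire on first use; afterwards the definition is frozen.
        if name not in self.established:
            self.established[name] = self.source[name]
        elif self.established[name] != self.source[name]:
            raise RuntimeError("definition changed after being established")
        return self.established[name]

    def validate(self, instance):
        # instance: list of (element_name, text) pairs
        return all(text in self.component(elem) for elem, text in instance)

# A "schema document" whose declaration for 'unused' is broken (None);
# lazy assembly never touches it, so validation still succeeds.
source = {"size": {"S", "M", "L"}, "unused": None}
instance = [("size", "M"), ("size", "L")]

lazy = LazySchema(source)
first = lazy.validate(instance)

# The invariant: revalidating the same instance against only the
# finally-assembled components (acquired in bulk) must give the same
# result as the lazy run did.
bulk = LazySchema(dict(lazy.established))
assert bulk.validate(instance) == first
print(first)  # -> True
```

The broken 'unused' entry is the analogue of an error in an unused part of
the schema: a lazy processor that never needs that component never trips
over it, which is exactly the minimal-conformance latitude described above.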
That's the story for minimal conformance [2]. If you additionally wish to
claim "conformance to the XML Representation of Schemas" [2] then you must
indeed read your schema documents for correctness and for proper mappings
to components. There has been some disagreement in the WG as to whether
you need to check for errors in mappings of components you don't use, but
surely you need to check the constraints on XML representations.
In any case, my reading is that you can surely do what you want and claim
minimal conformance. Whether you can skip certain error checking on
"unused" components from schema documents and claim "conformance to the
XML Representation of Schemas" is a bit less clear, but I don't see why
that should be a barrier to implementation. This is my personal reading.
YMMV.
Noah
[1] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#layer1
[2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#concepts-conformance
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"Michael Kay" <mike@saxonica.com>
Sent by: xmlschema-dev-request@w3.org
12/08/05 11:26 AM
To: "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, "'Bryan
Rasmussen'" <brs@itst.dk>
cc: <xmlschema-dev@w3.org>, (bcc: Noah
Mendelsohn/Cambridge/IBM)
Subject: RE: performance testing of schemas
> My experience is that schema 'compilation' completely swamps
> schema-based 'validation', so the first thing to do in any performance
> testing is to separate these two phases.
With this in mind I've been thinking about "lazy" or incremental
compilation of schemas, to avoid compiling the parts that aren't used in
a particular validation episode.
Is it conformant to use a schema for validating instances without
reporting the errors that appear in unused parts of the schema?
Michael Kay
http://www.saxonica.com/
Received on Thursday, 8 December 2005 17:08:40 UTC