Re: DOM WG comments on the schema Last Call from David Beech on 2000-05-26 (www-xml-schema-comments@w3.org from April to June 2000)

From: David Beech <dbeech@us.oracle.com>
Date: Thu, 25 May 2000 19:08:09 -0700
To: www-xml-schema-comments@w3.org
Message-ID: <392DDC88.3109115A@us.oracle.com>
Noah_Mendelsohn@lotus.com wrote:

> >> [Lauren Wood wrote:] there is no infoset defined for the schema

Fortunately, there is.  One of the reasons, perhaps the main
reason, that we elected to use XML syntax for XML Schema was
surely so that generic XML technology and tools are applicable
to schema documents.  Hence a well-formed schema document will
have an infoset, and if validated against the Schema for
Schemas it will have the additions of a PSV-infoset.

I have been thinking about this for a while now, because of its
relevance to XML Query or to anyone who needs to extract
information from a schema document.  Humans at least are likely
to have a mental picture of the XML representation of a
schema document when they think about querying it.
And it is very helpful to query "data" documents and "schema"
or "metadata" documents in the same way, using the common
abstract model of the infoset for all of them.  This is
exactly how database schemas are queried via tables and
views, just like data tables and views, and sharing the same
abstract relational model.
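
A minimal sketch of that point, using Python's standard
ElementTree parser as a stand-in for "generic XML technology".
The sample documents, the helper name attribute_values, and the
schema namespace URI (taken from the later Recommendation, not
necessarily the one in the Last Call draft) are all assumptions
made for illustration.  The same function reads a "data"
document and a "schema" document:

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the later XML Schema Recommendation.
XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical "data" document.
DATA_DOC = '<order><item sku="A1"/><item sku="B2"/></order>'

# Hypothetical "schema" document.
SCHEMA_DOC = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:element name="order" type="OrderType"/>
  <xs:element name="item" type="ItemType"/>
</xs:schema>"""


def attribute_values(xml_text, tag, attr):
    """Return the value of `attr` on every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get(attr) for el in root.iter(tag) if el.get(attr) is not None]


# One query function, two kinds of document.
skus = attribute_values(DATA_DOC, "item", "sku")
declared = attribute_values(SCHEMA_DOC, f"{{{XSD}}}element", "name")
```

Nothing in attribute_values knows whether it is looking at
data or metadata; it only sees elements and attributes.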

The suspense may have been building up to this point, since
I appear to be heading for a fall, having overlooked the fact
that the question was about the infoset for a "schema" and
not for a "schema document".  However, I just wanted to be
clear about schema documents first, and then say that if
multiple schema documents are used to form a "schema", then
the set of their infosets provides the infoset information
for the whole "schema".
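
As a rough sketch of treating a "schema" as the set of the
infosets of its schema documents, the following parses two
hypothetical schema documents and queries across both trees.
The documents, the function names, and the namespace URI (that
of the later Recommendation) are assumptions, and a parsed
ElementTree is only an approximation of an infoset:

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the later XML Schema Recommendation.
XSD = "http://www.w3.org/2001/XMLSchema"

# Two hypothetical schema documents that together form one "schema".
DOC_A = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:element name="order" type="OrderType"/>
  <xs:element name="item" type="ItemType"/>
</xs:schema>"""

DOC_B = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:element name="customer" type="CustomerType"/>
</xs:schema>"""


def schema_infosets(documents):
    """Parse each schema document; the list of trees stands in for the set of infosets."""
    return [ET.fromstring(text) for text in documents]


def declared_elements(roots):
    """Collect the names of top-level element declarations across all documents."""
    names = []
    for root in roots:
        for el in root.findall(f"{{{XSD}}}element"):
            names.append(el.get("name"))
    return names


names = declared_elements(schema_infosets([DOC_A, DOC_B]))
```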

[I fear that the fact that a "schema" is not in general to be
identified with a <schema> is already confusing enough to our
readers.  I would be really opposed to that confusion
spreading to having two different kinds of infoset for them,
two different kinds of DOM, two different kinds of Query, etc.
- if it is at all possible to avoid this.  I believe it is.]

My hypothesis is that it should be possible to express the
additional information that is discovered about a schema
during validation, and that shows up in what we currently
describe as a different kind of "component and properties"
model, as further additions to the PSV-infoset for a schema
document. We would then have a PSDV-infoset (i.e. a
Post-SchemaDocument-Validation-infoset), which is an
expanded PSV-infoset when the instance being validated
happens to be a schema document.

In that way, DOM and Query and others don't have to deal with
two different Infoset models, but can just extend gracefully
(I just had to correct a Freudian slip - "gratefully").

This is only a hypothesis because I haven't had time to check
details, and to see what of the component information is local
to a schema document and what requires assembly of a "schema".

This last point is also interesting from another angle -
I recall seeing a comment recently from someone who would
like to have schema documents validated per se before being
pressed into service.  As I understand it, our Structures spec
only validates schema documents (Constraints on Schemas, etc.)
when using them to validate some instance document.  The
difference may be only slight, but we might find one or two
small things if we tried to separate out "schema document
validation" like that.

>
>
> I am increasingly intrigued by the notion (which I have mentioned
> privately to one or two members of the workgroup) that we should rename our
> schema components "element declaration information item", "complex type
> definition item", etc.  We have gone to great lengths to define the analog
> of infoset for schemas, and it is obvious that there is confusion about
> what we have done.

Maybe I'm too optimistic, but in the light of the above,
might it be possible to avoid having both an infoset
and an analog of it?  For example, the EII (element
information item) in the Infoset for an <element> declaration
in a schema document already contains much of the
information, so could we just add to it rather than create
this other overlapping EDII (element declaration information
item)?  I don't see it as being much different to work with,
and the simplification of staying within one infoset model
would be rewarding.
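
One way to picture "just adding to it", as a sketch only: the
element item for each <element> declaration is kept as parsed,
and an extra property is attached to it.  The property name
typeResolvedLocally, its urn:example:psdv namespace, and the
sample document are hypothetical, the schema namespace URI is
that of the later Recommendation, and type names are compared
as plain strings rather than as QNames:

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the later XML Schema Recommendation.
XSD = "http://www.w3.org/2001/XMLSchema"
# Hypothetical namespace for the added properties.
PSDV = "urn:example:psdv"

# Hypothetical schema document: NoteType is not defined here.
SCHEMA_DOC = f"""<xs:schema xmlns:xs="{XSD}">
  <xs:element name="order" type="OrderType"/>
  <xs:element name="note" type="NoteType"/>
  <xs:complexType name="OrderType"/>
</xs:schema>"""

TYPE_TAGS = (f"{{{XSD}}}complexType", f"{{{XSD}}}simpleType")


def annotate(root):
    """Add a property to each top-level element declaration, in place.

    The property records whether the declared type is defined in this
    same schema document (simplified: names compared as plain strings).
    """
    local_types = {child.get("name") for child in root if child.tag in TYPE_TAGS}
    for el in root.findall(f"{{{XSD}}}element"):
        resolved = el.get("type") in local_types
        el.set(f"{{{PSDV}}}typeResolvedLocally", "true" if resolved else "false")
    return root


root = annotate(ET.fromstring(SCHEMA_DOC))
flags = {
    el.get("name"): el.get(f"{{{PSDV}}}typeResolvedLocally")
    for el in root.findall(f"{{{XSD}}}element")
}
```

The declaration for "note" is marked "false" here, which is the
kind of information that is not local to one schema document
and would need the assembled "schema" to settle.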

Regards,

  David
Received on Thursday, 25 May 2000 22:09:34 UTC