RE: Schema for schemas and XML schema DTD from Fuchs, Matthew on 2000-11-06 (www-xml-schema-comments@w3.org from October to December 2000)

From: Fuchs, Matthew <matthew.fuchs@commerceone.com>
Date: Mon, 6 Nov 2000 11:40:07 -0800
To: "'ht@cogsci.ed.ac.uk'" <ht@cogsci.ed.ac.uk>
Cc: "Michel, Adrian" <adrian.michel@commerceone.com>, "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <4C4A7BE77CE1D311A1D200508BA38C1202F35264@venus.commerceone.com>

Henry,

Your description below indicates an implementation in which all the
syntactic constraints are already implemented in the validator itself
- i.e., the s-f-s is hardcoded into the implementation (with some parts in
the DTD for choice 2).  Therefore you claim "it doesn't matter where the
defaults come from" because, in all your examples in your exhaustive list of
how to implement a validator, they are implemented "in code" - even in
choice 2.  (Although I must admit I'm not certain what choice 3 below means
- what does "apply s-f-s in the approved way" mean?  Where have we ever
specified such a thing?  And what does "ad lib" mean here?)  

Now, how would one go about building a validator without these constraints
"in code"?  The application model being pressed by, among others, us, is for
the application to work from the infoset, with the assumption that
validation (and decoration of defaulted values) has already taken place.  As
all syntactic constraints have already been applied, all I need to do is to
build something that maps generally from the infoset for a schema to the
internal model (components), which I can then use to validate an instance
conforming to that schema.  For example, at that level, I don't care whether
attributes are allowed before or after the content model (or mixed in, or
whatever), all I want is the set of nodes corresponding to the definitions.
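
As a rough illustration only, here is a minimal Python sketch of that kind
of mapping.  The element names form a toy vocabulary, and the function
collect_components and the two sample documents are hypothetical names
made up for this example, not anything from the spec.  The sketch assumes
the syntactic checks have already been done upstream and simply gathers
the named definitions.

```python
import xml.etree.ElementTree as ET

def collect_components(schema_xml):
    """Group every named definition in a schema document by its kind.

    Only the set of nodes matters here, not where each node sits
    relative to its siblings.
    """
    root = ET.fromstring(schema_xml)
    components = {}
    for node in root.iter():
        name = node.get("name")
        if name is not None:
            components.setdefault(node.tag, set()).add(name)
    return components

# Hypothetical toy schemas: same definitions, attributes in different places.
ATTRS_AFTER = (
    '<schema><complexType name="T">'
    '<element name="item"/><attribute name="id"/>'
    '</complexType></schema>'
)
ATTRS_BEFORE = (
    '<schema><complexType name="T">'
    '<attribute name="id"/><element name="item"/>'
    '</complexType></schema>'
)
```

Both sample documents yield the same components, which is the point: at
this level the placement of attributes is somebody else's problem.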


This leads to the notion of bootstrapping.  Given the above (something which
walks the infoset for a parsed schema and builds a validator for that schema
from it), and given an s-f-s whose well-formed infoset is exactly the same
as its validated infoset (a fixed point), I can generate a validator for
schema documents by parsing that s-f-s and then applying that to any future
schemas that come my way (such as back onto the s-f-s).  That way I have a
conformant - yes, conformant - XSDL processor for which _none_ of the
properties/constraints specifiable in schema itself need be implemented in
code.  I've now significantly reduced my development time to having a
working validator, especially as I can depend on my more robust general
validator-building code rather than its more error-prone, hand-coded s-f-s
special case equivalent.  In addition, and particularly in light of the
changing nature of the spec (we're still in CR), minor syntactic changes
(such as allowing attributes to show up in different places) require
absolutely no recoding - just use the new s-f-s.  And if I were programming
in a language such as Java (or Python) I'd serialize (pickle) the validator
I built from the s-f-s - with the side effect that I've built a schema
compiler for the same effort I was going to put into one-off hand coding.
Note that this approach does not work with the current s-f-s, because it
cannot be read correctly unless the rules are either already written in
code or specified in the DTD.  Therefore it is not standalone.
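
To make the fixed-point idea concrete, here is a minimal Python sketch
under heavy assumptions.  The s-f-s below is a toy with every default
written out, loosely modelled on the use="default" value="..." style; the
names SFS, Validator, build_validator, infoset and decorate are all
hypothetical.  It only handles attribute declarations and defaults, not
content models, and it pickles the compiled declaration table rather than
a full validator object.

```python
import pickle
import xml.etree.ElementTree as ET

# Toy stand-in for an s-f-s in which every defaulted value is explicit.
SFS = """<schema>
  <element name="schema" abstract="false"/>
  <element name="element" abstract="false">
    <attribute name="name" use="required"/>
    <attribute name="abstract" use="default" value="false"/>
  </element>
  <element name="attribute" abstract="false">
    <attribute name="name" use="required"/>
    <attribute name="use" use="default" value="optional"/>
    <attribute name="value" use="optional"/>
  </element>
</schema>"""

def infoset(elem):
    """Well-formed infoset as nested tuples: (tag, attributes, children)."""
    return (elem.tag,
            tuple(sorted(elem.attrib.items())),
            tuple(infoset(child) for child in elem))

class Validator:
    def __init__(self, decls):
        # decls: {element tag: {attribute name: default value or None}}
        self.decls = decls

    def decorate(self, elem):
        """Check declarations and return the infoset with defaults added."""
        if elem.tag not in self.decls:
            raise ValueError("undeclared element: " + elem.tag)
        allowed = self.decls[elem.tag]
        attrs = dict(elem.attrib)
        for name in attrs:
            if name not in allowed:
                raise ValueError("undeclared attribute: " + name)
        for name, default in allowed.items():
            if default is not None:
                attrs.setdefault(name, default)
        return (elem.tag,
                tuple(sorted(attrs.items())),
                tuple(self.decorate(child) for child in elem))

def build_validator(schema_root):
    """Walk a parsed schema and build a validator from its declarations."""
    decls = {}
    for el in schema_root.findall("element"):
        decls[el.get("name")] = {
            a.get("name"): a.get("value") if a.get("use") == "default" else None
            for a in el.findall("attribute")
        }
    return Validator(decls)
```

A possible usage: parse SFS with ET.fromstring, call build_validator on
it, and check that decorate(sfs_root) equals infoset(sfs_root), which is
the fixed point.  The declaration table can then be saved with
pickle.dumps(validator.decls) and reloaded into a new Validator to
decorate other schemas that leave their defaults out.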

As I've explained it, this technique is not only faster to develop, it is
also more robust, more flexible in the face of potential language changes,
and if you serialize, perhaps just as fast - at least until language
stability can justify the expense of more error-prone hand coding.  I hope
I'm preaching to the choir, because I can't imagine seriously starting an
implementation any other way.  But that all depends on having an s-f-s for
which all values are specified - or I am forced to continue using the DTD,
and I'm not bootstrapping from the s-f-s itself.  So, while it doesn't
matter where constraints might be specified, it sure matters where they are
applied.  Or is this all just not an "approved" usage of s-f-s?

Now that we are in CR, there is always the possibility that implementation
feedback will lead us to change aspects of language syntax.  Any such
decision requires a change in:
1) the language of the spec
2) the DTD
3) the s-f-s
4) validation code for anyone trying to build a validator that doesn't use
the DTD, for the reasons specified above

At this point in the game, the major value of an s-f-s should be to eliminate
2 and minimize, to the degree possible, 1 and 4.  That would be good
software engineering practice.

Matthew

> -----Original Message-----
> From: ht@cogsci.ed.ac.uk [mailto:ht@cogsci.ed.ac.uk]
> Sent: Friday, November 03, 2000 12:26 AM
> To: Fuchs, Matthew
> Cc: Michel, Adrian; 'www-xml-schema-comments@w3.org'
> Subject: Re: Schema for schemas and XML schema DTD
> 
> 
> "Fuchs, Matthew" <matthew.fuchs@commerceone.com> writes:
> 
> > When Adrian first brought this issue to me I assumed this was an oversight,
> > as a Schema for Schemas which can stand independently of a DTD is clearly
> > more useful going forward than one which can't (not that the DTD is not
> > useful).  The additional effort to do this is minor - just filling in
> > default and fixed values.  Then I can truly bootstrap from the Schema for
> > Schemas - I read it in as a schema, and then use it to validate itself,
> > without reference to anything external.  None of this makes it impossible to
> > use the DTD - just makes it optional.
> 
> The DTD _is_ optional.  Nothing about the language defined by the
> schema for schemas changes for processors which don't access the
> external subset.
> 
> > comments below
> 
> Ditto.
> 
> <snip/>
> 
> > But the Schema for Schemas _is_ normative with respect to what is or is not
> > a conforming schema document and is itself a schema document.  Therefore it
> > should apply to itself - but it cannot do so without access to non-normative
> > parts of the spec, because if you were to process the schema for schemas in
> > standalone - i.e., without the DTD, it would not express XSDL.
> 
> Yes it would.  All the defaults in the DTD just reiterate defaults
> already expressed in the prose (or the schema for schemas).  There are
> three ways a conforming processor can be built:
> 
>   1) Read schemas minimally (that is, w/o processing external
>      subsets), enforce all s-for-s and prose SRC and COS constraints
>      in code;
>   2) Read schemas maximally (that is, with processing of external
>      subset) and/or apply the DTD for schemas as well, enforce all
>      non-DTD expressed s-for-s constraints, and all SRC and COS
>      constraints in code;
>   3) Read schemas ad lib., then apply s-for-s in the approved way,
>      then enforce all SRC and COS constraints in code.
> 
> I claim which of these three is adopted is completely up to the
> implementor; it's clear that they can all give the correct results if
> implemented correctly.
> 
> The crucial point for the current discussion is that it doesn't
> _matter_ where the defaults come from.  The only default values in the
> DTD for schemas are also in the schema for schemas and the prose of
> the REC.
>   
> 
> > Therefore the non-normative DTD is not merely useful, it is
> > _required_.  Something normative should not _require_ the presence
> > of something else which is not normative.  _Allowing_ the presence
> > of something non-normative is a different issue.
> 
> I hope I've shown above that it's _not_ required.  E.g.
> 
>   <element name="schema" id="schema"> [quoted from the s-for-s]
> 
> has abstract='false' by default, _not_ (just) because such a default is
> in the DTD for schemas, but because it's in the s-for-s, and in the
> prose of the REC, that that's the default.
> 
> 
> > As you well know, XML 1.0 specifies two levels of conformance - well-formed
> > and validating.  The Schema for schemas currently requires a _validating_
> > processor.
> 
> Not so.
> 
> > Making the changes Adrian suggests - just putting in default and
> > fixed values - would lower the bar to implementation to a well-formed
> > processor in standalone mode.
> 
> But you _can't_ do that, or such processors won't work correctly with
> other schemas -- the defaulting _must_ be implemented somewhere!
> 
> > Requiring a DTD breaks all of these principles for the Schema for Schemas:
> > 1) the language is not expressed in instance syntax, but requires a DTD
> 
> False.
> 
> > 2) it is not self-describing as written - it requires a DTD, which is not
> > part of XSDL.
> 
> False.
> 
> > 3) not useable by processors which don't want to validate
> 
> False.
> 
> > 4) is not straightforwardly useable, as it requires a separate document (the
> > DTD)
> 
> False.
> 
> > 5) ditto
> 
> False.
> 
> ht
> -- 
>   Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
>           W3C Fellow 1999--2001, part-time member of W3C Team
>      2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
> 	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
> 		     URL: http://www.ltg.ed.ac.uk/~ht/
>
Received on Monday, 6 November 2000 14:40:37 UTC