Re: Schema for schemas and XML schema DTD from Henry S. Thompson on 2000-11-09 (www-xml-schema-comments@w3.org from October to December 2000)

From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
Date: 09 Nov 2000 08:51:29 +0000
To: "Fuchs, Matthew" <matthew.fuchs@commerceone.com>
Cc: "Michel, Adrian" <adrian.michel@commerceone.com>, "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <f5blmutcyum.fsf@cogsci.ed.ac.uk>

"Fuchs, Matthew" <matthew.fuchs@commerceone.com> writes:

> Your description below indicates an implementation in which all the
> syntactic constraints are already implemented either in the validator itself
> - i.e., the s-f-s is hardcoded into the implementation (with some parts in
> the DTD for choice 2).  Therefore you claim "it doesn't matter where the
> defaults come from" because, in all your examples in your exhaustive list of
> how to implement a validator, they are implemented "in code" - even in
> choice 2.

No, only in choice 1 do the defaults come from the code.  In choice 2
they come from the DTD, and in choice 3 they come from the s-for-s.

> (Although I must admit I'm not certain what choice 3 below means
> - what does "apply s-f-s in the approved way" mean?  Where have we ever
> specified such a thing?  And what does "ad lib" mean here?)  

Sorry I wasn't clear.  By 'ad lib' I simply meant it didn't matter
whether a validating parser was used, or just a
well-formedness-enforcing one.  By 'in the approved way' I simply
meant as defined in the spec., e.g. supplying defaults for the target
schema EIIs/AIIs per the s-for-s.

> Now, how would one go about building a validator without these constraints
> "in code"?  The application model being pressed by, among others, us, is for
> the application to work from the infoset, with the assumption that
> validation (and decoration of defaulted values) has already taken place.  As
> all syntactic constraints have already been applied, all I need to do is to
> build something generally mapping from an infoset for a schema to the
> internal model (components), which I can then use to validate an instance
> conforming to that schema.  For example, at that level, I don't care whether
> attributes are allowed before or after the content model (or mixed in, or
> whatever), all I want is the set of nodes corresponding to the definitions.

Sounds sensible; indeed, for full compliance you _must_ be able to
start from an infoset.

> This leads to the notion of bootstrapping.  Given the above (something which
> walks the infoset for a parsed schema and builds a validator for that schema
> from it), and given an s-f-s whose well-formed infoset is exactly the same
> as its validated infoset (a fixed point)

I agree bootstrapping is a sensible route.  Why is the fixed-point a
requirement? [see below for my guess]

> [then] I can generate a validator for schema documents by parsing
> that s-f-s and then applying that to any future schemas that come my
> way (such as back onto the s-f-s).

If by parsing you mean building components per the mapping rules, I
agree.  If you implement the mapping rules in full, you have no need
for the fixed-point requirement.  If you wish to avoid implementing
the defaulting clauses among the mapping rules, you could do so if the
fixed-point requirement was satisfied.
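
To make the fixed-point condition concrete, here is a minimal sketch
(mine, not anything from the spec) of a check that a schema document
already carries every defaultable attribute explicitly, so that its
well-formed infoset and its defaulted infoset coincide.  The names
missing_defaults, is_fixed_point and SAMPLE_DEFAULTS are hypothetical,
the defaults table is passed in rather than derived from the s-for-s,
and elements are matched by local name only, ignoring namespaces.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def missing_defaults(root, defaults):
    """Return (element, attribute) pairs that a processor would have to default.

    `defaults` maps an element's local name to a dict of
    attribute name -> default value (an assumed, hand-supplied table).
    """
    missing = []
    for elem in root.iter():
        name = local_name(elem.tag)
        for attr in defaults.get(name, {}):
            if attr not in elem.attrib:
                missing.append((name, attr))
    return missing


def is_fixed_point(root, defaults):
    """True if no attribute defaulting would change the document's infoset."""
    return not missing_defaults(root, defaults)


# Illustrative table covering a single element type.
SAMPLE_DEFAULTS = {"sequence": {"minOccurs": "1", "maxOccurs": "1"}}

explicit = ET.fromstring('<schema><sequence minOccurs="1" maxOccurs="1"/></schema>')
implicit = ET.fromstring("<schema><sequence/></schema>")

print(is_fixed_point(explicit, SAMPLE_DEFAULTS))
print(missing_defaults(implicit, SAMPLE_DEFAULTS))
```

A document for which is_fixed_point holds could be fed to a
bootstrapping processor that skips the defaulting clauses; for any
other document those clauses still have to be implemented somewhere.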

> That way I have a
> conformant - yes, conformant - XSDL processor for which _none_ of the
> properties/constraints specifiable in schema itself need be implemented in
> code.  I've now significantly reduced my development time to having a
> working validator, especially as I can depend on my more robust general
> validator-building code rather than its more error-prone, hand-coded s-f-s
> special case equivalent.

Again, we're in violent agreement.  The _only_ difference between a
fully-specified s-for-s and the current one as far as development time
is concerned is the defaults.  I just don't see this as a big deal.
Or is there some other aspect I'm missing?

<snip/>  [reasserts your point two more times, making 3, by my count :-]

> Now that we are in CR, there is always the possibility that implementation
> feedback will lead us to change aspects of language syntax.  For any such
> decision, there must result a change in:
> 1) the language of the spec
> 2) the DTD
> 3) the s-f-s
> 4) validation code for anyone trying to build a validator that doesn't use
> the DTD, for the reasons specified above

> At this point in the game, the major value of a s-f-s should be to eliminate
> 2 and minimize, to the degree possible, 1 and 4.  That would be good
> software engineering practice.

The only effect that eliminating the DTD would have on the allegedly
negative characteristics of case 4 concerns attribute value defaults.
The only attribute value defaults in the DTD are as follows:

<!ATTLIST %schema;
   finalDefault         %complexDerivationSet; ''
   blockDefault         %blockSet;             ''
   elementFormDefault   %formValues;           'unqualified'
   attributeFormDefault %formValues;           'unqualified'>
<!ATTLIST %complexType;
          abstract  %boolean;                       'false'
          block     %complexDerivationSet;          ''
          mixed (true|false)                        'false'>
<!ATTLIST %element;
            abstract           %boolean;              'false'>
<!ATTLIST %choice;
          minOccurs   %nonNegativeInteger;   '1'
          maxOccurs   CDATA                  '1'>
<!ATTLIST %sequence;
          minOccurs   %nonNegativeInteger;   '1'
          maxOccurs   CDATA                  '1'>
<!ATTLIST %any;
            namespace       CDATA                  '##any'
            processContents (skip|lax|strict)      'strict'
            minOccurs       %nonNegativeInteger;   '1'
            maxOccurs       CDATA                  '1'>
<!ATTLIST %anyAttribute;
            namespace       CDATA              '##any'
            processContents (skip|lax|strict)  'strict'>
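
To give a feel for the size of the job, the defaults above can be
transcribed into a small table and applied to a parsed schema document
in a few lines.  This is only a sketch under stated assumptions:
DTD_DEFAULTS and apply_defaults are hypothetical names, the table is
copied by hand from the ATTLIST declarations above, and elements are
matched by local name without checking the schema namespace.

```python
import xml.etree.ElementTree as ET

# Attribute value defaults, transcribed by hand from the DTD excerpt above.
_OCCURS = {"minOccurs": "1", "maxOccurs": "1"}
DTD_DEFAULTS = {
    "schema": {
        "finalDefault": "",
        "blockDefault": "",
        "elementFormDefault": "unqualified",
        "attributeFormDefault": "unqualified",
    },
    "complexType": {"abstract": "false", "block": "", "mixed": "false"},
    "element": {"abstract": "false"},
    "choice": dict(_OCCURS),
    "sequence": dict(_OCCURS),
    "any": {"namespace": "##any", "processContents": "strict", **_OCCURS},
    "anyAttribute": {"namespace": "##any", "processContents": "strict"},
}


def apply_defaults(root):
    """Add every missing defaulted attribute in place; return how many were added."""
    added = 0
    for elem in root.iter():
        # Assumption: match on local name only, ignoring the namespace.
        name = elem.tag.rsplit("}", 1)[-1]
        for attr, value in DTD_DEFAULTS.get(name, {}).items():
            if attr not in elem.attrib:
                elem.set(attr, value)
                added += 1
    return added


doc = ET.fromstring(
    '<schema><complexType name="t">'
    '<sequence maxOccurs="unbounded"><any/></sequence>'
    "</complexType></schema>"
)
apply_defaults(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Explicitly specified values are left alone, so running the function a
second time adds nothing.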

As I said before, I'm happy to discuss the pros and cons of
implementing these by hand in the published s-for-s, but I personally
think the significant loss in readability which this would bring would
not be worth the small gain in implementation simplicity.  The obvious 
compromise would be for there to be two versions of the actual s-for-s 
out there at URIs, one expanded and one unexpanded.

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
Received on Thursday, 9 November 2000 03:51:34 UTC