RE: Schema Design: Composition vs Subclassing from John Utz on 2002-04-17 (xmlschema-dev@w3.org from April 2002)

From: John Utz <utz@singingfish.com>
Date: Wed, 17 Apr 2002 08:52:48 -0700 (PDT)
To: Mark Feblowitz <mfeblowitz@frictionless.com>
cc: "'Jeni Tennison'" <jeni@jenitennison.com>, Paul Kiel <paul@hr-xml.org>, xmlschema-dev@w3.org, "Alexander Falk (E-mail)" <al@altova.com>
Message-ID: <Pine.LNX.4.21.0204170843490.27115-100000@mail.singingfish.com>

On Wed, 17 Apr 2002, Mark Feblowitz wrote:

> Yup - you got the gist.
<snip>
> Schematron is indeed well suited for this task, since the rules can be
> maintained separately, and there can be multiple sets, one for each
<snip>
> The part that most troubles many end users about using Schematron (aside
> from its sci-fi name) is that the constraints are applied by a separate
> utility - not the parser. Even if there were no significant performance
> penalty to running Schematron (i.e., even if it were run in the same JVM),
> there is still a perception problem on the part of many users, who worry
> about the extra logistics of running a separate pass and about the perceived
> added complexity.
<snip>
> co-occurrence constraints. In this way, the evaluation of the constraint
> expressions could be done "by the parser", at least as perceived by most
> Schema users.
> 

there is a design discussion going on in xerces-dev about some new
interfaces and implementations for GrammarCaching that i had the poor
taste to attempt to hijack into a discussion about grammar parsing :-)

one of the good things that came out of my thwarted attempt was a
discussion about SchemaParsing and the interfaces thereof.

i made the argument that the interface should be public so that
developers could implement their own parsers.

managing embedded schematron in a way that is transparent to the end user
was *exactly* the rationale i had in mind for said interface.

obviously, you can't avoid a performance hit, but you can probably minimize
it a lot by making sure that it happens in the same process, so that you
don't have to reload/reread the grammar twice just to do the validation.
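
To make the "same process" point concrete, here is a minimal sketch in Python. It is not the Xerces interface under discussion: load_grammar, the dictionary layout, and the discount/reason rule are all hypothetical stand-ins. It only shows a grammar being loaded once and then serving both the structural check and the embedded co-occurrence rules.

```python
import xml.etree.ElementTree as ET

LOAD_COUNT = 0  # counts how often the (hypothetical) grammar is loaded


def load_grammar():
    """Hypothetical loader: stands in for reading a schema with embedded rules."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {
        # structural part: which children the root may contain
        "allowed_children": {"Header", "Line"},
        # embedded rule part: if the first attribute is present,
        # the second one must be present too
        "cooccurrence": [("discount", "reason")],
    }


def validate(grammar, xml_text):
    """Run the structural pass and the rule pass against one loaded grammar."""
    root = ET.fromstring(xml_text)
    errors = []
    for child in root:
        if child.tag not in grammar["allowed_children"]:
            errors.append(f"unexpected element <{child.tag}>")
    for present, required in grammar["cooccurrence"]:
        for el in root.iter():
            if present in el.attrib and required not in el.attrib:
                errors.append(f"<{el.tag}> has '{present}' but no '{required}'")
    return errors


grammar = load_grammar()  # loaded once per process
docs = [
    '<Order><Header/><Line discount="5"/></Order>',
    '<Order><Header/><Line discount="2" reason="promo"/></Order>',
]
results = [validate(grammar, d) for d in docs]
print(results)
```

Both passes read from the same in-memory grammar, so the cost of the second pass is the rule evaluation itself rather than a second load.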

<snip>
 
> Thanks,
> 
> Mark
> 
> --------------------------------------------------------------------------------
>  
> Mark Feblowitz                         [t] 617.715.7231
> Frictionless Commerce Incorporated     [f] 617.495.0188
> XML Architect                          [e] mfeblowitz@frictionless.com
> 400 Technology Square, 9th Floor 
> Cambridge, MA 02139 
> www.frictionless.com  
>  
> 
>  -----Original Message-----
> From: 	Jeni Tennison [mailto:jeni@jenitennison.com] 
> Sent:	Wednesday, April 17, 2002 7:11 AM
> To:	Mark Feblowitz
> Cc:	Paul Kiel; xmlschema-dev@w3.org
> Subject:	Re: Schema Design: Composition vs Subclassing
> 
> Hi Mark,
> 
> > What we recognized is that each use of a particular Noun shares a
> > common structure (same child elements, same order), and that they
> > only differ in the cardinalities of their child elements. That's why
> > we factored out the cardinalities from the definition of structure:
> > we define the structure of a thing once, and we define the possible
> > combinations of the cardinalities of its parts separately.
> 
> This is extremely interesting. You're making a distinction between:
> 
>   - the order in which elements appear
>   - the number of times those elements appear
> 
> Traditionally, content models have combined these two factors. When a
> validator goes through an instance it checks the number of occurrences
> of an element at the same time as checking that they're appearing in
> the correct order.
> 
> But you could imagine a content model that expressed the two factors
> independently. For example, given a DTD content model of:
> 
>   (E1, E2?, E3+, E4)
> 
> you could have one part that said that E1, E2, E3 and E4 must appear
> in order, followed by a second part that said that E2 was optional and
> E3 had to occur one or more times. Making up a syntax:
> 
>   (E1, E2, E3, E4) & E2? & E3+
> 
> A validator could feasibly go through an element's content twice, once
> checking order and once checking cardinality, or it could combine the
> two parts to create a traditional content model.
> 
> Separating out these two parts would enable you to vary one while
> keeping the other the same. So for example you could say that all
> Nouns shared the same order of elements and swap in different
> cardinality constraints as and when required.
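
The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the STRICT_BOUNDS table are made up for the example, and element content is simplified to a list of child names. The ordering is defined once and the cardinality tables are swapped in separately.

```python
from collections import Counter

# Ordering part, defined once: (E1, E2, E3, E4)
ORDER = ["E1", "E2", "E3", "E4"]


def check_order(children, order=ORDER):
    """Every name must be known and must never step backwards in the order."""
    rank = {name: i for i, name in enumerate(order)}
    last = -1
    for name in children:
        if name not in rank or rank[name] < last:
            return False
        last = rank[name]
    return True


def check_cardinality(children, bounds):
    """bounds maps name -> (min, max); a max of None means unbounded."""
    counts = Counter(children)
    for name, (lo, hi) in bounds.items():
        n = counts[name]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def valid(children, bounds):
    """Two independent passes over the same content."""
    return check_order(children) and check_cardinality(children, bounds)


# Cardinality part for the DTD model (E1, E2?, E3+, E4)
DTD_BOUNDS = {"E1": (1, 1), "E2": (0, 1), "E3": (1, None), "E4": (1, 1)}
# Hypothetical alternative for another use of the same structure
STRICT_BOUNDS = {"E1": (1, 1), "E2": (1, 1), "E3": (1, 1), "E4": (1, 1)}

content = ["E1", "E3", "E3", "E4"]
print(valid(content, DTD_BOUNDS))     # order fine, E2 optional, E3 repeated
print(valid(content, STRICT_BOUNDS))  # same order, but E2 is now required
```

Taken together, the two checks accept the same sequences as the combined content model, while letting the bounds table vary independently of the order.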
> 
> As far as I know, the only schema technology that enables you to make
> that kind of division explicitly is Schematron -- you can have one set
> of rules that test ordering, and another set of rules that test
> cardinality. When RELAX NG had 'concur' you could have used that
> (essentially imposing two overlapping sets of constraints on the
> content model); you could still use it with TREX I suppose.
> 
> But this is essentially what you're doing -- using XML Schema to
> provide the ordering constraints (which means that you have to be very
> flexible about the cardinality, essentially not testing it or testing
> it within known limits) and another technology to provide concurrent
> validation of the cardinality constraints.
> 
> This is interesting in terms of schema language development because it
> implies that something like 'concur' would be a useful addition to the
> language. You could imagine it solving some of the problems that
> people have with "these elements, in any order, with these
> cardinalities" as well.
> 
> It's also interesting in terms of the relationship with OO
> technologies. In OO technologies, ordering isn't an issue, only
> cardinality, so the normal "best practice" for OO probably isn't going
> to help here. Design patterns for OO technologies simply don't have to
> deal with this kind of issue.
> 
> Anyway, after all that, to answer the non-rhetorical question in your
> post:
> 
> > (which reminds me, are groups extensible? How does one do so?)
> 
> They're only extensible (while retaining the same name) when you
> redefine a schema. When you use xs:redefine, you can change model groups
> as long as either:
> 
>   - you reference the model group within the new model group
>     (essentially allowing you to extend it) or
> 
>   - the redefined model group is a valid restriction of the original
>     model group (essentially allowing you to restrict it)
> 
> Of course you can reference model groups with different names wherever
> you like within a model group -- that's equivalent to extending
> complex types within a single schema.
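
For illustration, the first xs:redefine case (extending a group by referencing it inside its own redefinition) might look like the schema embedded in the Python sketch below. The file name nouns.xsd and the names NounContent and Extra are hypothetical, and the sketch assumes a schema with no target namespace. The Python around it only counts the self-references; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical redefining schema: NounContent is extended by wrapping a
# reference to the original group in a new sequence and adding an element.
REDEFINING_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="nouns.xsd">
    <xs:group name="NounContent">
      <xs:sequence>
        <xs:group ref="NounContent"/>
        <xs:element name="Extra" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>
"""


def self_references(schema_text):
    """Map each redefined group name to how many times it references itself."""
    root = ET.fromstring(schema_text)
    result = {}
    for group in root.findall(f"{{{XS}}}redefine/{{{XS}}}group"):
        name = group.get("name")
        refs = [g for g in group.iter(f"{{{XS}}}group") if g.get("ref") == name]
        result[name] = len(refs)
    return result


print(self_references(REDEFINING_SCHEMA))
```

A redefined group that contains no reference to itself falls under the second case and has to be a valid restriction of the original.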
>     
> Cheers,
> 
> Jeni
> 
> ---
> Jeni Tennison
> http://www.jenitennison.com/
> 
>
Received on Wednesday, 17 April 2002 11:53:25 UTC