RE: Schema Design: Composition vs Subclassing from Mark Feblowitz on 2002-04-17 (xmlschema-dev@w3.org from April 2002)

From: Mark Feblowitz <mfeblowitz@frictionless.com>
Date: Wed, 17 Apr 2002 13:15:55 -0400
To: "'John Utz'" <utz@singingfish.com>
Cc: "'Jeni Tennison'" <jeni@jenitennison.com>, Paul Kiel <paul@hr-xml.org>, xmlschema-dev@w3.org, "Alexander Falk (E-mail)" <al@altova.com>
Message-ID: <4DBDB4044ABED31183C000508BA0E97F040ABA34@fcpostal.frictionless.com>
You can also avoid the double-parsing. If your XSL processor uses a
schema-validating parser, it can both validate and apply the constraints
without re-parsing. If not, in many cases your parser can hand off the parse
result (a DOM or the SAX parse events) to the XSL processor for subsequent
processing.

It's important to note, though, that an XSL processor isn't necessary. XSL
processors are a convenience, in that they support both XPath expression
interpretation and the formatting of constraint violation messages.

The constraints themselves are XPath expressions, so your parser need only
be able to read and evaluate XPath and report constraint violations.
This can be supported by incorporating one of the emerging XPath processing
APIs, e.g., SAXPath.
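
As a rough illustration of the point above, here is a minimal sketch in Python. It assumes the limited XPath subset built into the standard xml.etree.ElementTree module as a stand-in for a full XPath API such as SAXPath. The constraint table, the element names (Orders, Order, Customer, Item) and the check_constraints helper are all hypothetical. The document is parsed once and the same in-memory tree is then used for constraint checking, with no XSL processor involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints: (context path, test path, violation message).
# Both paths use ElementTree's limited XPath subset, not full XPath 1.0.
CONSTRAINTS = [
    (".//Order", "Customer", "Order must contain a Customer"),
    (".//Order", "Item[@sku]", "Order must contain an Item with a sku"),
]

def check_constraints(root, constraints):
    """Evaluate each test path at every context node; collect violations."""
    violations = []
    for context, test, message in constraints:
        for node in root.findall(context):
            if node.find(test) is None:
                violations.append(message)
    return violations

# Parse once, then reuse the tree for the constraint pass.
doc = ET.fromstring("<Orders><Order><Customer/><Item/></Order></Orders>")
print(check_constraints(doc, CONSTRAINTS))
# prints ['Order must contain an Item with a sku']
```

A real implementation would need an XPath engine that supports functions such as count() in order to express cardinality tests directly.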


Mark

--------------------------------------------------------------------------------
 
Mark Feblowitz                                   [t] 617.715.7231
Frictionless Commerce Incorporated     [f] 617.495.0188 
XML Architect                          [e] mfeblowitz@frictionless.com
400 Technology Square, 9th Floor 
Cambridge, MA 02139 
www.frictionless.com  
 

 -----Original Message-----
From: 	John Utz [mailto:utz@singingfish.com] 
Sent:	Wednesday, April 17, 2002 11:53 AM
To:	Mark Feblowitz
Cc:	'Jeni Tennison'; Paul Kiel; xmlschema-dev@w3.org; Alexander Falk (E-mail)
Subject:	RE: Schema Design: Composition vs Subclassing



On Wed, 17 Apr 2002, Mark Feblowitz wrote:

> Yup - you got the gist.
<snip>
> Schematron is indeed well suited for this task, since the rules can be
> maintained separately, and there can be multiple sets, one for each
<snip>
> The part that most troubles many end users about using Schematron (aside
> from its sci-fi name) is that the constraints are applied by a separate
> utility - not the parser. Even if there were no significant performance
> penalty to running Schematron (i.e., even if it were run in the same JVM),
> there is still a perception problem on the part of many users, who worry
> about the extra logistics of running a separate pass and about the perceived
> added complexity.
<snip>
> co-occurrence constraints. In this way, the evaluation of the constraint
> expressions could be done "by the parser", at least as perceived by most
> Schema users.
> 

there is a design discussion going on in xerces-dev about some new
interfaces and implementations for GrammarCaching that i had the poor
taste to attempt to hijack into a discussion about grammar parsing :-)

one of the good things that came out of my thwarted attempt was a
discussion about SchemaParsing and the interfaces thereof.

i made the argument that the interface should be public so that
developers could implement their own parsers.

managing embedded schematron in an end user transparent way was *exactly*
the rationale i had in mind for said interface.

obviously, you can't avoid a performance hit, but you can probably minimize
it a lot by making sure that it happens in the same process so that you
don't have to reload/reread the grammar 2x just to do the validation.
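
a minimal sketch of that idea in Python, under loud assumptions: the schema text, the unqualified rule element inside xs:appinfo (a simplified stand-in for real embedded Schematron) and the helper names are all hypothetical, and functools.lru_cache stands in for whatever grammar cache a parser such as Xerces would actually provide. the point is only that both consumers share one in-process parse of the grammar.

```python
import functools
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
LOADS = {"count": 0}  # counts how often the grammar is really parsed

# Hypothetical schema with a simplified, made-up embedded rule.
SCHEMA_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:annotation><xs:appinfo>
      <rule test="Item">Order needs an Item</rule>
    </xs:appinfo></xs:annotation>
  </xs:element>
</xs:schema>"""

@functools.lru_cache(maxsize=None)
def load_grammar(text):
    """Parse the grammar once per distinct schema text, then reuse it."""
    LOADS["count"] += 1
    return ET.fromstring(text)

def declared_elements(text):
    """What a structural validator would read from the grammar."""
    return [e.get("name") for e in load_grammar(text).iter(XS + "element")]

def embedded_rules(text):
    """What a constraint checker would read from the same grammar."""
    return [r.get("test") for r in load_grammar(text).iter("rule")]

print(declared_elements(SCHEMA_TEXT), embedded_rules(SCHEMA_TEXT), LOADS["count"])
```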

<snip>
 
> Thanks,
> 
> Mark
> 
> --------------------------------------------------------------------------------
>  
> Mark Feblowitz                                   [t] 617.715.7231
> Frictionless Commerce Incorporated     [f] 617.495.0188 
> XML Architect                          [e] mfeblowitz@frictionless.com
> 400 Technology Square, 9th Floor 
> Cambridge, MA 02139 
> www.frictionless.com  
>  
> 
>  -----Original Message-----
> From: 	Jeni Tennison [mailto:jeni@jenitennison.com] 
> Sent:	Wednesday, April 17, 2002 7:11 AM
> To:	Mark Feblowitz
> Cc:	Paul Kiel; xmlschema-dev@w3.org
> Subject:	Re: Schema Design: Composition vs Subclassing
> 
> Hi Mark,
> 
> > What we recognized is that each use of a particular Noun shares a
> > common structure (same child elements, same order), and that they
> > only differ in the cardinalities of their child elements. That's why
> > we factored out the cardinalities from the definition of structure:
> > we define the structure of a thing once, and we define the possible
> > combinations of the cardinalities of its parts separately.
> 
> This is extremely interesting. You're making a distinction between:
> 
>   - the order in which elements appear
>   - the number of times those elements appear
> 
> Traditionally, content models have combined these two factors. When a
> validator goes through an instance it checks the number of occurrences
> of an element at the same time as checking that they're appearing in
> the correct order.
> 
> But you could imagine a content model that expressed the two factors
> independently. For example, given a DTD content model of:
> 
>   (E1, E2?, E3+, E4)
> 
> you could have one part that said that E1, E2, E3 and E4 must appear
> in order, followed by a second part that said that E2 was optional and
> E3 had to occur one or more times. Making up a syntax:
> 
>   (E1, E2, E3, E4) & E2? & E3+
> 
> A validator could feasibly go through an element's content twice, once
> checking order and once checking cardinality, or it could combine the
> two parts to create a traditional content model.
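
The two-pass idea in the quoted paragraph above can be sketched in Python as follows. This is only an illustration of the made-up syntax (E1, E2, E3, E4) & E2? & E3+, not any real schema language. The names ORDER, CARDINALITY, check_order, check_cardinality and valid are hypothetical, and elements without an explicit cardinality are assumed to occur exactly once.

```python
ORDER = ["E1", "E2", "E3", "E4"]

# (min, max) occurrences; None means unbounded.
# Assumption: elements not listed here must occur exactly once.
CARDINALITY = {"E2": (0, 1), "E3": (1, None)}

def check_order(children, order):
    """Pass 1: every child is known and children never go backwards in order."""
    rank = {name: i for i, name in enumerate(order)}
    if any(child not in rank for child in children):
        return False
    positions = [rank[child] for child in children]
    return positions == sorted(positions)

def check_cardinality(children, order, cardinality):
    """Pass 2: each element occurs an allowed number of times."""
    for name in order:
        low, high = cardinality.get(name, (1, 1))
        count = children.count(name)
        if count < low or (high is not None and count > high):
            return False
    return True

def valid(children, order=ORDER, cardinality=CARDINALITY):
    return check_order(children, order) and check_cardinality(children, order, cardinality)

print(valid(["E1", "E3", "E3", "E4"]))  # True: E2 omitted, E3 repeated
print(valid(["E1", "E2", "E4"]))        # False: E3 is required
print(valid(["E3", "E1", "E4"]))        # False: out of order
```

Swapping in a different cardinality table while keeping ORDER fixed gives the variation described in the quoted text, for example valid(["E1", "E4"], cardinality={"E2": (0, 1), "E3": (0, None)}).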
> 
> Separating out these two parts would enable you to vary one while
> keeping the other the same. So for example you could say that all
> Nouns shared the same order of elements and swap in different
> cardinality constraints as and when required.
> 
> As far as I know, the only schema technology that enables you to make
> that kind of division explicitly is Schematron -- you can have one set
> of rules that test ordering, and another set of rules that test
> cardinality. When RELAX NG had 'concur' you could have used that
> (essentially imposing two overlapping sets of constraints on the
> content model); you could still use it with TREX I suppose.
> 
> But this is essentially what you're doing -- using XML Schema to
> provide the ordering constraints (which means that you have to be very
> flexible about the cardinality, essentially not testing it or testing
> it within known limits) and another technology to provide concurrent
> validation of the cardinality constraints.
> 
> This is interesting in terms of schema language development because it
> implies that something like 'concur' would be a useful addition to the
> language. You could imagine it solving some of the problems that
> people have with "these elements, in any order, with these
> cardinalities" as well.
> 
> It's also interesting in terms of the relationship with OO
> technologies. In OO technologies, ordering isn't an issue, only
> cardinality, so the normal "best practice" for OO probably isn't going
> to help here. Design patterns for OO technologies simply don't have to
> deal with this kind of issue.
> 
> Anyway, after all that, to answer the non-rhetorical question in your
> post:
> 
> > (which reminds me, are groups extensible? How does one do so?)
> 
> They're only extensible (while retaining the same name) when you
> redefine a schema. When you use xs:redefine, you can change model groups
> as long as either:
> 
>   - you reference the model group within the new model group
>     (essentially allowing you to extend it) or
> 
>   - the redefined model group is a valid restriction of the original
>     model group (essentially allowing you to restrict it)
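
The first case in the quoted list above (extending a group by referencing it inside its own redefinition) might look like the schema fragment embedded in this Python sketch. The file name base.xsd, the group name NounGroup and the element Extra are hypothetical, and the schema is assumed to have no target namespace so the ref is an unprefixed name. The Python code only checks that the fragment is well-formed and finds the self-reference; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical redefining schema: NounGroup is extended by referencing
# the original NounGroup and then appending one more element.
REDEFINING_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="base.xsd">
    <xs:group name="NounGroup">
      <xs:sequence>
        <xs:group ref="NounGroup"/>
        <xs:element name="Extra" type="xs:string"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>"""

def self_referencing_groups(schema_text):
    """Return names of model groups whose definition references themselves."""
    root = ET.fromstring(schema_text)
    names = []
    for group in root.iter(XS + "group"):
        name = group.get("name")
        if name and any(g.get("ref") == name for g in group.iter(XS + "group")):
            names.append(name)
    return names

print(self_referencing_groups(REDEFINING_SCHEMA))  # ['NounGroup']
```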
> 
> Of course you can reference model groups with different names wherever
> you like within a model group -- that's equivalent to extending
> complex types within a single schema.
>     
> Cheers,
> 
> Jeni
> 
> ---
> Jeni Tennison
> http://www.jenitennison.com/
> 
>
Received on Wednesday, 17 April 2002 13:17:04 UTC