RE: Schema Design: Composition vs Subclassing

Yup - you got the gist.

In some regards, the order of the content isn't even a firm requirement -
just the existence of the content. Where order does come into play, it could
be based on a hard ordering requirement, but in most cases it's a side
effect of the limitations on the use of "all".

On the subject of separating the concerns (cardinality from
structure/content), there's another aspect: whether the cardinality
constraints are kept with the "structural" schema or whether they are
separate (in separate files). The example, (E1, E2, E3, E4) & E2? & E3+,
implies that they are kept together. It would be better for the base
structural schema, represented by (E1, E2, E3, E4), to be kept separate from
the cardinality-constraining schema, E2? & E3+, e.g., in a separate xsd
file. In this way, one could have a single structural base and then
constrain it in many different ways (one way per differing context, e.g.,
one way per differing transaction). Another benefit of separating out the
cardinality constraints is that they can be modified, or new ones added,
without having to rev the structural base - an important factor in
maintaining a schema over time.
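
To make that concrete, here is a minimal sketch of what the structural
base might look like (file and element names are invented, borrowed from
the (E1, E2, E3, E4) example): the sequence fixes the order, but every
cardinality is left wide open, to be tightened by a separate constraint
layer per context.

  <!-- base-structure.xsd (hypothetical) -->
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Thing">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="E1" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="E2" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="E3" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="E4" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>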

Schematron is indeed well suited for this task, since the rules can be
maintained separately, and there can be multiple sets, one for each
different blend of cardinality constraints. The fact that Schematron uses
XPath expressions is the real strength here; that's what enables the
constraint to be overlaid on the correct part of the structural schema.
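
For example, the E2?/E3+ constraints could be overlaid with a rule set
along these lines (a sketch in Schematron 1.5 syntax; the file name and
the Thing context are carried over from the hypothetical base schema
above):

  <!-- transaction1Constraints.sch (hypothetical) -->
  <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:pattern name="Transaction 1 cardinality constraints">
      <sch:rule context="Thing">
        <sch:assert test="count(E2) &lt;= 1">E2 may appear at most once.</sch:assert>
        <sch:assert test="count(E3) &gt;= 1">E3 must appear at least once.</sch:assert>
      </sch:rule>
    </sch:pattern>
  </sch:schema>

Each transaction context would get its own rule set over the same base.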

The part that most troubles many end users about using Schematron (aside
from its sci-fi name) is that the constraints are applied by a separate
utility - not the parser. Even if there were no significant performance
penalty to running Schematron (i.e., even if it were run in the same JVM),
there is still a perception problem on the part of many users, who worry
about the extra logistics of running a separate pass and about the perceived
added complexity.

Since Schema validators must have XPath expression evaluation capabilities
in order to process key and unique expressions, it would seem like a small
step to also support the evaluation of other XPath expressions, e.g., those
used to express overlaid cardinality constraints or those used to apply
co-occurrence constraints. In this way, the evaluation of the constraint
expressions could be done "by the parser", at least as perceived by most
Schema users.
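
For reference, this is the flavor of XPath a validating parser already
evaluates today - the selector and field expressions of xs:unique and
xs:key (element and attribute names invented):

  <xs:element name="Things">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Thing" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
    <xs:unique name="uniqueThingId">
      <xs:selector xpath="Thing"/>
      <xs:field xpath="@id"/>
    </xs:unique>
  </xs:element>

Evaluating an overlaid expression like count(E3) >= 1 wouldn't seem to
demand much more machinery than that.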

In one possible scenario, the base structural schema (Person) plus the
cardinality constraints (personTransaction1Constraints) could be brought in
via includes, and the parser could check both "at the same time" (likely in
separate passes). All the user would need to know is how to express the
constraints; they wouldn't need to develop an architecture for applying a
separate checking step.
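
Purely as a sketch of the packaging - not something XML Schema 1.0
supports today, and the second include merely stands in for whatever
syntax such a constraint document would actually use:

  <!-- personTransaction1.xsd (hypothetical) -->
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:include schemaLocation="Person.xsd"/>
    <xs:include schemaLocation="personTransaction1Constraints.xsd"/>
  </xs:schema>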

BTW, I know that the folks on the Schema Working Group can't comment on what
they're considering for 1.1. I just want to make sure that these discussions
are being read and considered by at least some of the members. Is this
list the correct forum, or is there another one that I should also be CCing?


On the non-rhetorical part, I'm still unclear on how a group G in namespace
ns1 could be referenced in a type T in ns1, such that T can be extended in
namespace ns2, e.g., by extending ns1:G. A redefine of ns1:G to add new
content (e.g., elements defined in ns2) could be the way to go, but it seems
tricky: in order to extend ns1:G with content from ns2 and then use it in
ns2, I redefine it in ns1 (replicating the original content?), import the
additional content from ns2, and then make sure that ns2 imports the
redefining schema and that ns2:T uses the redefinition. Hmmm.
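
If I've understood the mechanics, the arrangement would be something like
this (all file names, element names and URIs invented; the self-reference
to ns1:G inside the redefinition stands for the original content):

  <!-- g-redefined.xsd: target namespace stays ns1 -->
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:ns1="urn:example:ns1"
             xmlns:ns2="urn:example:ns2"
             targetNamespace="urn:example:ns1">
    <xs:import namespace="urn:example:ns2" schemaLocation="ns2-content.xsd"/>
    <xs:redefine schemaLocation="ns1-original.xsd">
      <xs:group name="G">
        <xs:sequence>
          <xs:group ref="ns1:G"/>        <!-- the original content, by self-reference -->
          <xs:element ref="ns2:Extra"/>  <!-- the added ns2 content -->
        </xs:sequence>
      </xs:group>
    </xs:redefine>
  </xs:schema>

  <!-- ns2.xsd: imports the *redefining* document, not the original -->
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:ns1="urn:example:ns1"
             targetNamespace="urn:example:ns2">
    <xs:include schemaLocation="ns2-content.xsd"/>
    <xs:import namespace="urn:example:ns1" schemaLocation="g-redefined.xsd"/>
    <xs:complexType name="T">
      <xs:group ref="ns1:G"/>
    </xs:complexType>
  </xs:schema>

At least the self-reference avoids replicating the original content, but
the rest of the plumbing is as fiddly as it sounds.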


Thanks,

Mark

----------------------------------------------------------------------------

Mark Feblowitz                                   [t] 617.715.7231
Frictionless Commerce Incorporated               [f] 617.495.0188
XML Architect                                    [e] mfeblowitz@frictionless.com
400 Technology Square, 9th Floor
Cambridge, MA 02139
www.frictionless.com
 

 -----Original Message-----
From: 	Jeni Tennison [mailto:jeni@jenitennison.com] 
Sent:	Wednesday, April 17, 2002 7:11 AM
To:	Mark Feblowitz
Cc:	Paul Kiel; xmlschema-dev@w3.org
Subject:	Re: Schema Design: Composition vs Subclassing

Hi Mark,

> What we recognized is that each use of a particular Noun shares a
> common structure (same child elements, same order), and that they
> only differ in the cardinalities of their child elements. That's why
> we factored out the cardinalities from the definition of structure:
> we define the structure of a thing once, and we define the possible
> combinations of the cardinalities of its parts separately.

This is extremely interesting. You're making a distinction between:

  - the order in which elements appear
  - the number of times those elements appear

Traditionally, content models have combined these two factors. When a
validator goes through an instance it checks the number of occurrences
of an element at the same time as checking that they're appearing in
the correct order.

But you could imagine a content model that expressed the two factors
independently. For example, given a DTD content model of:

  (E1, E2?, E3+, E4)

you could have one part that said that E1, E2, E3 and E4 must appear
in order, followed by a second part that said that E2 was optional and
E3 had to occur one or more times. Making up a syntax:

  (E1, E2, E3, E4) & E2? & E3+

A validator could feasibly go through an element's content twice, once
checking order and once checking cardinality, or it could combine the
two parts to create a traditional content model.
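
In XML Schema terms, the traditional combined form entangles the two in a
single particle - a sketch using local declarations:

  <xs:sequence>
    <xs:element name="E1"/>
    <xs:element name="E2" minOccurs="0"/>
    <xs:element name="E3" maxOccurs="unbounded"/>
    <xs:element name="E4"/>
  </xs:sequence>

whereas the made-up syntax above keeps order and occurrence apart.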

Separating out these two parts would enable you to vary one while
keeping the other the same. So for example you could say that all
Nouns shared the same order of elements and swap in different
cardinality constraints as and when required.

As far as I know, the only schema technology that enables you to make
that kind of division explicitly is Schematron -- you can have one set
of rules that test ordering, and another set of rules that test
cardinality. When RELAX NG had 'concur' you could have used that
(essentially imposing two overlapping sets of constraints on the
content model); you could still use it with TREX I suppose.

But this is essentially what you're doing -- using XML Schema to
provide the ordering constraints (which means that you have to be very
flexible about the cardinality, essentially not testing it or testing
it within known limits) and another technology to provide concurrent
validation of the cardinality constraints.

This is interesting in terms of schema language development because it
implies that something like 'concur' would be a useful addition to the
language. You could imagine it solving some of the problems that
people have with "these elements, in any order, with these
cardinalities" as well.

It's also interesting in terms of the relationship with OO
technologies. In OO technologies, ordering isn't an issue, only
cardinality, so the normal "best practice" for OO probably isn't going
to help here. Design patterns for OO technologies simply don't have to
deal with this kind of issue.

Anyway, after all that, to answer the non-rhetorical question in your
post:

> (which reminds me, are groups extensible? How does one do so?)

They're only extensible (while retaining the same name) when you
redefine the schema. When you use xs:redefine, you can change model groups
as long as either:

  - you reference the model group within the new model group
    (essentially allowing you to extend it) or

  - the redefined model group is a valid restriction of the original
    model group (essentially allowing you to restrict it)

Of course you can reference model groups with different names wherever
you like within a model group -- that's equivalent to extending
complex types within a single schema.
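
For example, extension by self-reference looks something like this (a
minimal sketch with invented names, placed inside the redefining
xs:schema and assuming the original schema has no target namespace):

  <xs:redefine schemaLocation="original.xsd">
    <xs:group name="G">
      <xs:sequence>
        <xs:group ref="G"/>        <!-- the original content model -->
        <xs:element name="Extra"/> <!-- the addition -->
      </xs:sequence>
    </xs:group>
  </xs:redefine>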
    
Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
