Resend: Re: Schema Design: Composition vs Subclassing from Jeni Tennison on 2002-04-03 (xmlschema-dev@w3.org from April 2002)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 3 Apr 2002 14:50:37 +0100
To: xmlschema-dev@w3.org
Message-ID: <29109034813.20020403145037@jenitennison.com>
Hi Roger,

> It dawns on me that this is the old Object-Oriented issue of
> design-by-subclassing versus design-by-composition, now rearing its
> head in the design of XML Schemas. Let's consider these two design
> approaches as they apply to XML Schemas.
>
> Let's compare these two design approaches:
>  . design-by-subclassing (i.e., type hierarchies) 
>       versus 
>  . design-by-composition (i.e., bundling together element groups).

Just to add a few fairly random thoughts...

Design-by-composition is, of course, the approach that RELAX NG takes,
but does so in a much more flexible way. I think that the most important
difference with RELAX NG, and something that really limits
design-by-composition in XML Schema, is that in XML Schema each group
can either contain a content model or attributes, but not both.
Imagine that you had a type:

<xs:complexType name="C1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
  <xs:attribute name="A1" .../>
</xs:complexType>

Using composition, you'd have to use two groups to replace the one
complex type:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
</xs:group>

<xs:attributeGroup name="AG1">
  <xs:attribute name="A1" .../>
</xs:attributeGroup>

If the content model and the attributes are conceptually linked,
splitting them up doesn't seem wise. It makes it easy for someone to
accidentally omit one or the other when both should always be present.
You don't have that problem with complex types.
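
To make that concrete, here's a minimal sketch of what every
declaration using the pair has to look like (the element name "item"
is made up purely for illustration). Both references have to be
written out by hand, and the schema stays perfectly valid if you
forget either one:

<xs:element name="item">
  <xs:complexType>
    <xs:sequence>
      <!-- the content model half -->
      <xs:group ref="G1"/>
    </xs:sequence>
    <!-- the attribute half; nothing ties it to G1 -->
    <xs:attributeGroup ref="AG1"/>
  </xs:complexType>
</xs:element>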

The second (related) issue that came to mind was that you're quite
able to use groups in a highly coupled manner. A fairer approximation
of the complex types would be:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" type="..."/>
    <xs:element name="E2" type="..."/>
    <xs:element name="E3" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G2">
  <xs:sequence>
    <xs:group ref="G1"/>
    <xs:element name="E4" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G3">
  <xs:sequence>
    <xs:group ref="G2"/>
    <xs:element name="E5" type="..."/>
    <xs:element name="E6" type="..."/>
  </xs:sequence>
</xs:group>

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G3"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

I think it's important to show that it's not the fact that you're
using groups that gives you the advantage, it's the *way* that you use
the groups. And it means that you have to consider at what level you
cluster elements together. Is the rule that you only create groups
with element particles?

I'll also note that whichever way you do it, you're going to end up
having to understand three or four separate components -- the only
difference between composition and subclassing at this level is
whether they're arranged horizontally (composition) or vertically
(subclassing). And which of those you find easiest to understand and
work with probably comes down to personal taste.

Another thing is to consider the work of the schema designer in
creating the content models for the elements. I'd assume that if the
order of the elements should be E1, E2, E3, E4, ... for one element then it
should be E1, E2, E3, E4, ... for another element. With composition,
though, it's easy to accidentally change the ordering of the groups
that you use:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G2"/>
      <xs:group ref="G3"/>
      <xs:group ref="G1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

(Of course it's obvious here, but in real life you'd use real names
for the groups and there'd probably be more of them, so it would be
far easier to make this mistake.) So I think that using composition,
it's easier to create inconsistent content models, and inconsistency
is a headache for authors/developers who have to use the markup
language. On the other hand, some might view this as providing
flexibility compared to the rules about derivation by extension.

Finally, I think that design-by-subclassing has one killer advantage,
namely that applications can use information about the type hierarchy
in order to provide common processing for all elements of a particular
(high level) type. In your example, the root element has the types C3,
C2 and C1, and an application could use the fact that it knows that
the root element is of type C2 to know that it can process elements
E1, E2, E3 and E4 without having to know that it also contains
elements E5 and E6. The application can also know that other elements
of type C2, whether they are of type C3 or some other type, can be
processed in the same way. On the other hand, information about the
model groups that have been composed to create a content model is lost
(I believe).
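
For reference, the hierarchy I have in mind is roughly the following.
This is only a sketch, mirroring the G1/G2/G3 groups above; your
original may differ in detail, and xs:string is just a placeholder I've
assumed for the elided element types:

<xs:complexType name="C1">
  <xs:sequence>
    <xs:element name="E1" type="xs:string"/>
    <xs:element name="E2" type="xs:string"/>
    <xs:element name="E3" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- C2 is a C1 with E4 appended -->
<xs:complexType name="C2">
  <xs:complexContent>
    <xs:extension base="C1">
      <xs:sequence>
        <xs:element name="E4" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- C3 is a C2 with E5 and E6 appended -->
<xs:complexType name="C3">
  <xs:complexContent>
    <xs:extension base="C2">
      <xs:sequence>
        <xs:element name="E5" type="xs:string"/>
        <xs:element name="E6" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:element name="root" type="C3"/>

Because extension always appends to the base type's content model, the
E1, E2, E3, E4 ordering is fixed for anything derived from C2.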

I think it's fair to say that we haven't seen this advantage in the
real world yet. There simply aren't parsers that make typing
information available. Possibly XPath/XSLT 2.0 and DOM 3.0 AS will
help here...
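
As a sketch of the sort of thing I mean: with a schema-aware XSLT 2.0
processor, a template could match on an element's type rather than its
name. This assumes a type hierarchy with types named C1, C2 and C3 in
no namespace, and the schema location "schema.xsd" is hypothetical:

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- hypothetical location of the schema that defines C1, C2, C3 -->
  <xsl:import-schema schema-location="schema.xsd"/>

  <!-- matches any element whose type is C2 or derived from C2,
       whatever the element happens to be called -->
  <xsl:template match="element(*, C2)">
    <xsl:apply-templates select="E1, E2, E3, E4"/>
  </xsl:template>

</xsl:stylesheet>

The template never needs to know about E5 and E6, or about any other
types that get derived from C2 later on.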

Hmm... those turned out to be generally pro design-by-subclassing --
perhaps that's just because I'm a natural devil's advocate ;) I
suppose if I'm thinking in object-oriented terms, I think of the type
hierarchy as being like the class hierarchy and groups as being like
interfaces.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Wednesday, 3 April 2002 08:50:44 UTC