RE: Schema Design: Composition vs Subclassing

In the construction of OAGIS 8, we went back and forth on this. Ultimately,
we settled on a hybrid of component composition plus use of type
hierarchies. 

As I see it, you trade off one set of confusions for another - either the
schema is hierarchical and constraining, or it's componentized and simple.
When it's hierarchical, there is a guaranteed uniformity, which is important
in cases - such as standards - where such uniformity and predictability are
critical. It gets challenging when you start mixing extensions and
restrictions and trying to tailor it to your specific needs down the
derivation chain.

The component-oriented approach is simpler to grasp and use: you construct
what you need out of discrete components, which simplifies tailoring. On the
other hand, if there is a family of (non-defined) types that share common
structure, and that common structure is not captured explicitly in a type
definition, it's up to every user of that phantom type to faithfully
reproduce that type (those types) in their specialized case. (I.e., if every
C has components X, Y, and Z, then everybody who constructs a thing similar
to C will have to remember to include an X, Y, and Z - in that order). Thus,
assembly from parts requires a full understanding of the unexpressed type
constraints that would otherwise be declared in a type, yet are still
important to each thing "of the same kind."
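
A minimal sketch of that problem, to be read as fragments inside an
xsd:schema element. All names here (X, Y, Z, CLikeA, CLikeB, A1, B1) are
hypothetical, and xsd:string is only a placeholder for the real types:

<!-- Shared components (hypothetical names, placeholder types). -->
<xsd:element name="X" type="xsd:string"/>
<xsd:element name="Y" type="xsd:string"/>
<xsd:element name="Z" type="xsd:string"/>

<!-- One author assembles a "kind of C": X, Y, Z in the expected order. -->
<xsd:complexType name="CLikeA">
    <xsd:sequence>
        <xsd:element ref="X"/>
        <xsd:element ref="Y"/>
        <xsd:element ref="Z"/>
        <xsd:element name="A1" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>

<!-- A second author swaps Y and Z. This is still a valid schema,
     because nothing declares that CLikeB was meant to be a kind of C. -->
<xsd:complexType name="CLikeB">
    <xsd:sequence>
        <xsd:element ref="X"/>
        <xsd:element ref="Z"/>
        <xsd:element ref="Y"/>
        <xsd:element name="B1" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>

With a declared base type C, the second definition could not have drifted
this way; with components alone, the X, Y, Z convention lives only in the
authors' heads.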

The design we settled on acknowledges components and component types,
supports the construction of types from these components, supports the
extension of these components, and supports a set of predefined, shared
hierarchical (extensible) types. Users who want to can use the predefined
types when those types fit the requirements without too much grief. The
components that were used to construct the types can also be composed into
new (possibly similar) types when the munging of the original types gets
too complex (typically, wherever restriction or cancelation arises).
Sounds pretty good, but it still carries the limitations that Schema
imposes, and will likely end up requiring some pretty deep understanding of
Schema to use it effectively.
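
A rough sketch of the two options, as fragments inside an xsd:schema
element. This is not the actual OAGIS schema: every name below (Id,
Description, Quantity, Line, OrderLine, NoteLine, Price) and every built-in
type chosen is an assumption made for illustration:

<!-- Shared components (hypothetical). -->
<xsd:element name="Id" type="xsd:string"/>
<xsd:element name="Description" type="xsd:string"/>
<xsd:element name="Quantity" type="xsd:decimal"/>

<!-- A predefined, shared type built from those components. -->
<xsd:complexType name="Line">
    <xsd:sequence>
        <xsd:element ref="Id"/>
        <xsd:element ref="Description" minOccurs="0"/>
        <xsd:element ref="Quantity"/>
    </xsd:sequence>
</xsd:complexType>

<!-- Option 1: the predefined type fits, so extend it. -->
<xsd:complexType name="OrderLine">
    <xsd:complexContent>
        <xsd:extension base="Line">
            <xsd:sequence>
                <xsd:element name="Price" type="xsd:decimal"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<!-- Option 2: the predefined type does not fit, so compose a similar
     type from the same components instead of deriving from Line. -->
<xsd:complexType name="NoteLine">
    <xsd:sequence>
        <xsd:element ref="Id"/>
        <xsd:element ref="Description"/>
    </xsd:sequence>
</xsd:complexType>

NoteLine drops Quantity, which a restriction of Line could not do because
Quantity is required there. The cost is that NoteLine has no declared
relationship to Line at all.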

One thing to note is that the single inheritance model underlying Schema
changes this discussion significantly, skewing toward component assembly.
It's pretty confounding trying to have the "right" hierarchy when the type
you're defining is a natural descendant of two disparate base types. You end
up defining the new base type as either the assembly of components from each
base, or you end up selecting a "dominant" parent type and then mixing in
the content from the other type(s). That requires some deft modeling at the
point where the two types come together. 
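
One way the "dominant parent plus mix-in" option might look, as fragments
inside an xsd:schema element. The names BaseA, BaseB, BContent, AB, A1, B1,
and B2 are hypothetical, xsd:string is a placeholder, and the sketch
assumes the second parent's content has been factored into a named group:

<!-- Content of the second parent, factored out so it can be reused. -->
<xsd:group name="BContent">
    <xsd:sequence>
        <xsd:element name="B1" type="xsd:string"/>
        <xsd:element name="B2" type="xsd:string"/>
    </xsd:sequence>
</xsd:group>

<xsd:complexType name="BaseA">
    <xsd:sequence>
        <xsd:element name="A1" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="BaseB">
    <xsd:group ref="BContent"/>
</xsd:complexType>

<!-- AB derives from BaseA only; the BaseB content is pulled in by
     group reference and appended after the inherited A1. -->
<xsd:complexType name="AB">
    <xsd:complexContent>
        <xsd:extension base="BaseA">
            <xsd:group ref="BContent"/>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

As far as the schema is concerned, AB is derived from BaseA but has no
relationship to BaseB, so it cannot stand in where a BaseB is expected.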

I suspect that you'll get a lot of answers claiming one approach is
significantly better than the other. From that you should be able to glean a
pretty good set of examples where each approach is superior. That's the best
we can hope for, since neither is ideal for all cases.

Hope that's useful,

Mark

----------------------------------------------------------------------------
 
Mark Feblowitz                                   [t] 617.715.7231
Frictionless Commerce Incorporated     [f] 617.495.0188 
XML Architect                          [e] mfeblowitz@frictionless.com
400 Technology Square, 9th Floor 
Cambridge, MA 02139 
www.frictionless.com  
 

 -----Original Message-----
From: 	Roger L. Costello [mailto:costello@mitre.org] 
Sent:	Tuesday, April 02, 2002 5:24 PM
To:	Curt.Arnold@hyprotech.com; xmlschema-dev@w3.org;
jeni@jenitennison.com; Simon.Cox@csiro.au; costello@mitre.org
Subject:	Schema Design: Composition vs Subclassing

[Curt, I vaguely recall you having some thoughts on this topic a long
time ago.  Please chime in.]

As I sit here at my desk analyzing a schema with a huge hierarchy chain,
I seriously begin to question the value of schema type hierarchies,
especially schemas containing hierarchies with many levels. I ponder
ways to break the chain and simplify the schema.  I envision a schema
design whereby independent, decoupled components are simply assembled
together.  

It dawns on me that this is the old Object-Oriented issue of
design-by-subclassing versus design-by-composition, now rearing its head
in the design of XML Schemas.  Let's consider these two design
approaches as they apply to XML Schemas.

The two approaches are:
 . design-by-subclassing (i.e., type hierarchies) 
      versus 
 . design-by-composition (i.e., bundling together element groups).

---------------------------------------------------------------
** Design-by-subclassing **

To compare design approaches consider this type hierarchy:

<xsd:complexType name="C1">
    <xsd:sequence>
        <xsd:element name="E1" type="..."/>
        <xsd:element name="E2" type="..."/>
        <xsd:element name="E3" type="..."/>
    </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="C2">
    <xsd:complexContent>
        <xsd:extension base="C1">
            <xsd:sequence>
                <xsd:element name="E4" type="..."/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<xsd:complexType name="C3">
    <xsd:complexContent>
        <xsd:extension base="C2">
            <xsd:sequence>
                <xsd:element name="E5" type="..."/>
                <xsd:element name="E6" type="..."/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<xsd:element name="root" type="C3"/>

Here we see that the <root> element is of type C3.  C3 extends type
C2, so to understand type C3 you must understand C2.  But to understand
C2 you must understand type C1.  Already it is getting very difficult to
understand the <root> element (and this is a short hierarchy).  Further,
if any type along the hierarchy changes (i.e., we add a new element
and/or delete an element) then everything under it breaks.  

CONCLUSION

Design-by-subclassing yields highly coupled, brittle schemas.

---------------------------------------------------------------
** Design-by-composition **

Let's contrast the above design approach with a composition design.  In
this approach we create independent (off-the-shelf) group components.
The <root> element is declared by simply assembling together the desired
components:

<xsd:group name="G1">
    <xsd:sequence>
        <xsd:element name="E1" type="..."/>
        <xsd:element name="E2" type="..."/>
        <xsd:element name="E3" type="..."/>
    </xsd:sequence>
</xsd:group>

<xsd:group name="G2">
    <xsd:sequence>
        <xsd:element name="E4" type="..."/>
    </xsd:sequence>
</xsd:group>

<xsd:group name="G3">
    <xsd:sequence>
        <xsd:element name="E5" type="..."/>
        <xsd:element name="E6" type="..."/>
    </xsd:sequence>
</xsd:group>

<xsd:element name="root">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:group ref="G1"/>
            <xsd:group ref="G2"/>
            <xsd:group ref="G3"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Again, as we see, the creation of the <root> element is simply a matter
of assembling together the desired pieces.

With this approach: 
 . it is much easier to understand the schema since you can 
   focus on each component one at a time,
 . each component is independent and decoupled.  Any changes to 
   one component will not impact the other components.  

CONCLUSION

Design-by-composition yields scalable, robust schemas.

---------------------------------------------------------------
What are your thoughts on this?  /Roger

Received on Wednesday, 3 April 2002 10:47:01 UTC