[Fwd: [Fwd: Schema Design: Composition vs Subclassing]] from paul.w.daisey@census.gov on 2002-04-04 (xmlschema-dev@w3.org from April 2002)

From: <paul.w.daisey@census.gov>
Date: Thu, 4 Apr 2002 14:46:51 -0500 (EST)
To: costello@mitre.org
Cc: Simon.Cox@csiro.au, jeni@jenitennison.com, dwc@mitre.org, plj@mitre.org, dhoward@mitre.org, xmlschema-dev@w3.org, gml30.rwg@opengis.org
Message-ID: <OF2CDCD8A9.CB8826D2-ON85256B91.0067ECAB@tco.census.gov>
All:

     Simon and Dave have invited me to join your discussion.

Roger:

     In general terms, I think the nature of one's problem space should be
a major consideration in deciding whether to use deep, rigid type
inheritance hierarchies, or shallow, flexible type hierarchies defined
mostly by composition.   Some problem spaces, like both ISO 19107 Geometry
/ Topology definitions, and Census geography, have traditionally had both
very rigid and inflexible definitions, and types that lend themselves to
hierarchical definition.

     I thought Jeni Tennison's comments on potential disadvantages of
definition by composition were quite thorough.  The "killer" advantage of
design by subclassing was the one I was going to raise, that it explicitly
retains type inheritance identification; E isa D isa C isa B isa A, in the
extension / restriction base="" attribute.   This is an advantage not just
in XML, but also in using XML/Schemas as the quasi-MDA base for object
definitions in languages like Java, for example with a product like Castor,
whereby the type inheritance identification is passed to the generated
language classes.  I'm not sure I entirely agree with the conclusion:
"information about the model groups that have been composed to create a
content model is lost (I believe)".   I think that it is possible to write
XSLT to determine the implicit type inheritance hierarchies in XML/Schema
types defined by composition, by comparing inventories of their contents.
But in any case the type inheritance information is certainly less
accessible than in the design by subclassing case.

     In designing GML, at several points the analogy to object type
libraries was raised as a design and usage model, although I don't remember
if such a reference made it into the GML v2 document.   Your points about
the disadvantages of incompatibilities in derived types caused by changes in
base types are well taken by those of us who have been trying to retain
backward compatibility as we move to GML v3.   I guess I'd argue that those
difficulties are worth wrestling with so that the task of developing
general tools for processing and transforming GML is facilitated by type
inheritance information in the <Class> and <property> hierarchies in GML.
Such tools can then, for instance, deal with a tgr:TigerLineString as a
gml:LineString and ignore its added elements, making them useful for
documents conforming to a variety of application-specific schemas that
extend and restrict GML types.    Or as Jon Udell said in "Java, XML, and
Web Services" in the March 25 InfoWorld, "The more XML messages say about
themselves, the less their senders and receivers need to know about one
another's infrastructures."
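
A minimal sketch of the kind of extension described above. The tgr namespace URI, the schemaLocation, and the added element name "tlid" are hypothetical; gml:LineStringType and gml:LineString are assumed to be the GML 2 names for the base type and element.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:tgr="http://example.org/tiger"
           targetNamespace="http://example.org/tiger"
           elementFormDefault="qualified">

  <!-- schemaLocation is a placeholder for wherever the GML schema lives -->
  <xs:import namespace="http://www.opengis.net/gml"
             schemaLocation="geometry.xsd"/>

  <!-- The base attribute records that a TigerLineString isa LineString -->
  <xs:complexType name="TigerLineStringType">
    <xs:complexContent>
      <xs:extension base="gml:LineStringType">
        <xs:sequence>
          <!-- hypothetical application-specific addition -->
          <xs:element name="tlid" type="xs:integer"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="TigerLineString" type="tgr:TigerLineStringType"
              substitutionGroup="gml:LineString"/>
</xs:schema>
```

A general GML tool that reads the schema can follow the base attribute back to gml:LineStringType, process the inherited content, and skip tlid.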

     Although I frequently disagree with your conclusions as to what
constitutes "best practice", I've learned a lot from what you've written,
think you are performing a valuable service by examining these issues, and
encourage you to keep it up.

     Best regards,

Paul

                                                  
 Paul.W.Daisey@census.gov
 U.S. Census Bureau, Geography Division
 phone: (301) 457-4308
 fax:   (301) 457-4710
 www.census.gov/geo/www/


----- Forwarded by Paul W Daisey/GEO/HQ/BOC on 04/04/2002 01:55 PM -----

Simon.Cox@csiro.au wrote:
>
> Roger - I think I agree with where you are going here.
> An additional advantage is that design-by-composition gives a kind of
> weak multiple-inheritance method, which can be very useful.
>
> From time to time I have tried to gently push the GML crew in this
> direction.
> For example, I was responsible for all the <group> and <attributeGroup>s
> in GML - not many but they do lurk in 3 (?) places
> (associationAttributeGroup, locator, dynamicProperties).
> And if you can be bothered to look into them, the modified schema docs I
> sent you yesterday did, in fact, replace a bunch of types defined through
> derivation by restriction, with types defined "fresh", by composition,
> but following a "pattern".
> This was mainly in the "property" part of the dual hierarchy.
> The type derivation in the "Object" part is much less troublesome and
> contentious.
>
> One of the constraints in GML is a requirement - strong from some
> stakeholders, but not all - to follow UML models from the ISO 191XX
> series - in particular ISO 19107 which is a complex geometry model.
> But the market wants XML solutions.
> So we have tried to reproduce the complete object model in XML, with all
> the inheritance hierarchies realised as XML type-derivation hierarchies.
> This MDA tendency is seductive to people who are geographers first and
> analysts second, but with an attraction to analysis.
>
> There is some resistance to "flattening" the schemas.
> Paul Daisey is probably the person with the most coherent use-case for
> retaining them, though I'm not sure I fully understand his argument.
> But Paul is a smart, methodical, experienced, and ultimately practical
> guy.
>
> So I suggest that he needs to be engaged in this discussion.
>
> _____
> [This mail represents part of a discussion of work in progress
> and should not be used for any purpose without my permission.]
> _____
> Simon.Cox@csiro.au  CSIRO Exploration & Mining
> 26 Dick Perry Avenue, Kensington WA 6151
> PO Box 1130, Bentley WA 6102  AUSTRALIA
> T: +61 (8) 6436 8639  F: +61 (8) 6436 8555  C: +61 (4) 0330 2672
> http://www.csiro.au/page.asp?type=resume&id=CoxSimon
>
                                                                                                      
From:    Dave Case <dwc@mitre.org>
Date:    04/03/2002 02:11 PM
To:      Paul Daisey <pdaisey@geo.census.gov>
cc:
Subject: [Fwd: [Fwd: Schema Design: Composition vs Subclassing]]

Yet more info. for you!
Dave

-------- Original Message --------
Subject: [Fwd: Schema Design: Composition vs Subclassing]
Date: Wed, 03 Apr 2002 12:07:37 -0500
From: "Roger L. Costello" <costello@mitre.org>
Organization: The MITRE Corporation
To: "Howard,Diane M." <dhoward@mitre.org>,"Case,David W."
<dwc@mitre.org>, "Jones,Patrick L." <plj@mitre.org>
----- Message from Jeni Tennison <jeni@jenitennison.com> on Wed, 3 Apr 2002
10:06:21 +0100 -----

To:      "Roger L. Costello" <costello@mitre.org>
cc:      Curt.Arnold@hyprotech.com, xmlschema-dev@w3.org, Simon.Cox@csiro.au
Subject: Re: Schema Design: Composition vs Subclassing

Hi Roger,

> It dawns on me that this is the old Object-Oriented issue of
> design-by-subclassing versus design-by-composition, now rearing its
> head in the design of XML Schemas. Let's consider these two design
> approaches as they apply to XML Schemas.
>
> Let's compare these two design approaches:
>  . design-by-subclassing (i.e., type hierarchies)
>       versus
>  . design-by-composition (i.e., bundling together element groups).

Just to add a few fairly random thoughts...

Design-by-composition is, of course, the approach that RELAX NG takes,
but it does so in a much more flexible way. I think that the most important
difference with RELAX NG, and something that really limits
design-by-composition in XML Schema, is that in XML Schema each group
can either contain a content model or attributes, but not both.
Imagine that you had a type:

<xs:complexType name="C1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
  <xs:attribute name="A1" .../>
</xs:complexType>

Using composition, you'd have to use two groups to replace the one
complex type:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
</xs:group>

<xs:attributeGroup name="AG1">
  <xs:attribute name="A1" .../>
</xs:attributeGroup>

If the content model and the attributes are conceptually linked,
splitting them up doesn't seem wise. It makes it easy for someone to
accidentally omit one or the other when both should always be present.
You don't have that problem with complex types.
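
A minimal sketch of that risk, assuming xs:string for the types elided above; the type name C1MissingAttributes is made up for illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:group name="G1">
    <xs:sequence>
      <xs:element name="E1" type="xs:string"/>
      <xs:element name="E2" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <xs:attributeGroup name="AG1">
    <xs:attribute name="A1" type="xs:string"/>
  </xs:attributeGroup>

  <!-- Intended use: every consumer references both groups -->
  <xs:complexType name="C1">
    <xs:group ref="G1"/>
    <xs:attributeGroup ref="AG1"/>
  </xs:complexType>

  <!-- Also a valid schema component: nothing forces AG1 to accompany G1 -->
  <xs:complexType name="C1MissingAttributes">
    <xs:group ref="G1"/>
  </xs:complexType>

</xs:schema>
```

A schema processor accepts both definitions, so the omission only shows up when an instance carrying A1 fails to validate.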

The second (related) issue that came to mind was that you're quite
able to use groups in a highly coupled manner. A fairer approximation
of the complex types would be:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" type="..."/>
    <xs:element name="E2" type="..."/>
    <xs:element name="E3" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G2">
  <xs:sequence>
    <xs:group ref="G1"/>
    <xs:element name="E4" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G3">
  <xs:sequence>
    <xs:group ref="G2"/>
    <xs:element name="E5" type="..."/>
    <xs:element name="E6" type="..."/>
  </xs:sequence>
</xs:group>

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G3"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

I think it's important to show that it's not the fact that you're
using groups that gives you the advantage, it's the *way* that you use
the groups. And it means that you have to consider at what level you
cluster elements together. Is the rule that you only create groups
with element particles?

I'll also note that whichever way you do it, you're going to end up
with having to understand three or four separate components -- the
only difference between composition and subclassing at this level is
whether they're arranged horizontally (composition) or vertically
(subclassing). And which of those you find easiest to understand and
work with probably comes down to personal taste.

Another thing is to consider the work of the schema designer in
creating the content models for the elements. I'd assume that if the
order of the elements should be E1, E2, E3, E4, ... for one element then it
should be E1, E2, E3, E4, ... for another element. With composition,
though, it's easy to accidentally change the ordering of the groups
that you use:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G2"/>
      <xs:group ref="G3"/>
      <xs:group ref="G1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

(Of course it's obvious here, but in real life you'd use real names
for the groups and there'd probably be more of them, so it would be
far easier to make this mistake.) So I think that using composition,
it's easier to create inconsistent content models, and inconsistency
is a headache for authors/developers who have to use the markup
language. On the other hand, some might view this as providing
flexibility compared to the rules about derivation by extension.

Finally, I think that design-by-subclassing has one killer advantage,
namely that applications can use information about the type hierarchy
in order to provide common processing for all elements of a particular
(high level) type. In your example, the root element has the types C3,
C2 and C1, and an application could use the fact that it knows that
the root element is of type C2 to know that it can process elements
E1, E2, E3 and E4 without having to know that it also contains
elements E5 and E6. The application can also know that other elements
of type C2, whether they are of type C3 or some other type, can be
processed in the same way. On the other hand, information about the
model groups that have been composed to create a content model is lost
(I believe).
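
A sketch of the subclassing version being discussed, reconstructed from the description above (the original example is not quoted in this message, so the split of elements across C1, C2 and C3 and the xs:string types are assumptions):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="C1">
    <xs:sequence>
      <xs:element name="E1" type="xs:string"/>
      <xs:element name="E2" type="xs:string"/>
      <xs:element name="E3" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- C2 isa C1: inherits E1-E3, appends E4 -->
  <xs:complexType name="C2">
    <xs:complexContent>
      <xs:extension base="C1">
        <xs:sequence>
          <xs:element name="E4" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- C3 isa C2: inherits E1-E4, appends E5 and E6 -->
  <xs:complexType name="C3">
    <xs:complexContent>
      <xs:extension base="C2">
        <xs:sequence>
          <xs:element name="E5" type="xs:string"/>
          <xs:element name="E6" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="root" type="C3"/>

</xs:schema>
```

The base attributes keep the chain C3, C2, C1 in the schema itself, whereas the group references G3, G2, G1 are expanded into a single content model for root.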

I think it's fair to say that we haven't seen this advantage in the
real world yet. There simply aren't parsers that make typing
information available. Possibly XPath/XSLT 2.0 and DOM 3.0 AS will
help here...

Hmm... those turned out to be generally pro design-by-subclassing --
perhaps that's just because I'm a natural devil's advocate ;) I
suppose if I'm thinking in object-oriented terms, I think of the type
hierarchy as being like the class hierarchy and groups as being like
interfaces.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Friday, 5 April 2002 06:47:17 UTC