- From: <paul.w.daisey@census.gov>
- Date: Thu, 4 Apr 2002 14:46:51 -0500 (EST)
- To: costello@mitre.org
- Cc: Simon.Cox@csiro.au, jeni@jenitennison.com, dwc@mitre.org, plj@mitre.org, dhoward@mitre.org, xmlschema-dev@w3.org, gml30.rwg@opengis.org
All:

Simon and Dave have invited me to join your discussion.

Roger:

In general terms, I think the nature of one's problem space should be a major consideration in deciding whether to use deep, rigid type inheritance hierarchies, or shallow, flexible type hierarchies defined mostly by composition. Some problem spaces, like both ISO 19107 Geometry / Topology definitions, and Census geography, have traditionally had both very rigid and inflexible definitions, and types that lend themselves to hierarchical definition.

I thought Jeni Tennison's comments on potential disadvantages of definition by composition were quite thorough. The "killer" advantage of design by subclassing was the one I was going to raise, that it explicitly retains type inheritance identification; E isa D isa C isa B isa A, in the extension / restriction base="" attribute. This is an advantage not just in XML, but also in using XML/Schemas as the quasi-MDA base for object definitions in languages like Java, for example with a product like Castor, whereby the type inheritance identification is passed to the generated language classes.

I'm not sure I entirely agree with the conclusion:

" information about the model groups that have been composed to create a content model is lost (I believe)".

I think that it is possible to write XSLT to determine the implicit type inheritance hierarchies in XML/Schema types defined by composition by comparative inventory of their contents. But in any case the type inheritance information is certainly less accessible than in the design by subclassing case.

In designing GML, at several points the analogy to object type libraries was raised as a design and usage model, although I don't remember if such a reference made it into the GML v2 document.

Your points about the disadvantages of incompatibilities in derived types caused by changes in base types are well taken by those of us who have been trying to retain backward compatibility as we move to GML v3.
I guess I'd argue that those difficulties are worth wrestling with so that the task of developing general tools for processing and transforming GML is facilitated by type inheritance information in the <Class> and <property> hierarchies in GML. Such tools can then, for instance, deal with a tgr:TigerLineString as a gml:LineString and ignore its added elements, making them useful for documents conforming to a variety of application-specific schemas that extend and restrict GML types.

Or as Jon Udell said in "Java, XML, and Web Services" in the March 25 InfoWorld, "The more XML messages say about themselves, the less their senders and receivers need to know about one another's infrastructures."

Although I frequently disagree with your conclusions as to what constitutes "best practice", I've learned a lot from what you've written, think you are performing a valuable service by examining these issues, and encourage you to keep it up.

Best regards,

Paul

Paul.W.Daisey@census.gov
U.S. Census Bureau        phone: (301) 457-4308
Geography Division        fax:   (301) 457-4710
www.census.gov/geo/www/

----- Forwarded by Paul W Daisey/GEO/HQ/BOC on 04/04/2002 01:55 PM -----

Simon.Cox@csiro.au wrote:
>
> Roger - I think I agree with where you are going here.
> An additional advantage is that design-by-composition gives a kind of weak
> multiple-inheritance method, which can be very useful.
>
> From time to time I have tried to gently push the GML crew in this
> direction.
> For example, I was responsible for all the <group> and <attributeGroup>s in
> GML - not many but they do lurk in 3 (?) places (associationAttributeGroup,
> locator, dynamicProperties).
> And if you can be bothered to look into them, the modified schema docs I
> sent you yesterday did, in fact, replace a bunch of types defined through
> derivation by restriction, with types defined "fresh", by composition, but
> following a "pattern".
> This was mainly in the "property" part of the dual hierarchy.
> The type derivation in the "Object" part is much less troublesome and
> contentious.
>
> One of the constraints in GML is a requirement - strong from some
> stakeholders, but not all - to follow UML models from the ISO 191XX series -
> in particular ISO 19107 which is a complex geometry model.
> But the market wants XML solutions.
> So we have tried to reproduce the complete object model in XML, with all the
> inheritance hierarchies realised as XML type-derivation hierarchies.
> This MDA tendency is seductive to people who are geographers first and
> analysts second, but with an attraction to analysis.
>
> There is some resistance to "flattening" the schemas.
> Paul Daisey is probably the person with the most coherent use-case for
> retaining them, though I'm not sure I fully understand his argument.
> But Paul is a smart, methodical, experienced, and ultimately practical guy.
>
> So I suggest that he needs to be engaged in this discussion.
>
> _____
> [This mail represents part of a discussion of work in progress
> and should not be used for any purpose without my permission.]
> _____
> Simon.Cox@csiro.au CSIRO Exploration & Mining
> 26 Dick Perry Avenue, Kensington WA 6151
> PO Box 1130, Bentley WA 6102 AUSTRALIA
> T: +61 (8) 6436 8639 F: +61 (8) 6436 8555 C: +61 (4) 0330 2672
> http://www.csiro.au/page.asp?type=resume&id=CoxSimon
>

Dave Case <dwc@mitre.org>
04/03/2002 02:11 PM
To: Paul Daisey <pdaisey@geo.census.gov>
cc:
Subject: [Fwd: [Fwd: Schema Design: Composition vs Subclassing]]

Yet more info. for you!

Dave

-------- Original Message --------
Subject: [Fwd: Schema Design: Composition vs Subclassing]
Date: Wed, 03 Apr 2002 12:07:37 -0500
From: "Roger L. Costello" <costello@mitre.org>
Organization: The MITRE Corporation
To: "Howard,Diane M." <dhoward@mitre.org>, "Case,David W." <dwc@mitre.org>, "Jones,Patrick L." <plj@mitre.org>

----- Message from Jeni Tennison <jeni@jenitennison.com> on Wed, 3 Apr 2002 10:06:21 +0100 -----
To: "Roger L. Costello" <costello@mitre.org>
cc: Curt.Arnold@hyprotech.com, xmlschema-dev@w3.org, Simon.Cox@csiro.au
Subject: Re: Schema Design: Composition vs Subclassing

Hi Roger,

> It dawns on me that this is the old Object-Oriented issue of
> design-by-subclassing versus design-by-composition, now rearing its
> head in the design of XML Schemas. Let's consider these two design
> approaches as they apply to XML Schemas.
>
> Let's compare these two design approaches:
>    . design-by-subclassing (i.e., type hierarchies)
>         versus
>    . design-by-composition (i.e., bundling together element groups).

Just to add a few fairly random thoughts...

Design-by-composition is, of course, the approach that RELAX NG takes, but does so in a much more flexible way. I think that the most important difference with RELAX NG, and something that really limits design-by-composition in XML Schema, is that in XML Schema each group can either contain a content model or attributes, but not both. Imagine that you had a type:

<xs:complexType name="C1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
  <xs:attribute name="A1" .../>
</xs:complexType>

Using composition, you'd have to use two groups to replace the one complex type:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" .../>
    <xs:element name="E2" .../>
  </xs:sequence>
</xs:group>

<xs:attributeGroup name="AG1">
  <xs:attribute name="A1" .../>
</xs:attributeGroup>

If the content model and the attributes are conceptually linked, splitting them up doesn't seem wise. It makes it easy for someone to accidentally omit one or the other when both should always be present. You don't have that problem with complex types.

The second (related) issue that came to mind was that you're quite able to use groups in a highly coupled manner.
A fairer approximation of the complex types would be:

<xs:group name="G1">
  <xs:sequence>
    <xs:element name="E1" type="..."/>
    <xs:element name="E2" type="..."/>
    <xs:element name="E3" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G2">
  <xs:sequence>
    <xs:group ref="G1"/>
    <xs:element name="E4" type="..."/>
  </xs:sequence>
</xs:group>

<xs:group name="G3">
  <xs:sequence>
    <xs:group ref="G2"/>
    <xs:element name="E5" type="..."/>
    <xs:element name="E6" type="..."/>
  </xs:sequence>
</xs:group>

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G3"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

I think it's important to show that it's not the fact that you're using groups that gives you the advantage, it's the *way* that you use the groups. And it means that you have to consider at what level you cluster elements together. Is the rule that you only create groups with element particles?

I'll also note that whichever way you do it, you're going to end up with having to understand three or four separate components -- the only difference between composition and subclassing at this level is whether they're arranged horizontally (composition) or vertically (subclassing). And which of those you find easiest to understand and work with probably comes down to personal taste.

Another thing is to consider the work of the schema designer in creating the content models for the elements. I'd assume that if the order of the elements should be E1, E2, E3, E4, ... for one element then it should be E1, E2, E3, E4, ... for another element.
With composition, though, it's easy to accidentally change the ordering of the groups that you use:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="G2"/>
      <xs:group ref="G3"/>
      <xs:group ref="G1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

(Of course it's obvious here, but in real life you'd use real names for the groups and there'd probably be more of them, so it would be far easier to make this mistake.)

So I think that using composition, it's easier to create inconsistent content models, and inconsistency is a headache for authors/developers who have to use the markup language. On the other hand, some might view this as providing flexibility compared to the rules about derivation by extension.

Finally, I think that design-by-subclassing has one killer advantage, namely that applications can use information about the type hierarchy in order to provide common processing for all elements of a particular (high level) type. In your example, the root element has the types C3, C2 and C1, and an application could use the fact that it knows that the root element is of type C2 to know that it can process elements E1, E2, E3 and E4 without having to know that it also contains elements E5 and E6. The application can also know that other elements of type C2, whether they are of type C3 or some other type, can be processed in the same way. On the other hand, information about the model groups that have been composed to create a content model is lost (I believe).

I think it's fair to say that we haven't seen this advantage in the real world yet. There simply aren't parsers that make typing information available. Possibly XPath/XSLT 2.0 and DOM 3.0 AS will help here...

Hmm... those turned out to be generally pro design-by-subclassing -- perhaps that's just because I'm a natural devil's advocate ;) I suppose if I'm thinking in object-oriented terms, I think of the type hierarchy as being like the class hierarchy and groups as being like interfaces.
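
For readers without Roger's original message to hand, here is a rough sketch of what the subclassing version referred to above might look like. The type names C1, C2 and C3 and the element names E1 to E6 are taken from this thread; the exact split of elements across the types is inferred from the groups G1 to G3, and xs:string is only an assumed placeholder for the element types that are elided here. It is an illustration, not Roger's actual schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: xs:string is an assumed placeholder type,
     and the layout of C1/C2/C3 is inferred from groups G1/G2/G3. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: E1, E2, E3 -->
  <xs:complexType name="C1">
    <xs:sequence>
      <xs:element name="E1" type="xs:string"/>
      <xs:element name="E2" type="xs:string"/>
      <xs:element name="E3" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- C2 isa C1: the base attribute records the relationship explicitly -->
  <xs:complexType name="C2">
    <xs:complexContent>
      <xs:extension base="C1">
        <xs:sequence>
          <xs:element name="E4" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- C3 isa C2 isa C1: adds E5 and E6 after the inherited content -->
  <xs:complexType name="C3">
    <xs:complexContent>
      <xs:extension base="C2">
        <xs:sequence>
          <xs:element name="E5" type="xs:string"/>
          <xs:element name="E6" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- root is declared with the most derived type -->
  <xs:element name="root" type="C3"/>

</xs:schema>
```

Because derivation by extension always appends to the base content model, the E1, E2, E3, E4 ordering cannot be shuffled by accident in this form, and a tool that only knows about C2 can follow the base="" chain from C3 to find out which part of root it understands.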
Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Friday, 5 April 2002 06:47:17 UTC