- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Wed, 3 Apr 2002 10:06:21 +0100
- To: "Roger L. Costello" <costello@mitre.org>
- CC: Curt.Arnold@hyprotech.com, xmlschema-dev@w3.org, Simon.Cox@csiro.au
Hi Roger,

> It dawns on me that this is the old Object-Oriented issue of
> design-by-subclassing versus design-by-composition, now rearing its
> head in the design of XML Schemas. Let's consider these two design
> approaches as they apply to XML Schemas.
>
> Let's compare these two design approaches:
> . design-by-subclassing (i.e., type hierarchies)
> versus
> . design-by-composition (i.e., bundling together element groups).

Just to add a few fairly random thoughts...

Design-by-composition is, of course, the approach that RELAX NG takes, but it does so in a much more flexible way. I think that the most important difference with RELAX NG, and something that really limits design-by-composition in XML Schema, is that in XML Schema each group can either contain a content model or attributes, but not both. Imagine that you had a type:

  <xs:complexType name="C1">
    <xs:sequence>
      <xs:element name="E1" .../>
      <xs:element name="E2" .../>
    </xs:sequence>
    <xs:attribute name="A1" .../>
  </xs:complexType>

Using composition, you'd have to use two groups to replace the one complex type:

  <xs:group name="G1">
    <xs:sequence>
      <xs:element name="E1" .../>
      <xs:element name="E2" .../>
    </xs:sequence>
  </xs:group>

  <xs:attributeGroup name="AG1">
    <xs:attribute name="A1" .../>
  </xs:attributeGroup>

If the content model and the attributes are conceptually linked, splitting them up doesn't seem wise. It makes it easy for someone to accidentally omit one or the other when both should always be present. You don't have that problem with complex types.

The second (related) issue that came to mind was that you're quite able to use groups in a highly coupled manner.
A fairer approximation of the complex types would be:

  <xs:group name="G1">
    <xs:sequence>
      <xs:element name="E1" type="..."/>
      <xs:element name="E2" type="..."/>
      <xs:element name="E3" type="..."/>
    </xs:sequence>
  </xs:group>

  <xs:group name="G2">
    <xs:sequence>
      <xs:group ref="G1"/>
      <xs:element name="E4" type="..."/>
    </xs:sequence>
  </xs:group>

  <xs:group name="G3">
    <xs:sequence>
      <xs:group ref="G2"/>
      <xs:element name="E5" type="..."/>
      <xs:element name="E6" type="..."/>
    </xs:sequence>
  </xs:group>

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="G3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

I think it's important to show that it's not the fact that you're using groups that gives you the advantage, it's the *way* that you use the groups. And it means that you have to consider at what level you cluster elements together. Is the rule that you only create groups with element particles?

I'll also note that whichever way you do it, you're going to end up having to understand three or four separate components -- the only difference between composition and subclassing at this level is whether they're arranged horizontally (composition) or vertically (subclassing). And which of those you find easiest to understand and work with probably comes down to personal taste.

Another thing to consider is the work of the schema designer in creating the content models for the elements. I'd assume that if the order of the elements should be E1, E2, E3, E4, ... for one element then it should be E1, E2, E3, E4, ... for another element.
With composition, though, it's easy to accidentally change the ordering of the groups that you use:

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="G2"/>
        <xs:group ref="G3"/>
        <xs:group ref="G1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

(Of course it's obvious here, but in real life you'd use real names for the groups and there'd probably be more of them, so it would be far easier to make this mistake.)

So I think that using composition, it's easier to create inconsistent content models, and inconsistency is a headache for authors/developers who have to use the markup language. On the other hand, some might view this as providing flexibility compared to the rules about derivation by extension.

Finally, I think that design-by-subclassing has one killer advantage, namely that applications can use information about the type hierarchy in order to provide common processing for all elements of a particular (high level) type. In your example, the root element has the types C3, C2 and C1, and an application could use the fact that it knows that the root element is of type C2 to know that it can process elements E1, E2, E3 and E4 without having to know that it also contains elements E5 and E6. The application can also know that other elements of type C2, whether they are of type C3 or some other type, can be processed in the same way. On the other hand, information about the model groups that have been composed to create a content model is lost (I believe).

I think it's fair to say that we haven't seen this advantage in the real world yet. There simply aren't parsers that make typing information available. Possibly XPath/XSLT 2.0 and DOM 3.0 AS will help here...

Hmm... those turned out to be generally pro design-by-subclassing -- perhaps that's just because I'm a natural devil's advocate ;) I suppose if I'm thinking in object-oriented terms, I think of the type hierarchy as being like the class hierarchy and groups as being like interfaces.
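
For concreteness, here is a minimal sketch of what the subclassing version might look like, assuming that the C1, C2 and C3 types hold the same elements as the G1, G2 and G3 groups above. The xs:string types are placeholders of mine (the element types are elided above), and I've left out any target namespace to keep it short:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: E1, E2, E3. xs:string is a placeholder type. -->
  <xs:complexType name="C1">
    <xs:sequence>
      <xs:element name="E1" type="xs:string"/>
      <xs:element name="E2" type="xs:string"/>
      <xs:element name="E3" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- C2 extends C1: E4 is appended after C1's content. -->
  <xs:complexType name="C2">
    <xs:complexContent>
      <xs:extension base="C1">
        <xs:sequence>
          <xs:element name="E4" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- C3 extends C2: E5 and E6 are appended after C2's content. -->
  <xs:complexType name="C3">
    <xs:complexContent>
      <xs:extension base="C2">
        <xs:sequence>
          <xs:element name="E5" type="xs:string"/>
          <xs:element name="E6" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- root is a C3, and therefore also a C2 and a C1. -->
  <xs:element name="root" type="C3"/>

</xs:schema>
```

Because derivation by extension always appends to the end of the base type's content model, the E1, E2, E3, E4 ordering can't be shuffled in the derived types the way the group references can.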

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Wednesday, 3 April 2002 04:06:27 UTC