RE: Classroom critique of XML Schema 1.1

Hi Michael,

Thank you for your clear and thorough explanations. If you don't mind, I would like to dig a bit deeper and challenge the decisions of the WG.

Regarding this recommendation:

>> Recommendation: Lift the restriction that the XPath expression in <assert> 
>> elements can only "look down." Permit the XPath expression to look anywhere, 
>> including to other XML documents. 

You wrote:

> In XSD 1.0 and in XSD 1.1 apart from assertions, type validity has
> the property that it depends upon an element and a type, and does
> not depend at all upon the context of the element.  So it's always possible
> to validate an element against a type by extracting the element from 
> the document and handing it to a validator.

> This context-free nature of type validity was an important design
> goal for some members of the WG, though not for all.

Hold on! XML Schema 1.1 element declarations are not context-free. 

Here is the example I gave my class. What is the data type of this <Publisher> element?

    <element name="Publisher" type="string" />

Without considering context, you would answer "string." However, if you look up the tree to the <BarnesAndNoble> ancestor element, you see that it contains an <assert> element that constrains the string data type:

    <assert test="every $p in .//Publisher satisfies string-length($p) le 140" />

"The value of every <Publisher> element must be at most 140 characters long."

Thus the element declaration for Publisher is not context-free. Its type is very much dependent on its context.
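To make the point concrete, here is a rough Python simulation (standard library only; it is not an XSD 1.1 processor). The helper names `valid_as_string` and `barnes_and_noble_ok` are hypothetical. The second one mimics the ancestor's assertion, assuming it is written as `every $p in .//Publisher satisfies string-length($p) le 140`. The same 200-character <Publisher> passes when extracted on its own but fails once it sits inside <BarnesAndNoble>.

```python
import xml.etree.ElementTree as ET

BN_LIMIT = 140  # assumed limit, taken from the assertion on <BarnesAndNoble>

def valid_as_string(publisher: ET.Element) -> bool:
    # Context-free check against type="string": text only, no child
    # elements, and no length limit of its own.
    return len(publisher) == 0

def barnes_and_noble_ok(store: ET.Element) -> bool:
    # Mimics: every $p in .//Publisher satisfies string-length($p) le 140
    return all(len("".join(p.itertext())) <= BN_LIMIT
               for p in store.iter("Publisher"))

long_name = "X" * 200
standalone = ET.fromstring(f"<Publisher>{long_name}</Publisher>")
in_store = ET.fromstring(
    f"<BarnesAndNoble><Book><Publisher>{long_name}</Publisher></Book></BarnesAndNoble>"
)

print(valid_as_string(standalone))    # True
print(barnes_and_noble_ok(in_store))  # False
```

Nothing about the <Publisher> element itself changed; only its surroundings did.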

In fact, the phrase "action at a distance" was applied (first by you, I think, Michael!) to emphasize the non-context-free nature of XML Schema 1.1 element and attribute declarations. 

You also wrote:

> It is also the case that pretty much every example the WG has
> encountered where it felt natural to express a constraint by looking
> up could also be expressed in a downward-looking assertion on an
> ancestor element.

I remember a debate with Noah on the question "Should <assert> elements be positioned right where they are needed, or higher up the tree?" Noah argued convincingly for positioning <assert> elements right where they are needed, and my class agreed 100% with him. One excellent rationale, given by a student, was that "co-locating <assert> elements enables extractability and reusability of components." 

Example: above, I placed the <assert> element on the <BarnesAndNoble> element declaration, which is far up the ancestor chain from the Publisher element declaration. If the Publisher element declaration is reused by another schema, the constraints provided by that <assert> element are lost. 

The example illustrates an important point, so let's pursue it further. Suppose we lift the restriction that the <assert> element's XPath expression can only "look down." And suppose the value of the <Publisher> element must be at most 140 characters if it has a <BarnesAndNoble> ancestor, at most 70 characters if it has an <Amazon> ancestor, and otherwise any string is permitted. This is naturally expressed as follows:

<element name="Publisher">
    <complexType>
        <simpleContent>
            <extension base="string">
               <assert test=" if (ancestor::BarnesAndNoble) then 
                                 string-length(text()) le 140 
                              else if (ancestor::Amazon) then 
                                 string-length(text()) le 70 
                              else true()" />
            </extension>
        </simpleContent>
    </complexType>
</element>

This is a nice reusable, self-contained component. It specifies the constraints right where they are needed, not in a remote ancestor element.
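If the restriction were lifted, the proposed assertion would check roughly what the following Python simulation checks (standard library only; it does not evaluate the XSD itself). `parent_map`, `has_ancestor` and `publisher_ok` are hypothetical helpers: `has_ancestor` plays the role of the `ancestor::` axis, and `publisher_ok` encodes the if / else-if / else rule from the assertion. The sample document and its `<Stores>` root are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def parent_map(root: ET.Element) -> Dict[ET.Element, ET.Element]:
    # ElementTree keeps no parent pointers, so build a child -> parent map once.
    return {child: parent for parent in root.iter() for child in parent}

def has_ancestor(elem: ET.Element, tag: str,
                 parents: Dict[ET.Element, ET.Element]) -> bool:
    # Rough equivalent of the XPath test ancestor::tag
    node = parents.get(elem)
    while node is not None:
        if node.tag == tag:
            return True
        node = parents.get(node)
    return False

def publisher_ok(pub: ET.Element, parents: Dict[ET.Element, ET.Element]) -> bool:
    # Mirrors the proposed upward-looking assertion on <Publisher>.
    length = len("".join(pub.itertext()))
    if has_ancestor(pub, "BarnesAndNoble", parents):
        return length <= 140
    if has_ancestor(pub, "Amazon", parents):
        return length <= 70
    return True  # any other context: any string is allowed

doc = ET.fromstring(
    "<Stores>"
    "<BarnesAndNoble><Book><Publisher>" + "B" * 100 + "</Publisher></Book></BarnesAndNoble>"
    "<Amazon><Book><Publisher>" + "A" * 100 + "</Publisher></Book></Amazon>"
    "<Publisher>" + "P" * 500 + "</Publisher>"
    "</Stores>"
)
parents = parent_map(doc)
print([publisher_ok(p, parents) for p in doc.iter("Publisher")])  # [True, False, True]
```

The same 100-character value is accepted under <BarnesAndNoble> and rejected under <Amazon>, while the 500-character value outside both stores is unconstrained.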

Conversely, if the "look down" restriction is not lifted then we must redesign and convert BarnesAndNoble and Amazon into attribute values. That is, change from this design:

    <BarnesAndNoble>
        <Book>
            ...
            <Publisher>...</Publisher>
        </Book>
        ...
    </BarnesAndNoble>

to this design:

    <Store storename="BarnesAndNoble">
        <Book>
            ...
            <Publisher>...</Publisher>
        </Book>
        ...
    </Store>

Where the storename attribute is declared to be inheritable:

    <attribute name="storename" inheritable="true" />

Now the Publisher element may be constrained by using the inherited storename attribute:

<element name="Publisher">
    <complexType>
        <simpleContent>
            <extension base="string">
               <assert test=" if (@storename = 'BarnesAndNoble') then 
                                 string-length(text()) le 140 
                              else if (@storename = 'Amazon') then 
                                 string-length(text()) le 70 
                              else true()" />
            </extension>
        </simpleContent>
    </complexType>
</element>

This is a very poor design, for the following reasons:

1.	There may be multiple ancestor elements with storename attributes; only the closest one is visible.
2.	The XML document was redesigned to get around the limitations of XML Schema 1.1. Rather than sticking to the role of just validating XML designs, XML Schema 1.1 moves into the role of dictating XML designs.
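Point 1 can be seen in a small Python model (standard library only). It is a simplified sketch, not XSD 1.1 itself: `inherited_storename` is a hypothetical helper that treats inheritance as "take the value from the nearest ancestor that has the attribute," and the nested <Store> elements are made up for illustration. The outer Amazon value never reaches <Publisher>.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def inherited_storename(elem: ET.Element,
                        parents: Dict[ET.Element, ET.Element]) -> Optional[str]:
    # Simplified model of attribute inheritance: walk up the ancestors and
    # return the first (nearest) storename found; farther ones are shadowed.
    node = parents.get(elem)
    while node is not None:
        if "storename" in node.attrib:
            return node.attrib["storename"]
        node = parents.get(node)
    return None

# Hypothetical nesting: a BarnesAndNoble store inside an Amazon store.
doc = ET.fromstring(
    '<Store storename="Amazon">'
    '<Store storename="BarnesAndNoble"><Book><Publisher>P</Publisher></Book></Store>'
    '</Store>'
)
parents = {child: parent for parent in doc.iter() for child in parent}
publisher = doc.find(".//Publisher")
print(inherited_storename(publisher, parents))  # BarnesAndNoble
```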

Let's recap some key points:

A.	There is universal agreement that <assert> constraints should be positioned right next to the elements they apply to.
B.	The "look down" restriction on <assert> can be circumvented by making heavy use of inheritable attributes in ancestor elements.
 
Prediction: If the "look down" restriction on <assert> is not lifted then there will be the following unanticipated consequence: XML Schema designers will declare lots of attributes and make them all inheritable. Thus, if an item is rightfully expressed as an element, it will instead be expressed as an attribute. This will result in poor XML designs, cognitive dissonance, and reduced interoperability.

You also wrote:

> The really decisive point is that the type system and formal semantics 
> of XPath 2.0, XSLT 2.0, and XQuery 1.0 depend (I am told) on the
> property that for any element E of type T, if you put element E
> into some context, it will still have type T, regardless of what context
> you put it into.   But if type validity were context-dependent, putting
> E into an unsuitable context might easily render it invalid -- which
> means that it would no longer be of type T.

I don't understand. 

In XPath, I can establish a context node anywhere in the tree and navigate along any axis, including the ancestor axis. Furthermore, any "data type" information that XPath, XSLT 2.0, or XQuery gets comes from XML Schema, so it is XML Schema that is driving this data type train, not XPath, XSLT 2.0, or XQuery. 

/Roger

Received on Saturday, 19 March 2011 13:45:22 UTC