Assertions in Schema 1.1 Part 1

I think the part of the Schema 1.1 draft that everyone on QT needs to read
is section 3.12 on Assertions. It would probably also be a good idea to
schedule a presentation by someone from the Schema WG who knows the
rationale for the decisions that were made, and can provide some of the
background - preferably someone who is resilient to hecklers.

I can't personally see the rationale for using a subset of XPath 2.0 here
rather than allowing the full language. It will be terribly confusing to
users, and it makes life difficult for implementors, most of whom will
already have access to a fully-functional XPath 2.0 engine. There aren't any
obvious performance benefits in most of the restrictions; in fact, I can't
see any benefits at all. 

I can see why the Schema WG might want to restrict the path expression to
access only the tree rooted at the node being validated, because
traditionally the validity of an element depends only on its content and not
on its context; however, that could more easily be achieved by defining the
path expression to operate on a deep copy of the element. (But personally,
I'm not sure whether it's useful for users to be forced to move the
integrity constraint to the root element of the relevant tree, rather than
defining it where it comes naturally.)

The chosen subset seems to eliminate many useful integrity constraints. To
take one arbitrary example that I came across recently, there is no way to
say that in a sequence of sibling X elements, the value of @Y is
monotonically increasing. (That is, one cannot write <xs:assert
test="not(preceding-sibling::X[1]/@Y gt @Y)"/>.) Users are going to be very
disappointed by these restrictions.
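
To make the intent of that constraint concrete, here is a minimal sketch of
the same check written outside XPath, in Python. It assumes @Y holds integers
and that the X elements share one parent; the wrapper element name "list" and
the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def y_never_decreases(parent):
    """Return True if no X child has a smaller @Y than the X sibling before it."""
    # Assumption: every X child carries an integer-valued Y attribute.
    values = [int(x.get("Y")) for x in parent.findall("X")]
    # Compare each value with its immediate predecessor.
    return all(prev <= cur for prev, cur in zip(values, values[1:]))


# Hypothetical sample documents.
good = ET.fromstring('<list><X Y="1"/><X Y="5"/><X Y="10"/></list>')
bad = ET.fromstring('<list><X Y="1"/><X Y="10"/><X Y="5"/></list>')
```

The point is only that the rule is a simple comparison between neighbouring
siblings, which is exactly the kind of navigation the subset rules out.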

Allowing implementors to provide a fuller subset of XPath doesn't solve the
problem for many users (such as groups writing schema standards for an
industry), who have to avoid reliance on optional features. (It's worth
observing here that XQuery made many of the axes optional, but implementors
have nearly all chosen to provide them, simply because users need them. I
also seem to recall that for a long time SQL resisted allowing any SQL
expression to be used in an integrity constraint; eventually they were
forced to relent.)

I can also see why the Schema WG might want to disallow use of functions
whose result is context-dependent, such as current-date() or doc().
Nevertheless, these functions provide validation capabilities that XML
Schema users are crying out for.

I can't see why the Schema WG would want to define its own lexical rules for
XPath parsing that differ from those in the XPath 2.0 spec.

The specification needs to make it clear whether the XPath expression is
applied to a data model constructed from the pre-validation infoset or from
the post-validation PSVI. In other words, are the nodes accessed by the path
expression typed or untyped?
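
The answer changes results in practice. In XPath 2.0 a value comparison such
as "gt" casts untyped operands to strings, so the same attribute values can
compare differently depending on whether the nodes carry type annotations. A
rough analogy in Python (not XPath itself; the values are illustrative):

```python
# Two attribute values as they appear lexically in the document.
left, right = "10", "9"

# Untyped view: the values are compared as strings, character by character.
untyped_result = left > right

# Typed view (assuming the schema declares the attribute as an integer).
typed_result = int(left) > int(right)

print(untyped_result, typed_result)  # prints "False True"
```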

There appears to be an attempt to make the path expression error-free by
saying that non-comparable values are treated as not equal. Modifying the
XPath semantics in this way seems the wrong thing to do. If this effect is
required, the best way to handle it is to say that any dynamic error that
occurs during the XPath evaluation causes the result of the entire
expression to be treated as false.
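
As a sketch of that rule, assuming a hypothetical evaluate callable that
stands in for running the XPath test (it is not part of any real API), the
host language would simply wrap the evaluation rather than alter the
comparison semantics:

```python
def assertion_holds(evaluate):
    """Evaluate an assertion test, treating any error as a failed assertion."""
    try:
        # Normal case: the test's result is reduced to a boolean.
        return bool(evaluate())
    except Exception:
        # Stand-in for an XPath dynamic error: the whole test counts as false.
        return False


# Illustrative calls: comparing a string with a number raises TypeError in
# Python, playing the role of a comparison between non-comparable values.
ok = assertion_holds(lambda: 1 < 2)
failed = assertion_holds(lambda: 2 < 1)
errored = assertion_holds(lambda: "a" < 1)
```

This leaves the expression language untouched and puts the error handling in
one place, at the boundary between the schema processor and the XPath engine.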

The XPath specification defines a static and dynamic context (see section
2.1) which define the interface between XPath and its host language. The
schema spec needs to state how each value in the static and dynamic context
is initialized.

The actual grammar proposed seems to have bugs; for example, in BooleanExpr
(and PredicateBoolean) I think there's a missing vertical bar. Overall, the
grammar appears incredibly ugly, for example having different rules for the
two operands of "eq", and different rules depending on whether the "eq" is
inside a predicate or not. I thought the battle for orthogonality in
language design had been won about 40 years ago; I was clearly mistaken.


Michael Kay
Saxonica Limited

Received on Tuesday, 26 September 2006 18:28:56 UTC