Re: Assertions in Schema 1.1 Part 1 from Sandy Gao on 2006-09-27 (www-xml-schema-comments@w3.org from July to September 2006)

From: Sandy Gao <sandygao@ca.ibm.com>
Date: Tue, 26 Sep 2006 22:48:23 -0400
To: "Michael Kay" <mike@saxonica.com>
Cc: w3c-xsl-query@w3.org, www-xml-schema-comments@w3.org
Message-ID: <OFA9079975.8852E157-ON852571F6.000ACA33-852571F6.000F6ACF@ca.ibm.com>

Michael,

Thanks for the detailed comments. Many of them make good sense to me.

There was a short joint session between the Schema WG and Query gurus 
during the recent F2F meeting in Redmond [member-only 1]. We quickly went 
through this proposal and spent some time talking about the XPath subset. 
No consensus was reached (there may never be one). What I do remember 
clearly is that at the end QT members were asked "do you want to see the 
restriction on the subset relaxed?" Of those who answered, roughly half 
said "yes" and half "no". (This was a little surprising to me: I had 
thought that most QT members, being the inventors of XPath 2.0, would 
react to the subset the same way you do.)

Whether and how to enlarge the subset is still under discussion in the 
schema WG. See [2].

As I see it, there are two reasons for defining a subset.

The first, as you observed, is that schema prefers downward-looking paths, 
both to keep type validation independent of context and to support 
streaming processors. I believe this is a hard requirement for most, if 
not all, schema WG members.

The other reason is to avoid operations that are not defined or available 
in schema: for example, arithmetic (schema only defines value spaces and 
comparisons) and type promotion (schema makes it very clear that float 1.0 
and decimal 1.0 are entirely unrelated).

Whether either of these makes sense is a judgement call. I don't think 
there is a single *right* answer.

Now some specific points you made...

> It will be terribly confusing to users,

I agree. It's always a trade-off.

> and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine.

I'm not sure I agree with this one. First, I think it will be a long time 
before "everyone has access to an XPath 2.0 engine". Second, even when 
things like JAXP provide such a generally available XPath engine, schema 
processors may still choose *not* to use it, for various reasons, 
including performance. (Does Saxon use the schema support available in 
the JDK? :p) And lastly, there are users who just want a schema processor 
that works the schema way (and want float 1.0 to be different from 
decimal 1.0).

> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)

What counts as "natural" is often subjective. That said, I do agree that 
the subset Schema is currently proposing forces users to put constraints 
on the parent, which isn't always natural to all users.

> The chosen subset seems to eliminate many useful integrity constraints.

Different people may have different takes on this, but my personal feeling 
is that schema assertions are not intended to replace, or even compete 
with, existing languages that support co-constraints (e.g. Schematron). 
It's no surprise if they can't handle every integrity constraint one may 
want to enforce.

On the particular example you mentioned, I'm wondering whether it can be 
satisfied by slightly relaxing the subset (but still not allowing 
"preceding-sibling"). For example, using a QuantifiedExpr with some 
numeric predicates ...
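For concreteness, here is a minimal Python sketch (an illustration only, not part of any proposal) of the condition in Michael's example, checked from the parent element so that only downward (child) navigation is needed. Note that the quoted assertion allows equal neighbouring values, so the condition is really "non-decreasing". The function name `y_values_non_decreasing` is hypothetical, and the sketch assumes every X child carries a numeric Y attribute.

```python
import xml.etree.ElementTree as ET


def y_values_non_decreasing(parent: ET.Element) -> bool:
    """Check that the @Y values of the X children of `parent` never decrease.

    Hypothetical helper: it tests, from the parent, the condition that the
    per-element assertion not(preceding-sibling::X/@Y gt @Y) aims at, i.e.
    no X has a preceding sibling X with a larger Y. Only child navigation
    is used. Assumes every X carries a numeric Y attribute.
    """
    ys = [float(x.get("Y")) for x in parent.findall("X")]
    # Checking each adjacent pair is enough: if no step decreases,
    # no earlier value can exceed any later one.
    return all(a <= b for a, b in zip(ys, ys[1:]))


if __name__ == "__main__":
    ok = ET.fromstring('<list><X Y="1"/><X Y="2"/><X Y="2"/><X Y="5"/></list>')
    bad = ET.fromstring('<list><X Y="1"/><X Y="3"/><X Y="2"/></list>')
    print(y_values_non_decreasing(ok))   # True
    print(y_values_non_decreasing(bad))  # False
```

Written this way, the constraint lives on the parent rather than on each X, which is the placement the current subset forces.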

> Allowing implementors to provide a fuller subset of XPath doesn't solve the
> problem for many users ...

Completely agree. This is something the schema WG has to consider before 
making a final decision on how much freedom processors have in choosing 
which subset to support.

> I can't see why the Schema WG would want to define its own lexical rules for
> XPath parsing that differ from those in the XPath 2.0 spec.

Not sure what you are referring to. The BNF?

> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or from
> the post-validation PSVI. In other words, are the nodes accessed by the path
> expression typed or untyped?

Typed. It should be as if schema validation has finished, the PSVI has 
been produced, the XDM instance has been constructed, and only then are 
the XPaths evaluated. I agree that this needs to be clarified.

> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal.

My understanding of XPath 2.0 is that if I try to compare float 1.0 and 
decimal 1.0, the result is "equal". But the goal (whether it's the right 
one can be debated) is to make it follow the schema comparison rules and 
mark them as "not equal".
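The difference can be made concrete with a toy Python model (an illustration only, not taken from either specification). Values carry a hypothetical primitive-type tag; `xpath_style_eq` mimics XPath 2.0's promotion of decimal to float, `schema_style_eq` follows the schema rule that distinct primitive value spaces never compare equal, and `errors_as_false` sketches Michael's alternative of treating any dynamic error as a false result. All names are made up for this sketch, and only float, decimal and same-type comparisons are modelled.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TypedValue:
    """Toy typed value: a primitive type name plus a Python value (hypothetical)."""
    primitive: str
    value: object


NUMERIC = {"float", "decimal"}


def xpath_style_eq(a: TypedValue, b: TypedValue) -> bool:
    """XPath 2.0-style 'eq': decimal is promoted to float before comparing."""
    if a.primitive in NUMERIC and b.primitive in NUMERIC:
        return float(a.value) == float(b.value)
    if a.primitive != b.primitive:
        # XPath raises a type error (XPTY0004) for incomparable operands.
        raise TypeError("values are not comparable")
    return a.value == b.value


def schema_style_eq(a: TypedValue, b: TypedValue) -> bool:
    """Schema-style equality: values from different primitive value spaces never match."""
    return a.primitive == b.primitive and a.value == b.value


def errors_as_false(test, *args) -> bool:
    """Alternative from the quoted message: any dynamic error makes the assertion false."""
    try:
        return bool(test(*args))
    except (TypeError, ValueError):
        return False


if __name__ == "__main__":
    f = TypedValue("float", 1.0)
    d = TypedValue("decimal", Decimal("1.0"))
    s = TypedValue("string", "1.0")
    print(xpath_style_eq(f, d))                   # True: decimal promoted to float
    print(schema_style_eq(f, d))                  # False: different value spaces
    print(errors_as_false(xpath_style_eq, f, s))  # False: type error treated as false
```

The first approach changes the comparison semantics themselves; the second keeps XPath semantics and only changes how a failed evaluation is interpreted.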

> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic context
> is initialized.

My limited XPath knowledge doesn't help much here to understand the 
comment, but I trust you are right.

> The actual grammar proposed seems to have bugs, for example in BooleanExpr

Yup. Noticed it too and fixed it on my/our internal copy.

> Overall the grammar appears incredibly ugly, for example having different
> rules for the two operands of "eq",

The grammar was cooked up quickly to match the subset. A lot of the 
ugliness comes from the (debatable) need to *avoid* type 
conversion/promotion.

> and different rules depending on whether the "eq" is
> inside a predicate or not.

This, again, was introduced to match the subset. As mentioned in [2], it 
is possible (nothing has been agreed yet) that this difference will be 
removed.


Again, thanks for the thoughtful comments. I'm sure the schema WG will be 
spending time looking at assertions before finalizing the design.

[1] http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Aug/0021
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=3673

Thanks,
Sandy Gao
XML Parser Development, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com


www-xml-schema-comments-request@w3.org wrote on 2006-09-26 02:27:38 PM:

> 
> 
> I think the part of the Schema 1.1 draft that everyone on QT needs to read
> is section 3.12 on Assertions. It would probably also be a good idea to
> schedule a presentation by someone from the Schema WG who knows the
> rationale for the decisions that were made, and can provide some of the
> background - preferably someone who is resilient to hecklers.
> 
> I can't personally see the rationale for using a subset of XPath 2.0 here
> rather than allowing the full language. It will be terribly confusing to
> users, and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine. There aren't any
> obvious performance benefits in most of the restrictions; in fact, I can't
> see any benefits at all.
> 
> I can see why the Schema WG might want to restrict the path expression to
> access only the tree rooted at the node being validated, because
> traditionally the validity of an element depends only on its content and not
> on its context; however, that could more easily be achieved by defining the
> path expression to operate on a deep copy of the element. (But personally,
> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)
> 
> The chosen subset seems to eliminate many useful integrity constraints. To
> take one arbitrary example that I came across recently, there is no way to
> say that in a sequence of sibling X elements, the value of @Y is
> monotonically increasing. (That is, <xs:assert
> test="not(preceding-sibling::X/@Y gt @Y)"/>). Users are going to be very
> disappointed by these restrictions.
> 
> Allowing implementors to provide a fuller subset of XPath doesn't solve the
> problem for many users (such as groups writing schema standards for an
> industry), who have to avoid reliance on optional features. (It's worth
> observing here that XQuery made many of the axes optional, but implementors
> have nearly all chosen to provide them, simply because users need them. I
> also seem to recall that for a long time SQL resisted allowing any SQL
> expression to be used in an integrity constraint; eventually they were
> forced to relent.)
> 
> I can also see why the Schema WG might want to disallow use of functions
> whose result is context-dependent, such as current-date() or doc().
> Nevertheless, these functions provide validation capabilities that XML
> Schema users are crying out for.
> 
> I can't see why the Schema WG would want to define its own lexical rules for
> XPath parsing that differ from those in the XPath 2.0 spec.
> 
> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or from
> the post-validation PSVI. In other words, are the nodes accessed by the path
> expression typed or untyped?
> 
> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal. Modifying the
> XPath semantics in this way seems the wrong thing to do. If this effect is
> required, the best way to handle it is to say that any dynamic error that
> occurs during the XPath evaluation causes the result of the entire
> expression to be treated as false.
> 
> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic context
> is initialized.
> 
> The actual grammar proposed seems to have bugs, for example in BooleanExpr
> (and PredicateBoolean) I think there's a missing vertical bar. Overall the
> grammar appears incredibly ugly, for example having different rules for the
> two operands of "eq", and different rules depending on whether the "eq" is
> inside a predicate or not. I thought the battle for orthogonality in
> language design had been won about 40 years ago; I was clearly mistaken.
> 
> 
> Michael Kay
> Saxonica Limited
Received on Wednesday, 27 September 2006 02:48:42 UTC