RE: Assertions in Schema 1.1 Part 1 from Michael Kay on 2006-09-27 (www-xml-schema-comments@w3.org from July to September 2006)

From: Michael Kay <mike@saxonica.com>
Date: Wed, 27 Sep 2006 18:22:42 +0100
To: "'Sandy Gao'" <sandygao@ca.ibm.com>
Cc: <w3c-xsl-query@w3.org>, <www-xml-schema-comments@w3.org>
Message-ID: <013d01c6e259$8a927aa0$6601a8c0@turtle>
Thanks for the response.
 
Obviously my comments were my own and were not intended to reflect the views
of any other QT members.
 
I think that subsetting XPath for use in assertions is a bad idea for four
reasons:
 
(a) it's confusing to users
 
(b) it removes functionality that users badly need
 
(c) it complicates the specs
 
(d) it reduces the reusability of implementations
 
I accept that there might (perhaps) be a case for restricting path
expressions to work within the tree rooted at the element being validated.
That can be achieved by defining the path expression to operate on a copy of
the subtree; it doesn't require any subsetting of the language.
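
A minimal sketch of the "copy of the subtree" idea, using Python's standard
xml.dom.minidom rather than any real schema processor. The sample document
and the helper name reachable_context are made up for illustration. The point
is only that a detached deep copy has no parent or siblings to navigate to,
so upward or sideways paths find nothing without the language being subset.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, not taken from any spec.
doc = parseString(
    "<order><item price='3'><note/></item><item price='5'/></order>"
)
item = doc.getElementsByTagName("item")[0]

# cloneNode(True) returns a deep copy that is not attached to the tree.
isolated = item.cloneNode(True)


def reachable_context(node):
    """Return (parent tag, next-sibling tag), or None where unreachable."""
    parent = node.parentNode.tagName if node.parentNode is not None else None
    sibling = node.nextSibling.tagName if node.nextSibling is not None else None
    return parent, sibling


print(reachable_context(item))      # context visible in the original tree
print(reachable_context(isolated))  # nothing outside the copied subtree
```

The copy still carries the element's own attributes and descendants, so
downward-looking expressions behave as before.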
 
I also accept that there is some scope for user confusion because the XPath
operators, such as equality, have different semantics from XML Schema
equivalents. But this problem already exists, because the same users are
already using both languages. It would be far worse if you could say @price>2
in "real" XPath, but couldn't say it in the XML Schema dialect of XPath, when
@price is a double. I would strongly suggest that the best way to solve this
is for XML Schema to move towards using the XPath operator definitions, just
as QT has adopted the XMLSchema-defined data types. Equality is being
changed in XML Schema anyway, so there's a good opportunity to align it.
 
I could just about live with subsetting XPath syntax, though I think it's a
bad idea, and I find this particular subset very unattractive. But an XPath
dialect with modified semantics is completely unacceptable, in my view.
 
PS: I'm sorry I missed the Redmond meeting. Unlike every other European
implementor of these specs, I'm still trying to participate in the process,
but without a full agenda for the meetings it's very hard to justify the
travel. I think W3C management needs to review the modus operandi. We need
to be more inclusive, and we need to cut our carbon emissions.
 
Michael Kay
http://www.saxonica.com


  _____  

From: Sandy Gao [mailto:sandygao@ca.ibm.com] 
Sent: 27 September 2006 03:48
To: Michael Kay
Cc: w3c-xsl-query@w3.org; www-xml-schema-comments@w3.org
Subject: Re: Assertions in Schema 1.1 Part 1



Michael, 

Thanks for the detailed comments. Many of them make good sense to me. 

There was a short joint session between the Schema WG and Query gurus during
the past F2F meeting in Redmond [member-only 1]. We quickly went through
this proposal and spent some time talking about the XPath subset. No
consensus was reached (there may never be one). What I do remember clearly
was that at the end QT members were asked the question "do you want to see
the restriction on the subset being relaxed?" For those who answered,
roughly half said "yes" and half "no". (This was a little surprising to me,
as I had thought that, being the inventors of XPath 2.0, most QT members would
have the same reaction to the subset as you do.) 

The subset, and whether/how to enlarge it, is still under discussion in the
schema WG. See [2]. 

The way I see it, there are 2 reasons for defining a subset. 

The first one is as you observed that schema likes downward-looking paths,
both to make type validation not context-dependent and to support streaming
processors. This I believe is a hard requirement for most/all schema WG
members. 

The other reason is to avoid using operations not defined/available in
schema. For example, not to support arithmetic (because schema only defines
value spaces and comparisons) and type promotion (schema makes it very clear
that float 1.0 and decimal 1.0 are not connected at all). 

Whether either of these makes sense is a judgement call. I don't think there
is a single *right* answer. 

Now some specific points you made... 

> It will be terribly confusing to users, 

I agree. It's always a trade-off. 

> and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine.

Not sure I agree with this one. First, I think there is a long way between
now and the day when "everyone has access to an XPath 2.0 engine". Second,
even when things like JAXP provide such a generally-available XPath engine,
schema processors may still choose *not* to use it, for various reasons,
including performance. (Does Saxon use the schema support available in JDK?
:p) And lastly, there are users who just want a schema processor that works
in the schema way (and want float 1.0 to be different from decimal 1.0). 

> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)

The question of "what's natural" is often subjective. Having said that, I do
agree that the subset Schema is currently presenting does force users to put
constraints on the parent, which isn't always natural to all users. 

> The chosen subset seems to eliminate many useful integrity constraints. 

Different people may have different takes on this, but my personal feeling
is that schema "assertions" is not intended to replace or even compete with
current usage of other languages that support co-constraints (e.g.
Schematron). It's not a surprise if it can't handle all possible integrity
constraints one may want to enforce. 

On the particular example you mentioned, I'm wondering whether it can be
satisfied by slightly relaxing the subset (but still not allowing
"preceding-sibling"). For example, using a QuantifiedExpr with some numeric
predicates ... 
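
A rough sketch of what such a parent-level check amounts to, written in
Python with the standard xml.etree.ElementTree module rather than in the
proposed XPath subset. The element name X and attribute Y come from the
example under discussion; the function name and the sample documents are
assumptions for illustration. It mirrors the quoted assertion by rejecting
any X whose @Y is smaller than that of the X before it.

```python
import xml.etree.ElementTree as ET


def y_values_non_decreasing(parent):
    """Check, from the parent, that @Y never decreases across child X elements."""
    ys = [float(x.get("Y")) for x in parent.findall("X")]
    # Compare each X only with its immediate predecessor, by position.
    return all(prev <= cur for prev, cur in zip(ys, ys[1:]))


# Hypothetical sample data.
ok = ET.fromstring("<list><X Y='1'/><X Y='2'/><X Y='2'/><X Y='5'/></list>")
bad = ET.fromstring("<list><X Y='1'/><X Y='3'/><X Y='2'/></list>")

print(y_values_non_decreasing(ok))
print(y_values_non_decreasing(bad))
```

Checking adjacent pairs by position is enough here, because a sequence in
which no element is smaller than its immediate predecessor cannot have a
larger value anywhere earlier.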

> Allowing implementors to provide a fuller subset of XPath doesn't solve
> the problem for many users ...

Completely agree. This is something the schema WG has to consider before
making a final decision on how much freedom processors have in choosing
which subset to support. 

> I can't see why the Schema WG would want to define its own lexical rules
> for XPath parsing that differ from those in the XPath 2.0 spec.

Not sure what you are referring to. The BNF? 

> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or
> from the post-validation PSVI. In other words, are the nodes accessed by
> the path expression typed or untyped?

Typed. It should be as if the schema validation is finished, PSVI is
produced, XDM is constructed, then XPaths are evaluated. Agree that this
needs to be clarified. 
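
A small illustration of why the typed/untyped question changes results. It
assumes that untyped values end up being compared as strings (which, as far
as I understand it, is what XPath 2.0 value comparisons do with
xs:untypedAtomic operands), while typed values are compared in their declared
type. The function names are hypothetical and int stands in for whatever
type the schema assigns.

```python
def gt_untyped(a: str, b: str) -> bool:
    """Compare the raw lexical forms, as if no type information existed."""
    return a > b


def gt_typed(a: str, b: str, cast=int) -> bool:
    """Compare after converting to the (assumed) schema-assigned type."""
    return cast(a) > cast(b)


# The same pair of attribute values gives opposite answers.
print(gt_untyped("10", "9"))
print(gt_typed("10", "9"))
```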

> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal. 

My understanding of XPath 2.0 is that if I try to compare float 1.0 and
decimal 1.0, the result is "equal". But the goal (whether it's the right one
can be debated) is to make it follow the schema comparison rules and mark
them as "not equal". 
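
The two behaviours can be modelled roughly as follows. This is only a sketch
in Python, not either spec's definition: Python's float stands in for
xs:float (it is really a double), decimal.Decimal stands in for xs:decimal,
and both function names are invented for the example.

```python
from decimal import Decimal


def promoting_equal(a, b):
    """XPath-style model: promote both operands to float, then compare."""
    return float(a) == float(b)


def disjoint_equal(a, b):
    """Schema-style model: values of different primitive types are never equal."""
    return type(a) is type(b) and a == b


print(promoting_equal(1.0, Decimal("1.0")))              # promotion makes them equal
print(disjoint_equal(1.0, Decimal("1.0")))               # different types: not equal
print(disjoint_equal(Decimal("1.0"), Decimal("1.00")))   # same type, same value
```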

> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic
> context is initialized.

My limited XPath knowledge doesn't help much here to understand the comment,
but I trust you are right. 

> The actual grammar proposed seems to have bugs, for example in BooleanExpr


Yup. Noticed it too and fixed it on my/our internal copy. 

> Overall the
> grammar appears incredibly ugly, for example having different rules for
> the two operands of "eq", 

The grammar was cooked up quickly to match the subset. A lot of the ugliness
comes from the (debatable) need to *avoid* type conversion/promotion. 

> and different rules depending on whether the "eq" is
> inside a predicate or not. 

This again was introduced to match the subset. As mentioned in [2], there is
a possibility (no agreement yet) that this difference will be removed. 


Again, thanks for the thoughtful comments. I'm sure the schema WG will be
spending time looking at assertions before finalizing the design. 

[1] http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Aug/0021
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=3673 

Thanks,
Sandy Gao
XML Parser Development, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com


www-xml-schema-comments-request@w3.org wrote on 2006-09-26 02:27:38 PM:

> 
> 
> I think the part of the Schema 1.1 draft that everyone on QT needs to read
> is section 3.12 on Assertions. It would probably also be a good idea to
> schedule a presentation by someone from the Schema WG who knows the
> rationale for the decisions that were made, and can provide some of the
> background - preferably someone who is resilient to hecklers.
> 
> I can't personally see the rationale for using a subset of XPath 2.0 here
> rather than allowing the full language. It will be terribly confusing to
> users, and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine. There aren't
> any obvious performance benefits in most of the restrictions; in fact, I
> can't see any benefits at all. 
> 
> I can see why the Schema WG might want to restrict the path expression to
> access only the tree rooted at the node being validated, because
> traditionally the validity of an element depends only on its content and
> not on its context; however, that could more easily be achieved by
> defining the path expression to operate on a deep copy of the element.
> (But personally,
> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)
> 
> The chosen subset seems to eliminate many useful integrity constraints. To
> take one arbitrary example that I came across recently, there is no way to
> say that in a sequence of sibling X elements, the value of @Y is
> monotonically increasing. (That is, <xs:assert
> test="not(preceding-sibling::X/@Y gt @Y)"/>). Users are going to be very
> disappointed by these restrictions. 
> 
> Allowing implementors to provide a fuller subset of XPath doesn't solve
> the problem for many users (such as groups writing schema standards for an
> industry), who have to avoid reliance on optional features. (It's worth
> observing here that XQuery made many of the axes optional, but
> implementors have nearly all chosen to provide them, simply because users
> need them. I
> also seem to recall that for a long time SQL resisted allowing any SQL
> expression to be used in an integrity constraint; eventually they were
> forced to relent.)
> 
> I can also see why the Schema WG might want to disallow use of functions
> whose result is context-dependent, such as current-date() or doc().
> Nevertheless, these functions provide validation capabilities that XML
> Schema users are crying out for.
> 
> I can't see why the Schema WG would want to define its own lexical rules
> for XPath parsing that differ from those in the XPath 2.0 spec.
> 
> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or
> from the post-validation PSVI. In other words, are the nodes accessed by
> the path expression typed or untyped?
> 
> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal. Modifying the
> XPath semantics in this way seems the wrong thing to do. If this effect is
> required, the best way to handle it is to say that any dynamic error that
> occurs during the XPath evaluation causes the result of the entire
> expression to be treated as false.
> 
> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic
> context is initialized.
> 
> The actual grammar proposed seems to have bugs, for example in BooleanExpr
> (and PredicateBoolean) I think there's a missing vertical bar. Overall
> the grammar appears incredibly ugly, for example having different rules
> for the two operands of "eq", and different rules depending on whether the
> "eq" is inside a predicate or not. I thought the battle for orthogonality
> in language design had been won about 40 years ago; I was clearly
> mistaken.
> 
> 
> Michael Kay
> Saxonica Limited
Received on Wednesday, 27 September 2006 17:23:01 UTC