- From: Michael Kay <mike@saxonica.com>
- Date: Wed, 27 Sep 2006 18:22:42 +0100
- To: "'Sandy Gao'" <sandygao@ca.ibm.com>
- Cc: <w3c-xsl-query@w3.org>, <www-xml-schema-comments@w3.org>
- Message-ID: <013d01c6e259$8a927aa0$6601a8c0@turtle>
Thanks for the response. Obviously my comments were my own and were not intended to reflect the views of any other QT members.

I think that subsetting XPath for use in assertions is a bad idea for four reasons:

(a) it's confusing to users
(b) it removes functionality that users badly need
(c) it complicates the specs
(d) it reduces the reusability of implementations

I accept that there might (perhaps) be a case for restricting path expressions to work within the tree rooted at the element being validated. That can be achieved by defining the path expression to operate on a copy of the subtree; it doesn't require any subsetting of the language.

I also accept that there is some scope for user confusion because the XPath operators, such as equality, have different semantics from their XML Schema equivalents. But this problem already exists, because the same users are already using both languages. It would be far worse if you could say @price>2 in "real" XPath but couldn't say it in the XML Schema dialect of XPath when @price is a double. I would strongly suggest that the best way to solve this is for XML Schema to move towards using the XPath operator definitions, just as QT has adopted the XML Schema-defined data types. Equality is being changed in XML Schema anyway, so there's a good opportunity to align it.

I could just about live with subsetting XPath syntax, though I think it's a bad idea, and I find this particular subset very unattractive. But an XPath dialect with modified semantics is completely unacceptable, in my view.

PS: I'm sorry I missed the Redmond meeting. Unlike every other European implementor of these specs, I'm still trying to participate in the process, but without a full agenda for the meetings it's very hard to justify the travel. I think W3C management needs to review the modus operandi. We need to be more inclusive, and we need to cut our carbon emissions.
Michael Kay
http://www.saxonica.com

_____

From: Sandy Gao [mailto:sandygao@ca.ibm.com]
Sent: 27 September 2006 03:48
To: Michael Kay
Cc: w3c-xsl-query@w3.org; www-xml-schema-comments@w3.org
Subject: Re: Assertions in Schema 1.1 Part 1

Michael,

Thanks for the detailed comments. Many of them make good sense to me.

There was a short joint session between the Schema WG and Query gurus during the past F2F meeting in Redmond [member-only 1]. We quickly went through this proposal and spent some time talking about the XPath subset. No consensus was reached (there may never be one). What I do remember clearly is that at the end QT members were asked "Do you want to see the restriction on the subset relaxed?" Of those who answered, roughly half said "yes" and half "no". (This was a little surprising to me: I had thought that, as the inventors of XPath 2.0, most QT members would have the same reaction to the subset as you do.) Whether and how to enlarge the subset is still under discussion in the Schema WG. See [2].

As I see it, there are two reasons for defining a subset. The first, as you observed, is that schema likes downward-looking paths, both to keep type validation context-independent and to support streaming processors. I believe this is a hard requirement for most or all Schema WG members. The second is to avoid operations that are not defined or available in schema: for example, no arithmetic (because schema defines only value spaces and comparisons) and no type promotion (schema makes it very clear that float 1.0 and decimal 1.0 are not connected at all). Whether either of these makes sense is a judgement call. I don't think there is a single *right* answer.

Now some specific points you made...

> It will be terribly confusing to users,

I agree. It's always a trade-off.

> and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine.
Not sure I agree with this one. First, I think there is a long way to go between now and the day when "everyone has access to an XPath 2.0 engine". Second, even when things like JAXP provide such a generally available XPath engine, schema processors may still choose *not* to use it, for various reasons, including performance. (Does Saxon use the schema support available in the JDK? :p) And lastly, there are users who just want a schema processor that works the schema way (and want float 1.0 to be different from decimal 1.0).

> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)

The question of "what's natural" is often subjective. Having said that, I do agree that the subset Schema currently presents forces users to put constraints on the parent, which isn't always natural to all users.

> The chosen subset seems to eliminate many useful integrity constraints.

Different people may have different takes on this, but my personal feeling is that schema "assertions" are not intended to replace, or even compete with, current usage of other languages that support co-constraints (e.g. Schematron). It's no surprise if they can't handle every integrity constraint one may want to enforce. On the particular example you mentioned, I wonder whether it could be satisfied by slightly relaxing the subset (but still not allowing "preceding-sibling"), for example by using a QuantifiedExpr with some numeric predicates ...

> Allowing implementors to provide a fuller subset of XPath doesn't solve the
> problem for many users ...

Completely agree. This is something the Schema WG has to consider before making a final decision on how much freedom processors have in choosing which subset to support.

> I can't see why the Schema WG would want to define its own lexical rules for
> XPath parsing that differ from those in the XPath 2.0 spec.
Not sure what you are referring to. The BNF?

> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or from
> the post-validation PSVI. In other words, are the nodes accessed by the path
> expression typed or untyped?

Typed. It should be as if schema validation has finished, the PSVI has been produced, the XDM instance has been constructed, and only then are the XPaths evaluated. Agreed that this needs to be clarified.

> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal.

My understanding of XPath 2.0 is that if I compare float 1.0 and decimal 1.0, the result is "equal". But the goal (whether it's the right one can be debated) is to follow the schema comparison rules and treat them as "not equal".

> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic context
> is initialized.

My limited XPath knowledge doesn't help much in understanding this comment, but I trust you are right.

> The actual grammar proposed seems to have bugs, for example in BooleanExpr

Yup. I noticed it too and fixed it in my/our internal copy.

> Overall the
> grammar appears incredibly ugly, for example having different rules for the
> two operands of "eq",

The grammar was cooked up quickly to match the subset. A lot of the ugliness comes from the (debatable) need to *avoid* type conversion/promotion.

> and different rules depending on whether the "eq" is
> inside a predicate or not.

This again was introduced to match the subset. As mentioned in [2], there is a possibility (no agreement yet) that this difference will be removed.

Again, thanks for the thoughtful comments. I'm sure the Schema WG will spend time looking at assertions before finalizing the design.
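The float/decimal point in the exchange above can be made concrete. The following Python sketch is an editorial illustration, not part of either proposal: it contrasts XPath 2.0's numeric type promotion (an xs:decimal operand is promoted to xs:float before an eq comparison) with the draft's rule that values from different primitive value spaces are non-comparable and therefore "not equal". It assumes Python's float (actually a double) and decimal.Decimal can stand in for xs:float and xs:decimal, which is harmless for a value like 1.0; the function names are hypothetical and only this one pair of types is modelled.

```python
from decimal import Decimal


def xpath2_eq(a, b):
    """Hypothetical model of XPath 2.0 'eq' for the float/decimal pair only:
    the decimal operand is promoted to float, then the values are compared."""
    if isinstance(a, Decimal) and isinstance(b, float):
        a = float(a)
    elif isinstance(a, float) and isinstance(b, Decimal):
        b = float(b)
    return a == b


def draft_schema_eq(a, b):
    """Hypothetical model of the draft's rule: values from different primitive
    value spaces are non-comparable, and non-comparable is treated as 'not equal'."""
    if type(a) is not type(b):
        return False
    return a == b


f = 1.0             # stands in for xs:float 1.0 (Python float is a double)
d = Decimal("1.0")  # stands in for xs:decimal 1.0
print(xpath2_eq(f, d))        # True
print(draft_schema_eq(f, d))  # False
```

The same pair of values compares equal under one rule and not equal under the other, which is exactly the semantic divergence the two correspondents disagree about.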
[1] http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Aug/0021
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=3673

Thanks,
Sandy Gao
XML Parser Development, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com

www-xml-schema-comments-request@w3.org wrote on 2006-09-26 02:27:38 PM:

> I think the part of the Schema 1.1 draft that everyone on QT needs to read
> is section 3.12 on Assertions. It would probably also be a good idea to
> schedule a presentation by someone from the Schema WG who knows the
> rationale for the decisions that were made, and can provide some of the
> background - preferably someone who is resilient to hecklers.
>
> I can't personally see the rationale for using a subset of XPath 2.0 here
> rather than allowing the full language. It will be terribly confusing to
> users, and it makes life difficult for implementors, most of whom will
> already have access to a fully-functional XPath 2.0 engine. There aren't any
> obvious performance benefits in most of the restrictions; in fact, I can't
> see any benefits at all.
>
> I can see why the Schema WG might want to restrict the path expression to
> access only the tree rooted at the node being validated, because
> traditionally the validity of an element depends only on its content and not
> on its context; however, that could more easily be achieved by defining the
> path expression to operate on a deep copy of the element. (But personally,
> I'm not sure whether it's useful for users to be forced to move the
> integrity constraint to the root element of the relevant tree, rather than
> defining it where it comes naturally.)
>
> The chosen subset seems to eliminate many useful integrity constraints. To
> take one arbitrary example that I came across recently, there is no way to
> say that in a sequence of sibling X elements, the value of @Y is
> monotonically increasing. (That is, <xs:assert
> test="not(preceding-sibling::X/@Y gt @Y)"/>). Users are going to be very
> disappointed by these restrictions.
>
> Allowing implementors to provide a fuller subset of XPath doesn't solve the
> problem for many users (such as groups writing schema standards for an
> industry), who have to avoid reliance on optional features. (It's worth
> observing here that XQuery made many of the axes optional, but implementors
> have nearly all chosen to provide them, simply because users need them. I
> also seem to recall that for a long time SQL resisted allowing any SQL
> expression to be used in an integrity constraint; eventually they were
> forced to relent.)
>
> I can also see why the Schema WG might want to disallow use of functions
> whose result is context-dependent, such as current-date() or doc().
> Nevertheless, these functions provide validation capabilities that XML
> Schema users are crying out for.
>
> I can't see why the Schema WG would want to define its own lexical rules for
> XPath parsing that differ from those in the XPath 2.0 spec.
>
> The specification needs to make it clear whether the XPath expression is
> applied to a data model constructed from the pre-validation infoset or from
> the post-validation PSVI. In other words, are the nodes accessed by the path
> expression typed or untyped?
>
> There appears to be an attempt to make the path expression error-free by
> saying that non-comparable values are treated as not equal. Modifying the
> XPath semantics in this way seems the wrong thing to do. If this effect is
> required, the best way to handle it is to say that any dynamic error that
> occurs during the XPath evaluation causes the result of the entire
> expression to be treated as false.
>
> The XPath specification defines a static and dynamic context (see section
> 2.1) which define the interface between XPath and its host language. The
> schema spec needs to state how each value in the static and dynamic context
> is initialized.
>
> The actual grammar proposed seems to have bugs, for example in BooleanExpr
> (and PredicateBoolean) I think there's a missing vertical bar. Overall the
> grammar appears incredibly ugly, for example having different rules for the
> two operands of "eq", and different rules depending on whether the "eq" is
> inside a predicate or not. I thought the battle for orthogonality in
> language design had been won about 40 years ago; I was clearly mistaken.
>
>
> Michael Kay
> Saxonica Limited
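To illustrate the integrity constraint from Kay's monotonic-@Y example, here is a minimal Python sketch that checks it outside any schema processor using the standard library's xml.etree.ElementTree. It is an editorial illustration, not something proposed in the thread. It assumes every X child carries a numeric Y attribute, and the function name y_is_monotonic is hypothetical. It uses the existential reading "no preceding X sibling has a larger @Y". In full XPath 2.0 that reading is spelled with the general comparison `>`, because a value comparison such as `gt` raises a type error when an operand atomizes to more than one item.

```python
import xml.etree.ElementTree as ET


def y_is_monotonic(parent, tag="X", attr="Y"):
    """Hypothetical checker: True if no <X> child of `parent` has a preceding
    <X> sibling with a larger @Y (non-decreasing order; ties allowed).
    Assumes every <X> has a numeric Y attribute."""
    seen = []  # @Y values of earlier X siblings, in document order
    for child in parent:
        if child.tag != tag:
            continue  # other siblings are ignored, as with preceding-sibling::X
        y = float(child.get(attr))
        if any(prev > y for prev in seen):
            return False
        seen.append(y)
    return True


ok = ET.fromstring('<list><X Y="1"/><X Y="2"/><Z/><X Y="2"/><X Y="5"/></list>')
bad = ET.fromstring('<list><X Y="1"/><X Y="3"/><X Y="2"/></list>')
print(y_is_monotonic(ok))   # True
print(y_is_monotonic(bad))  # False
```

Comparing against every earlier sibling mirrors the preceding-sibling axis directly; tracking a running maximum instead would give the same answer in a single linear pass.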
Received on Wednesday, 27 September 2006 17:23:01 UTC