- From: Daniel Barclay <Daniel.Barclay@digitalfocus.com>
- Date: Fri, 06 Apr 2001 17:39:15 -0400
- To: www-xpath-comments@w3.org
- CC: jjc@jclark.com, Steven_DeRose@Brown.edu
SUMMARY:

There seems to be an error in the XPath specification.

The specification does not define which node or nodes are to be used as the context node for evaluating the first Step in a RelativeLocationPath. This means that the evaluation of an expression like "$x/self::y" is not defined.

The specific problem appears to be that the definition of how to combine a _following_ step in a relative location path with preceding steps also needs to be applied in some form to the _first_ step, to combine the relative path expression with a preceding filter expression.

(This could be an editing problem if FilterExpr, which can appear before a slash and then Steps in a RelativeLocationPath, was broken out from Step, which also can appear before a slash and then further Steps in a RelativeLocationPath.)

Additionally, there is significant ambiguity in the wording.


THE PROBLEM:

First consider parsing "$x/self::y" as an expression (the Expr non-terminal). That string matches these productions:

    Expr ::= OrExpr
    OrExpr ::= ...
    ... ::= PathExpr
    PathExpr ::= FilterExpr '/' RelativeLocationPath

Therefore, we have a path expression (except that that term is never used in the prose) of the third form (that last production), whose filter expression is the string "$x" and whose relative location path is "self::y".

That filter expression is clearly a primary expression that is a variable reference.

The relative location path of "self::y" clearly matches the first production for RelativeLocationPath (RelativeLocationPath ::= Step), so it's a relative location path having only one step.

Now consider the rules for evaluating a relative location path (from section 2, just before section 2.1 (a couple of paragraphs above http://www.w3.org/TR/xpath#NT-RelativeLocationPath)):

    A relative location path consists of a sequence of one or more location steps separated by /. The steps in a relative location path are composed together from left to right.
    Each step in turn selects a set of nodes relative to a context node. An initial sequence of steps is composed together with a following step as follows. The initial sequence of steps selects a set of nodes relative to a context node. Each node in that set is used as a context node for the following step. The sets of nodes identified by that step are unioned together. The set of nodes identified by the composition of the steps is this union. ...

Specifically, note that the specification says:

    Each step in turn selects a set of nodes relative to _a_ context node.

The node or nodes to use as that context node are defined for any step after the first step:

    The initial sequence of steps selects a set of nodes relative to a context node. Each node in that set is used as a context node for the following step.

However, nothing defines what to use as the context node when evaluating the _first_ step in a relative location path (or how many times that step is evaluated).

Also, nothing says that the value returned by the variable reference (e.g., a node set) is used in the relative location path. (It is supposed to be used, right?)


MORE ANALYSIS:

One problem is that nothing says that (sometimes) the first step in a relative location path is evaluated like a following step, that is, evaluated once for each node in some node set, using that node as its context node.

Another problem is that no wording seems to deal with the fact that a relative location path and its first step sometimes "receive" a node set from some other subexpression preceding it, for example, a filter expression.

I think that the root of the wording problems is that RelativeLocationPath is used in several places in the grammar, but the definition of the semantics of a relative location path does not take those different contexts into account. (A relative location path appearing as a top-level expression is very different from a relative location path appearing after a filter expression.
In the first case, the first step of the relative location path uses the context node from...well...the context (e.g., XSLT). In the second case, the first step of the relative location path uses nodes from the node set from the filter expression. (Right?))

Additionally, note that there is a lot of ambiguity because some things aren't defined at all.

Consider the non-terminal PathExpr. That would seem to map to "path expression" in the prose, but that term is never mentioned and no other term appears to be used. Since neither that term nor PathExpr itself is referred to in the prose, there is _no_ definition of the semantics of any expression matching PathExpr.

If the evaluation of an expression matching the production

    PathExpr ::= FilterExpr '/' RelativeLocationPath

is supposed to take something from the evaluation of the FilterExpr and use it in the evaluation of the RelativeLocationPath, something in the text has to say so, or the semantics are unspecified.


SOLUTIONS:

At a minimum, I think that the specification needs to document:

- that the first step in a relative location path is evaluated like a following step, using each node in some node set as its context node

- what that node set is in the different cases of relative location path (e.g., the one context node when it's an Expr, the FilterExpr's node set when it's a PathExpr, the root node when it's an AbsoluteLocationPath, etc.)

However, I think it would likely be better to pull those differences out of the relative location path's description and document them in the "calling" constructs.

That is, define that a relative location path takes some expression value from the enclosing syntactic construct, and evaluates its steps starting with that expression value. Then, for each construct that uses RelativeLocationPath, define what value it passes to its subordinate RelativeLocationPath.

For example, maybe (roughly, obviously):

    A relative location path takes a node set from the enclosing construct and ...

    For a LocationPath that is simply a RelativeLocationPath, evaluation consists of taking the context node (of the LocationPath), making a node set containing just that node, and evaluating the RelativeLocationPath given that node set.

    For a LocationPath that is an AbsoluteLocationPath, evaluation consists of evaluating the AbsoluteLocationPath.

    For an AbsoluteLocationPath of the form "'/' RelativeLocationPath", evaluation consists of taking the root node (of the document containing the context node), making a node set containing just that node, and evaluating the RelativeLocationPath given that node set.

    For a PathExpr of the form "FilterExpr '/' RelativeLocationPath", evaluation consists of evaluating the FilterExpr, and then evaluating the RelativeLocationPath given the node set from the FilterExpr.

    <other cases>

Note that processing the first step in a relative location path (the Step in "RelativeLocationPath ::= Step") would now be the same as processing a following step (the Step in "RelativeLocationPath ::= RelativeLocationPath '/' Step").

More generally, the wording about expression evaluation should probably be regularized and made more complete (covering all syntactic constructs explicitly).

Obviously, that could make the spec much wordier. However, being clear might be worth it. And maybe there's a compromise that isn't explicit for really obvious things but that still covers everything needed.

(Actually, saying something like this might cover a lot:

    For any construct that consists of a single child construct (a single grammatical symbol), evaluation of the parent construct consists of evaluation of the child construct. The input to the parent is used as the input to the child. The result of the child is then used as the result of the parent.

A cleaned-up version of that would, in one paragraph, cover all the "pass-through" cases like OrExpr ::= AndExpr, AndExpr ::= EqualityExpr, etc.)

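As a rough illustration only, the proposed rules above could be sketched in Python as follows. Everything in the sketch is hypothetical and not taken from the XPath specification: the Node class, the child() and self_() step helpers, and the eval_* function names are invented for this example, and only two axes with a name test are modeled. The point it tries to show is that once a relative location path is handed a node set by its enclosing construct, the first step is processed exactly like any following step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity
class Node:
    """Hypothetical minimal tree node (not an XPath data model)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


# A step maps ONE context node to the nodes it selects.
Step = Callable[[Node], List[Node]]


def child(name: str) -> Step:
    """Hypothetical helper: the step child::name."""
    return lambda ctx: [c for c in ctx.children if c.name == name]


def self_(name: str) -> Step:
    """Hypothetical helper: the step self::name."""
    return lambda ctx: [ctx] if ctx.name == name else []


def eval_relative_path(steps: List[Step], input_nodes: List[Node]) -> List[Node]:
    """Evaluate every step, including the first, once per node of the
    current node set, and union the results (duplicates dropped)."""
    current = list(input_nodes)
    for step in steps:
        result: List[Node] = []
        for ctx in current:
            for selected in step(ctx):
                if selected not in result:
                    result.append(selected)
        current = result
    return current


def root_of(node: Node) -> Node:
    while node.parent is not None:
        node = node.parent
    return node


def eval_location_path(steps: List[Step], context_node: Node) -> List[Node]:
    # LocationPath ::= RelativeLocationPath: pass {context node}.
    return eval_relative_path(steps, [context_node])


def eval_absolute_path(steps: List[Step], context_node: Node) -> List[Node]:
    # AbsoluteLocationPath ::= '/' RelativeLocationPath: pass {root node}.
    return eval_relative_path(steps, [root_of(context_node)])


def eval_path_expr(filter_nodes: List[Node], steps: List[Step]) -> List[Node]:
    # PathExpr ::= FilterExpr '/' RelativeLocationPath: pass the
    # node set produced by the FilterExpr.
    return eval_relative_path(steps, filter_nodes)


# Usage: "$x/self::y" where $x holds a y node and a record node.
doc = Node("#root")
rec = doc.add(Node("record"))
y_node = doc.add(Node("y"))
selected = eval_path_expr([y_node, rec], [self_("y")])
```

In this sketch the three "calling" constructs differ only in which node set they pass down, which is the separation of concerns suggested above. The result list is not kept in document order; that detail is ignored here.
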
At least be clear that any construct that combines two others (e.g., PathExpr when it combines a filter expression and a relative location path) _must_ say something about how the result of one construct affects or is used in the evaluation of the other construct.

(In case you're wondering how I got into looking at this: I was trying to figure out if position() can be used to get the index of a given node within a set of nodes that should contain it (e.g., given a variable holding a single element node of element type "record", get the index or position of that node in a node set, e.g., in the set "/record" of all (top-level) record elements in the document). I had to track down the exact semantics of the constructs that set the context position that position() returns, and found that some things aren't really defined.)

Daniel
-- 
Daniel Barclay
Digital Focus
Daniel.Barclay@digitalfocus.com
Received on Friday, 6 April 2001 17:39:41 UTC