- From: Ashok Malhotra <petsa@us.ibm.com>
- Date: Fri, 29 Dec 2000 08:29:52 -0500
- To: www-xml-schema-comments@w3.org
- Cc: XMLStds@us.ibm.com
I am forwarding extracts from a note written by Andy Clark, one of our developers, that argues that the full power of XPath cannot be supported by streaming Schema processors. He also argues that the full power of XPath is overkill for identity constraints and proposes a subset. Although Andy wrote the note, this is not one man's opinion. A number of us in IBM feel that allowing the full power of XPath in identity constraints is counterproductive. - It makes schema processors unnecessarily complex. - Cannot be supported by SAX based processors - Most important, the full power of XPath is not necessary for identifying keys. Only a subset is useful. All the best, Ashok ---------------------- Forwarded by Ashok Malhotra/Watson/IBM on 12/29/2000 08:18 AM --------------------------- Andy Clark 12/29/2000 02:19 AM To: XMLStds cc: From: Andy Clark/Santa Teresa/IBM@IBMUS Subject: SPath and Identity Constraints Subsetting XPath for a Streaming Parser A streaming (or serial) parser does not buffer any data; the document information is communicated via document handler callbacks such as "startElement", "characters", or "endElement" and is then discarded. In this model, an XPath Selector only has access to the element children, the element's attributes, and character data. The XPath Selector cannot traverse the various document axes unless the matcher buffers the document -- the very situation we are trying to avoid. Therefore it is necessary to describe a subset of XPath that can be used in a serial manner. We shall call this subset Serial XPath (SPath). In short, SPath disallows all axes, predicates, and functions that can not be supported without buffering a portion of the document. Therefore, the only axes supported by SPath are attribute, child, and self. Also, the use of path expressions in predicates is limited to a single step on the attribute axis. Functions are permitted but restricted. Some functions, like "id()", can not be supported in SPath. Some examples of illegal SPaths (but legal XPaths): .. a//b a[b] a[position()=last()] Some examples of legal SPaths: . /a a/b @a a[@b] a[position()=2] Productions for SPath and the allowed function set are included at the end of this document. XML Schema Identity Constraints XML Schema Identity constraints provide an identity declaration and referencing mechanism that is more powerful than the ID/IDREF attribute types provided in DTDs. The identity constraints rely on XPath for describing the location of the data values in the instance document that should be considered uniquely or as keys and key references. Identity constraints are associated with an element declared in the grammar and are comprised of a "selector" and one or more "fields". Selectors are relative paths from the declared element and must match a node-set of elements. Fields are relative paths from the elements matched by the selector and must match an attribute or an element of simple type. Uniqueness is determined by comparing the data values matched by the fields in the value space. Therefore, "1.0" and "1" are equal for decimal data values. An abbreviated example from the XML Schema Primer (CR): <element name='purchaseReport'> <key name='pNumKey'> <selector xpath='parts/part'/> <field xpath='@number'/> </key> </element> In order to provide identity constraint support in an efficient manner, SPath will be used instead of the full XPath. However, SPath requires further restrictions to be suitable for use in XML Schema identity constraints because of the selection requirements of selectors and fields. We shall call this subset Restricted-SPath. As previously stated, the selector SPath must select elements and the field SPath must select attributes or elements of simple type. In addition, because selectors and fields are relative paths, no SPath used for either a selector or a field may select the root. Productions for Restricted-SPath and the allowed function set are included at the end of this document. Performance Benefits and Considerations Only supporting a subset of XPath for XML Schema identity constraints greatly enhances the performance of the parser. Not buffering document information reduces the amount of memory required and allows identity constraints to be supported by streaming parsers implementing interfaces such as XNI and SAX. Also, the limited number of allowed axes in SPath reduces the computation time for selecting identity constraint values. So an implementation of identity constraints using SPath can be very efficient in both memory usage and computation time. However, even using SPath for identity constraints degrades parser performance. The cost is directly related to the number of identity constraint selectors and fields that are active while parsing the document and the depth of the element context subtree. This cost comes mainly from the method invocation overhead for notifying selector and field of document information. SPath Productions The following list of productions are modifications to those productions defined in the XPath Specification. Modified productions are denoted with a prime (') and deleted productions are marked with an 'x'. I dislike the productions in this grammar, however, because the grammar allows nonsensical constructions such as "@a/b" and "/..", although the latter is hard to catch anyway. While this is a full set of productions of everything that can be supported by a Serial XPath implementation, there are additional constraints on the grammar. For example, only the attribute and self axes are allowed within predicate expressions. However, these restrictions are not introduced into this modified grammar. [1] LocationPath ::= RelativeLocationPath | AbsoluteLocationPath [2'] AbsoluteLocationPath ::= '/' RelativeLocationPath? [3'] RelativeLocationPath ::= Step | RelativeLocationPath '/' Step [4] Step ::= AxisSpecifier NodeTest Predicate* | AbbreviatedStep [5] AxisSpecifier ::= AxisName '::' | AbbreviatedAxisSpecifier [6'] AxisName ::= 'attribute' | 'child' | 'self' [7] NodeTest ::= NameTest | NodeType '(' ')' | 'processing-instruction' '(' Literal ')' [8] Predicate ::= '[' PredicateExpr ']' [9] PredicateExpr ::= Expr [10x] [11x] [12'] AbbreviatedStep ::= '.' [13] AbbreviatedAxisSpecifier ::= '@'? [14] Expr ::= OrExpr [15'] PrimaryExpr ::= '(' Expr ')' | Literal | Number | FunctionCall [16] FunctionCall ::= FunctionName '(' ( Argument ( ',' Argument ) * )? ')' [17] Argument ::= Expr [18] UnionExpr ::= PathExpr | UnionExpr '|' PathExpr [19'] PathExpr ::= LocationPath | FilterExpr | FilterExpr '/' RelativeLocationPath [20] FilterExpr ::= PrimaryExpr | FilterExpr Predicate [21] OrExpr ::= AndExpr | OrExpr 'or' AndExpr [22] AndExpr ::= EqualityExpr | AndExpr 'and' EqualityExpr [23] EqualityExpr ::= RelationalExpr | EqualityExpr '=' RelationalExpr | EqualityExpr '!=' RelationalExpr [24] RelationalExpr ::= AdditiveExpr | RelationalExpr '<' AdditiveExpr | RelationalExpr '>' AdditiveExpr | RelationalExpr ' <=' AdditiveExpr | RelationalExpr '>=' AdditiveExpr [25] AdditiveExpr ::= MultiplicativeExpr | AdditiveExpr '+' MultiplicativeExpr | AdditiveExpr '-' MultiplicativeExpr [26] MultiplicativeExpr ::= UnaryExpr | MultiplicativeExpr MultiplyOperator UnaryExpr | MultiplicativeExpr 'div' UnaryExpr | MultiplicativeExpr 'mod' UnaryExpr [27] UnaryExpr ::= UnionExpr | '-' UnaryExpr [28'] ExprToken ::= '(' | ')' | '[' | ']' | '.' | '@' | ',' | '::' | NameTest | NodeType | Operator | FunctionName | AxisName | Literal | Number [29] Literal ::= '"' [^"]* '"' | "'" [^']* "'" [30] Number ::= Digits ('.' Digits?)? | '.' Digits [31] Digits ::= [0-9]+ [32'] Operator ::= OperatorName | MultiplyOperator | '/' | '|' | '+' | '-' | '=' | '!=' | '<' | '<=' | '>' | '>=' [33] OperatorName ::= 'and' | 'or' | 'mod' | 'div' [34] MultiplyOperator ::= '*' [35] FunctionName ::= QName - NodeType [36x] [37] NameTest ::= '*' | NCName ':' '*' | QName [38] NodeType ::= 'comment' | 'text' | 'processing-instruction' | 'node' [39] ExprWhitespace ::= S Restricted-SPath Productions Since the rules governing what identity constraint limit what selectors and field expressions are allowed to select and for simplicity, I have rewritten the productions suitable for Restricted-SPath. I have not made an effort to keep the names of the productions in line with the standard XPath grammar definition. These productions are even further simplified from what is possible to implement for the reason of further simplicity and ease of understanding. [1] Selector ::= SelectorPath | SelectorUnion [2] Field ::= FieldPath | FieldUnion [3] SelectorPath ::= ElementPath [4] SelectorUnion ::= SelectorPath '|' SelectorPath [5] FieldPath ::= ElementPath | AttributePath [6] FieldUnion ::= FieldPath '|' FieldPath [7] ElementPath ::= ElementFinalStep | Path '/' ElementFinalStep [8] ElementUnion ::= ElementPath '|' ElementPath [9] ElementStep ::= ( ElementOrSelfAxis Name | ContextNode ) Predicate* [10] ElementFinalStep ::= ( ElementOrSelfAxis QName | ContextNode ) Predicate* [11] ChildOrSelfAxis ::= ChildAxis? | SelfAxis [12] ChildAxis ::= 'child' '::' [13] SelfAxis ::= 'self' '::' [14] ContextNode ::= '.' [15] AttributePath ::= AttributeFinalStep | Path '/' AttributeFinalStep [16] AttributeUnion ::= AttributePath '|' AttributePath [17] AttributeStep ::= AttributeAxis Name [18] AttributeFinalStep ::= AttributeAxis QName [19] AttributeAxis ::= 'attribute' '::' | '@' [20] Path ::= ElementStep Predicate* | Path '/' ElementStep [21] Name ::= '*' | NCName ':' '*' | QName [22] Predicate ::= '[' PredicateExpr ']' [23] PredicateExpr ::= PositionExpr | LiteralExpr [24] PositionExpr ::= Number ( '=' 'position()' )? | 'position()' '=' Number [25] LiteralExpr ::= AttributeStep ( '=' Literal )? | Literal '=' AttributeStep [26] Number ::= Digit+ [27] Literal ::= '"' [^"]* '"' | "'" [^']* "'"
Received on Friday, 29 December 2000 08:30:08 UTC