- From: Mike Schilling <mschilling@edgility.com>
- Date: Fri, 15 Mar 2002 16:50:42 -0500 (EST)
- To: www-xpath-comments@w3.org
Encoded SOAP documents (that is, XML documents that are generated by applying SOAP's encoding rules as described in http://www.w3.org/TR/soap12-part2/#soapenc) are a fast-growing segment of XML. Unfortunately, XPath is not well-equipped to process them. To see why, let's look at a simple example:

    <?xml version='1.0' encoding='UTF-8'?>
    <soap:Envelope
        xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
        xmlns:xsd='http://www.w3.org/2001/XMLSchema'
        xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'
        xmlns:soapenc='http://schemas.xmlsoap.org/soap/encoding/'
        soap:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>
      <soap:Body>
        <n:getOrderStatusResponse xmlns:n='http://tempuri.org/OrderStatusTable'>
          <Result href='#id0'/>
        </n:getOrderStatusResponse>
        <id0 id='id0' soapenc:root='0' xmlns:ns2='http://www.user.com/package/' xsi:type='ns2:Status'>
          <orderNum xsi:type='xsd:string'>10002</orderNum>
          <accountName xsi:type='xsd:string'>IBM</accountName>
          <orderState xsi:type='xsd:string'>Open</orderState>
          <orderPriority xsi:type='xsd:string'>High</orderPriority>
          <orderManager xsi:type='xsd:string'>Barry Bonds</orderManager>
          <shipToInfo href='#id1'/>
        </id0>
        <id1 id='id1' soapenc:root='0' xmlns:ns2='http://www.user.com/package/' xsi:type='ns2:ShipTo'>
          <contact xsi:type='xsd:string'>Jeff Kent</contact>
          <address1 xsi:type='xsd:string'>123 Main Street</address1>
          <address2 xsi:type='xsd:string'>Suite 400</address2>
          <city xsi:type='xsd:string'>Oakland</city>
          <state xsi:type='xsd:string'>CA</state>
          <country xsi:type='xsd:string'>USA</country>
          <zip xsi:type='xsd:int'>94612</zip>
        </id1>
      </soap:Body>
    </soap:Envelope>

Note that the document's structure and the structure of the underlying data are quite different. From the data point of view, we have the hierarchy:

    getOrderStatusResponse
        Result
            shipToInfo

In XML, the corresponding elements are peers. The relationships are expressed by href attributes which link to id attributes.
(Since the data can form an arbitrary graph rather than a strict tree, it's necessary for SOAP to be able to use links in addition to containment, of course.)

The dereferences introduced in XPath 2.0 cannot be used here for three reasons, of increasing complexity.

1. href is not, strictly speaking, an ID.

   An href attribute that points to an element with id "a" has value "#a". This is a minor point, and requires only a minor adjustment.

2. The element names in these documents are meaningless.

   Dereferences use NameTests, just as axes do, which makes sense when element names are meaningful. In these documents, an expression like:

       n:getOrderStatusResponse/@Result #> id0

   is useless. It works on this particular document, but need not work on another document containing the same information created by a different implementation of the SOAP encoder. A slight change to the data (e.g. adding a billTo address) could result in different element names even with the same encoder. What's needed is a syntax which doesn't refer to the element name of the target.

3. The structure of the document isn't entirely fixed.

   The SOAP spec is a bit slippery here, but I think an encoder is within its rights to notice that the shipTo element only appears once in the document, and inline it, producing:

       <id0 id='id0' ...>
         ....
         <shipToInfo xsi:type='ns2:ShipTo'>
           ...
         </shipToInfo>
       </id0>

   Now what's needed is a construct which finds a child element and:

   - If it's a real element, returns it.
   - If it's a stub (has an href attribute), traverses its href link and returns the result.

The obvious way to extend XPath to handle this is to introduce a special-purpose function. I actually did this (starting with standard Xalan 2.2), calling it "href".
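For illustration only, the find-or-dereference rule described above can be sketched in Python with the standard xml.etree.ElementTree module. This is not the Xalan extension itself: the Python function href, its signature, and the two trimmed documents are assumptions made for the sketch, and namespaces are left out so that plain tag comparison is enough.

```python
import xml.etree.ElementTree as ET

# Two trimmed, namespace-free encodings of the same data (hypothetical).
LINKED = """
<Body>
  <id0 id='id0'>
    <shipToInfo href='#id1'/>
  </id0>
  <id1 id='id1'>
    <city>Oakland</city>
  </id1>
</Body>
"""

INLINED = """
<Body>
  <id0 id='id0'>
    <shipToInfo>
      <city>Oakland</city>
    </shipToInfo>
  </id0>
</Body>
"""

def href(root, parent, name):
    """Hypothetical analogue of the href() function described above.

    Returns the children of `parent` named `name`. A stub child (one
    carrying an href attribute) is replaced by the element whose id
    matches the href value with its leading '#' removed.
    """
    by_id = {e.get('id'): e for e in root.iter() if e.get('id')}
    result = []
    for child in parent:
        if child.tag != name:
            continue
        link = child.get('href')
        if link is None:
            result.append(child)             # real (inlined) element
        elif link.startswith('#') and link[1:] in by_id:
            result.append(by_id[link[1:]])   # stub: follow the link
    return result

for doc in (LINKED, INLINED):
    root = ET.fromstring(doc)
    status = root.find('id0')
    ship_to = href(root, status, 'shipToInfo')[0]
    print(ship_to.tag, ship_to.find('city').text)
# prints "id1 Oakland", then "shipToInfo Oakland"
```

The caller never names the target element (id1) and gets the same data whether the encoder linked or inlined it, which is the property reasons 2 and 3 ask for.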
The href function works reasonably well when only one link needs traversing:

    href(n:getOrderStatusResponse, "shipToInfo")

But it gets ugly fast as the number of steps increases:

    href( href( href( n:getOrder, "n:order" ), "Items")[1], "shipToInfo")

This might be acceptable if all expressions are computer-generated, but definitely not otherwise. XPath, rightly, expresses this kind of iteration with steps. Accordingly, I added a new axis called "encoded-ref":

    n:getOrder/encoded-ref::Items[1]/encoded-ref::shipToInfo

and, since this became one of the most frequently used axes, added the abbreviation "^":

    n:getOrder/^Items[1]/^shipToInfo

This has found fairly good user acceptance (even though, so far as I know, none of the users are former Pascal programmers who would find ^ as a dereference operator mnemonic :-)

There were a few implementation difficulties, because the name following encoded-ref:: is *not* a NameTest. Xalan's implementation of axes is:

- The axis is represented by an iterator that produces all nodes one step along that axis from the starting point.
- The filter (NameTest or NodeTest) that follows filters the product of the iterator.
- The predicates for the step filter the output further.

(I don't know how common this is in XPath implementations, but the XPath grammar is quite suited to it.) This doesn't work here, since the name expresses how to navigate to the target nodeset, not any property of the target nodes themselves. In the example above, one would get to the response (the id0 element) with

    n:getOrderStatusResponse/^Result

but there is nothing about the id0 element itself which matches "Result". Instead, the string "Result" has to be processed by the axis iterator itself.

To sum up:

- XPath 2.0 as currently defined cannot process encoded SOAP documents.
- It's possible to add a new type of axis to remedy this.
- The implementation costs are non-zero but not prohibitive.
- The definition of this axis is straightforward, but results in something subtly different from the other axes.
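As a rough illustration of why the name after encoded-ref:: cannot be treated as a NameTest, the following Python sketch contrasts the iterator-then-filter model with an axis iterator that takes the name as input. It is not Xalan code: child_axis, name_test, encoded_ref_axis, walk, and the trimmed namespace-free document are all hypothetical names and data chosen for the sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free version of the example document (hypothetical).
DOC = """
<Body>
  <getOrderStatusResponse>
    <Result href='#id0'/>
  </getOrderStatusResponse>
  <id0 id='id0'>
    <shipToInfo href='#id1'/>
  </id0>
  <id1 id='id1'>
    <city>Oakland</city>
  </id1>
</Body>
"""

def child_axis(node):
    """Ordinary axis iterator: every node one step away, names unknown."""
    yield from node

def name_test(nodes, name):
    """Filter applied to the iterator's output in the usual model."""
    return [n for n in nodes if n.tag == name]

def encoded_ref_axis(root, node, name):
    """Hypothetical encoded-ref iterator: the name steers navigation."""
    by_id = {e.get('id'): e for e in root.iter() if e.get('id')}
    for child in node:
        if child.tag != name:
            continue
        link = child.get('href')
        if link is None:
            yield child                      # inlined element
        elif link.startswith('#') and link[1:] in by_id:
            yield by_id[link[1:]]            # linked element

def walk(root, start, names):
    """Evaluate start/^name1/^name2/... one encoded-ref step at a time."""
    current = [start]
    for name in names:
        current = [t for n in current for t in encoded_ref_axis(root, n, name)]
    return current

root = ET.fromstring(DOC)
response = root.find('getOrderStatusResponse')

# Usual model on the child axis: the name describes the nodes returned.
print([n.tag for n in name_test(child_axis(response), 'Result')])  # ['Result']

# encoded-ref: the target is id0, which a later name filter would reject.
targets = list(encoded_ref_axis(root, response, 'Result'))
print([t.tag for t in targets])              # ['id0']
print(name_test(targets, 'Result'))          # []

# Chained steps, analogous to n:getOrderStatusResponse/^Result/^shipToInfo.
print([t.tag for t in walk(root, response, ['Result', 'shipToInfo'])])  # ['id1']
```

Applying a name filter after the dereference discards the very node the step was meant to reach, so in this sketch the name has to be consumed inside the iterator.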
Received on Saturday, 16 March 2002 07:35:36 UTC