- From: Mike Schilling <mschilling@edgility.com>
- Date: Fri, 15 Mar 2002 16:50:42 -0500 (EST)
- To: www-xpath-comments@w3.org
Encoded SOAP documents (that is, XML documents that are generated by
applying SOAP's encoding rules as described in
http://www.w3.org/TR/soap12-part2/#soapenc) are a fast-growing segment
of XML. Unfortunately, XPath is not well equipped to process them.
To see why, let's look at a simple example:
<?xml version='1.0' encoding='UTF-8'?>
<soap:Envelope xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:xsd='http://www.w3.org/2001/XMLSchema'
    xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'
    xmlns:soapenc='http://schemas.xmlsoap.org/soap/encoding/'
    soap:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>
  <soap:Body>
    <n:getOrderStatusResponse
        xmlns:n='http://tempuri.org/OrderStatusTable'>
      <Result href='#id0'/>
    </n:getOrderStatusResponse>
    <id0 id='id0' soapenc:root='0'
        xmlns:ns2='http://www.user.com/package/' xsi:type='ns2:Status'>
      <orderNum xsi:type='xsd:string'>10002</orderNum>
      <accountName xsi:type='xsd:string'>IBM</accountName>
      <orderState xsi:type='xsd:string'>Open</orderState>
      <orderPriority xsi:type='xsd:string'>High</orderPriority>
      <orderManager xsi:type='xsd:string'>Barry Bonds</orderManager>
      <shipToInfo href='#id1'/>
    </id0>
    <id1 id='id1' soapenc:root='0'
        xmlns:ns2='http://www.user.com/package/' xsi:type='ns2:ShipTo'>
      <contact xsi:type='xsd:string'>Jeff Kent</contact>
      <address1 xsi:type='xsd:string'>123 Main Street</address1>
      <address2 xsi:type='xsd:string'>Suite 400</address2>
      <city xsi:type='xsd:string'>Oakland</city>
      <state xsi:type='xsd:string'>CA</state>
      <country xsi:type='xsd:string'>USA</country>
      <zip xsi:type='xsd:int'>94612</zip>
    </id1>
  </soap:Body>
</soap:Envelope>
Note that the document's structure and the structure of the underlying data
are quite different. From the data point of view, we have the hierarchy:
getOrderStatusResponse
    Result
        shipToInfo
In XML, the corresponding elements are peers. The relationships are
expressed by href attributes which link to id attributes. (Since the data
can form an arbitrary graph rather than a strict tree, it's necessary for
SOAP to be able to use links in addition to containment, of course.)
The dereferences introduced in XPath 2.0 cannot be used here for three
reasons, of increasing complexity.
1. href is not strictly speaking an ID
An href attribute that points to an element with id "a" has value
"#a". This is a minor point, and requires only a minor adjustment.
2. The element names in these documents are meaningless.
Dereferences use NameTests, just as axes do, which makes sense when element
names are meaningful. In these documents, an expression like:
n:getOrderStatusResponse/@Result #> id0
is useless. It works on this particular document, but need not work on
another document containing the same information created by a different
implementation of the SOAP encoder. A slight change to the data (e.g.
adding a billTo address) could result in different element names even with
the same encoder. What's needed is a syntax which doesn't refer to the
element name of the target.
3. The structure of the document isn't entirely fixed.
The SOAP spec is a bit slippery here, but I think an encoder is within its
rights to notice that the shipToInfo element only appears once in the document,
and inline it, producing:
<id0 id='id0' ...>
  ....
  <shipToInfo xsi:type='ns2:ShipTo'>
    ...
  </shipToInfo>
</id0>
Now what's needed is a construct which finds a child element and:
If it's a real element, returns it.
If it's a stub (has an href attribute), traverses its href link and
returns the result.
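As a rough illustration of that construct (a sketch in Python using the standard xml.etree.ElementTree module, not the Xalan extension described in this message), the lookup could be written as below. The helper names build_id_index, encoded_child and ship_to_city, and the cut-down sample documents, are invented for this example. It assumes every href has the form "#id" and points at an element in the same document.

```python
import xml.etree.ElementTree as ET

def build_id_index(root):
    """Map each id attribute value to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def encoded_child(parent, name, ids):
    """Return the children of `parent` called `name`, following href stubs."""
    results = []
    for child in parent.findall(name):
        href = child.get("href")
        if href is None:
            results.append(child)          # a real (inlined) element
        elif href.startswith("#"):
            results.append(ids[href[1:]])  # a stub: drop the '#' and look up the id
    return results

# The same data, once with a link and once inlined (hypothetical, simplified).
LINKED = """<Body>
  <id0 id='id0'><orderNum>10002</orderNum><shipToInfo href='#id1'/></id0>
  <id1 id='id1'><city>Oakland</city></id1>
</Body>"""

INLINED = """<Body>
  <id0 id='id0'><orderNum>10002</orderNum>
    <shipToInfo><city>Oakland</city></shipToInfo>
  </id0>
</Body>"""

def ship_to_city(xml_text):
    """Find the ship-to city without caring which encoding was used."""
    root = ET.fromstring(xml_text)
    ids = build_id_index(root)
    ship_to = encoded_child(ids["id0"], "shipToInfo", ids)[0]
    return ship_to.findtext("city")
```

Calling ship_to_city on either sample yields the same city, which is the point: the caller names the child it wants and never mentions the name of the element the link lands on.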
The obvious way to extend XPath to handle this is to introduce a
special-purpose function. I actually did this (starting with standard
Xalan 2.2), calling it "href". This works reasonably well when only one
link needs traversing:
href(n:getOrderStatusResponse, "Result")
But it gets ugly fast as the number of steps increases:
href( href( href( n:getOrder, "n:order" ), "Items" )[1], "shipToInfo" )
This might be acceptable if all expressions are computer-generated, but
definitely not otherwise. XPath, rightly, expresses this kind of iteration
with steps. Accordingly, I added a new axis called "encoded-ref":
n:getOrder/encoded-ref::Items[1]/encoded-ref::shipToInfo
and, since this became one of the most frequently used axes, added the
abbreviation "^" :
n:getOrder/^Items[1]/^shipToInfo
This has found fairly good user acceptance (even though, so far as I know,
none of the users are former Pascal programmers, who would find ^ mnemonic
as a dereference operator :-). There were a few implementation
difficulties, because the name following encoded-ref:: is *not* a NameTest.
Xalan's implementation of axes works as follows:
1. The axis is represented by an iterator that produces all nodes one step
along that axis from the starting point.
2. The filter (NameTest or NodeTest) that follows filters the product of the
iterator.
3. The predicates for the step filter the output further.
(I don't know how common this is in XPath implementations, but the XPath
grammar is quite suited to it.) This doesn't work here, since the name
expresses how to navigate to the target nodeset, not any property of the
target nodes themselves. In the example above, one would get to the
response (the id0 element) with
n:getOrderStatusResponse/^Result
but there is nothing about the id0 element itself which matches
"Result". Instead, the string "Result" has to be processed by the axis
iterator itself.
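The difference between the two kinds of step can be sketched in Python (again with xml.etree.ElementTree, and not a rendering of Xalan's internals). The function names child_axis, name_test, encoded_ref_axis and walk are made up for this illustration, and the document is a cut-down version of the example above. An ordinary axis yields nodes and a separate test filters them by their own names, whereas the encoded-ref iterator has to be handed the name in order to know where to go.

```python
import xml.etree.ElementTree as ET

DOC = """<Body xmlns:n='http://tempuri.org/OrderStatusTable'>
  <n:getOrderStatusResponse><Result href='#id0'/></n:getOrderStatusResponse>
  <id0 id='id0'><accountName>IBM</accountName><shipToInfo href='#id1'/></id0>
  <id1 id='id1'><city>Oakland</city></id1>
</Body>"""

def child_axis(node):
    """Ordinary axis: yields every node one step away, knowing nothing of names."""
    return iter(node)

def name_test(nodes, name):
    """Ordinary node test: filters the axis output by each node's own name."""
    return [n for n in nodes if n.tag == name]

def encoded_ref_axis(node, name, ids):
    """Name-aware axis: uses `name` to pick children, then follows href stubs."""
    for child in node:
        if child.tag != name:
            continue
        href = child.get("href")
        # Assumes hrefs look like '#id'; a real element is returned unchanged.
        yield child if href is None else ids[href[1:]]

def walk(start, names, ids):
    """Apply one encoded-ref step per name, in the spirit of a/^b/^c."""
    current = [start]
    for name in names:
        current = [t for n in current for t in encoded_ref_axis(n, name, ids)]
    return current

root = ET.fromstring(DOC)
ids = {el.get("id"): el for el in root.iter() if el.get("id")}
response = root.find("{http://tempuri.org/OrderStatusTable}getOrderStatusResponse")

stub = name_test(child_axis(response), "Result")        # the Result stub itself
status = walk(response, ["Result"], ids)                # the id0 element
ship_to = walk(response, ["Result", "shipToInfo"], ids) # the id1 element
```

Here the ordinary child step stops at the empty Result stub, while the encoded-ref step for the same name returns id0, a node whose own name never matched "Result".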
To sum up:
XPath 2.0 as currently defined cannot process encoded SOAP documents. It's
possible to add a new type of axis to remedy this. The implementation
costs are non-zero but not prohibitive. The definition of this axis is
straightforward, but results in something subtly different from the other axes.
Received on Saturday, 16 March 2002 07:35:36 UTC