[F&O] IBM-FO-001 Request for "atom" function

This Last Call comment proposes a new XPath/XQuery function, possibly 
named fn:atom(). The result of fn:atom() is the same as that of fn:data(), 
except in those cases where fn:data() would construct a typed value by 
concatenating the string values of descendant elements. In these cases, 
fn:atom() returns an empty sequence. The definition of fn:atom() is as 
follows:

fn:atom($arg as item()*) as xdt:anyAtomicType*

The result of fn:atom is the sequence of atomic values produced by 
applying the following rules to each item in $arg:

1. If the item is an atomic value, it is returned.

2. If the item is a document node, the result is ().

3. If the item is an attribute, text, comment, processing instruction, or 
namespace node, the typed value of the node is returned (same as fn:data). 


4. If the item is an element node (regardless of type annotation) that has 
no child element node, the typed value of the node is returned.

5. If the item is an element node (regardless of type annotation) that has 
a child element node, an empty sequence is returned.

The advantages of the proposed function are as follows:

(a) Suppose that $node1 and $node2 are element nodes with a type 
annotation that permits mixed content. Suppose that $node1 is bound to 
<node><a>1</a><b>2</b></node>, and $node2 is bound to 
<node><b>1</b><a>2</a></node>.  Then data($node1) = data($node2) is true, 
because the typed value returned by the data function ignores nested 
markup. This is a undesirable result in certain applications. The atom() 
function would enable a user to avoid this anomaly by writing atom($node1) 
= atom($node2). In this case both atom() functions would return () and the 
result of the predicate would be false.

(b) Suppose $sal is bound to the following untyped node: 
<salary><base>17</base><bonus>25</bonus></salary>.  Consider the predicate 
$sal > 300. In processing this query, the data function will be implicitly 
invoked and will return the untyped atomic value "1725". Comparing this 
value with 300 will implicitly cast it to the integer 1725. The value of 
the predicate is true. This anomaly could be avoided by using the atom 
function, as follows: atom($sal) > 300. This revised predicate is false.

(c) The fn:atom function can be used to search for elements that have a 
given atomic value without incurring the cost of concatenating potentially 
huge volumes of data. For example, consider the query:
/book[title = "War and Peace"]//*[atom(.) = "Leo Tolstoy"]
This query can quickly find leaf nodes with the atomic content "Leo 
Tolstoy" without computing the string-values of all the intermediate nodes 
in the tree, some of which are very large (the topmost element contains 
the whole book!)

We believe that this function is sufficiently important to justify 
including it in the standard function library in order to make 
applications portable.

Cheers,
--Don Chamberlin

Received on Monday, 19 January 2004 21:27:30 UTC