RE: [F&O] IBM-FO-001 Request for "atom" function

Don:

Your request was discussed at this morning's joint meeting and did not
attract enough votes to make the change.

The primary arguments were that the function increased complexity and
provided yet another definition of 'value' that 

was insufficiently compelling.  Thank you for your input.

All the best, Ashok 

________________________________

From: public-qt-comments-request@w3.org
[mailto:public-qt-comments-request@w3.org] On Behalf Of Don Chamberlin
Sent: Monday, January 19, 2004 6:27 PM
To: public-qt-comments@w3.org
Cc: Andrew Eisenberg
Subject: [F&O] IBM-FO-001 Request for "atom" function

 


This Last Call comment proposes a new XPath/XQuery function, possibly
named fn:atom(). The result of fn:atom() is the same as that of
fn:data(), except in those cases where fn:data() would construct a typed
value by concatenating the string values of descendant elements. In
these cases, fn:atom() returns an empty sequence. The definition of
fn:atom() is as follows: 

fn:atom($arg as item()*) as xdt:anyAtomicType* 

The result of fn:atom is the sequence of atomic values produced by
applying the following rules to each item in $arg: 

1. If the item is an atomic value, it is returned. 

2. If the item is a document node, the result is (). 

3. If the item is an attribute, text, comment, processing instruction,
or namespace node, the typed value of the node is returned (same as
fn:data). 

4. If the item is an element node (regardless of type annotation) that
has no child element node, the typed value of the node is returned. 

5. If the item is an element node (regardless of type annotation) that
has a child element node, an empty sequence is returned. 

The advantages of the proposed function are as follows: 

(a) Suppose that $node1 and $node2 are element nodes with a type
annotation that permits mixed content. Suppose that $node1 is bound to
<node><a>1</a><b>2</b></node>, and $node2 is bound to
<node><b>1</b><a>2</a></node>.  Then data($node1) = data($node2) is
true, because the typed value returned by the data function ignores
nested markup. This is a undesirable result in certain applications. The
atom() function would enable a user to avoid this anomaly by writing
atom($node1) = atom($node2). In this case both atom() functions would
return () and the result of the predicate would be false. 

(b) Suppose $sal is bound to the following untyped node:
<salary><base>17</base><bonus>25</bonus></salary>.  Consider the
predicate $sal > 300. In processing this query, the data function will
be implicitly invoked and will return the untyped atomic value "1725".
Comparing this value with 300 will implicitly cast it to the integer
1725. The value of the predicate is true. This anomaly could be avoided
by using the atom function, as follows: atom($sal) > 300. This revised
predicate is false. 

(c) The fn:atom function can be used to search for elements that have a
given atomic value without incurring the cost of concatenating
potentially huge volumes of data. For example, consider the query: 
/book[title = "War and Peace"]//*[atom(.) = "Leo Tolstoy"] 
This query can quickly find leaf nodes with the atomic content "Leo
Tolstoy" without computing the string-values of all the intermediate
nodes in the tree, some of which are very large (the topmost element
contains the whole book!) 

We believe that this function is sufficiently important to justify
including it in the standard function library in order to make
applications portable. 

Cheers, 
--Don Chamberlin

Received on Tuesday, 16 March 2004 16:43:57 UTC