Re: XPathResult singleNodeValue from Ray Whitmer on 2002-02-16 (www-dom@w3.org from January to March 2002)

From: Ray Whitmer <rayw@netscape.com>
Date: Sat, 16 Feb 2002 02:39:34 -0800
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
CC: www-dom@w3.org
Message-ID: <3C6E36E6.7020205@netscape.com>

Elliotte Rusty Harold wrote:

> At 6:13 PM -0800 2/15/02, Ray Whitmer wrote:
>
>
>> All implementations thus far have an easy way of grabbing just a 
>> single result without caring about complete computation of the 
>> result.  This is an extremely practical consideration both for ease 
>> of use and for efficiency of use and implementation.  When the caller 
>> needs just one, he shouldn't have to set up the whole mechanism 
>> required for dealing with multiples.  Ultimately, XPath and DOM are 
>> for the use of the user. It is common to need just one node, which is 
>> what the API makes easily available, but only if that is what was 
>> requested.
>>
>
> That's not quite true, at least not as I read the API. The user may 
> write an XPath expression which returns a node set containing 10 
> nodes. However, the current API presents this just the same as an 
> XPath that returns a node set containing one node. I wouldn't mind a 
> selectSingleNode() function provided it threw an exception if the 
> XPath produced anything more or less than one node.

If a single result is requested, returning a null node seems completely 
adequate when there are no results found.  I disagree that it would be 
worth the extra computation just to be able to throw an exception when 
more than one node is discovered.  If a user wants to know that there 
was exactly one result, it is easy enough to request the snapshot or 
iterator that permits him to check that.  In DOM, exceptions should 
represent programming errors, not excessive search results, although 
there are flaws in certain cases in the existing API.
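
A minimal sketch of that distinction, assuming Python's standard 
xml.etree.ElementTree as a stand-in (it is not the DOM XPath API under 
discussion, and the helper names first_match and has_exactly_one are 
hypothetical).  The single-node request yields None when nothing 
matches; the caller who cares about "exactly one" asks for the complete 
result and checks it himself:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = ET.fromstring(
    "<book><title>DOM</title><author>A</author><author>B</author></book>"
)


def first_match(root, path):
    """Return the first matching element, or None when nothing matches."""
    return root.find(path)


def has_exactly_one(root, path):
    """Compute the complete result and report whether it holds one node."""
    return len(root.findall(path)) == 1
```

Here first_match(DOC, "editor") simply returns None, with no exception 
involved, and only has_exactly_one() pays for the complete result.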

I said that such extra mechanism is not set up if the user does not 
request it.  You said that is not true, but insisted on the usual full 
computation of the expression over the entire document looking for the 
elusive second node the user either already knows is not there or does 
not care to know about.  The current API gives the user the part he 
requested, be that a single node or a complete set.  There is no real 
victory in insisting on computing the entire thing just so the user can 
ignore everything but the first entry of the set.  The single-node 
feature would be worth less, I think, to both the implementor and the 
user if it were just a wrapper that generally forced the entire 
document to be combed for another match and created an artificial 
failure when there is not exactly one.  The user may already know how 
many matches there are, or not need to know, and can easily request 
that check if he needs it.  As I have said before, the user who wants 
that can get it using the iterator or snapshot, which compute the 
entire result, but he should know that he will pay for the luxury of a 
complete result.
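
The cost difference can be sketched as follows.  This is only an 
illustration under stated assumptions: it uses Python's 
xml.etree.ElementTree rather than any DOM XPath implementation, and the 
generator name matching and the visit counters are hypothetical.  A 
lazily evaluated search stops at the first match, while a complete 
result walks the whole tree:

```python
import xml.etree.ElementTree as ET


def matching(root, tag, visited):
    """Lazily yield elements named `tag`, counting every node examined."""
    for elem in root.iter():  # document-order walk, root included
        visited[0] += 1
        if elem.tag == tag:
            yield elem


root = ET.fromstring("<r><a/><b/><a/><b/><a/></r>")

# Single-node request: stop as soon as one match has been produced.
first_cost = [0]
first = next(matching(root, "a", first_cost), None)

# Complete result: every node in the document has to be examined.
full_cost = [0]
everything = list(matching(root, "a", full_cost))
```

In this toy document the single-node request examines two nodes, and 
the complete result examines all six to find its three matches.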

> The API should be as simple and easy-to-use as possible, but no 
> simpler. There's been a real tendency in XML APIs to hide the 
> complexity of XML by pretending it's simpler than it really is. This 
> isn't a good thing. Fortunately, this is a sin which DOM hasn't 
> committed to date. (In fact, I suspect the opposite is true if anything.)
>
> The fact is XPath 1.0 expressions do not return single nodes. They 
> return node sets.  I see lots of developers making logic errors 
> because they think an XPath expression returns a single node. It might 
> be easier for them if an XPath expression did return a single node, 
> but this isn't what actually happens and it isn't something that can 
> be fixed in DOM. It would have to be fixed in XPath. DOM should not 
> sweep the issue under the rug. It should allow developers to ignore 
> the very real possibility that an expression returns several nodes. In 
> the long run, I suspect we'd all be better off if DOM XPath hewed more 
> closely to what XPath actually is, rather than what developers want it 
> to be.

XPath's over-simplified world does not exist for DOM applications.  
XPath also does not deal with mutable nodes, but I think we should be 
realists and not restrict it to a read-only, fully-normalized tree with 
no entity references, etc.  DOM is for developers to apply XML, not just 
for theorists.  The API presents the results of an expression -- as much 
of the result as is requested.  Getting a single node, whatever the 
expression may return, has been counted as a significant use case, and 
one that existing XPath implementations available through DOM APIs in 
Mozilla and IE support.  If the current descriptions are not clear on 
the point that this option does not provide the entire result of an 
XPath expression, that can be fixed as an editorial change, but the 
intent of single-node in this API is for those who want a single node, 
knowing for certain based upon validation or construction techniques 
that there is one, or not caring to process beyond a single one.

Computer application development languages seldom resemble pure set 
theory -- IMO Lisp does not, although certain purists think so -- but if 
we used a Lisp model, DOM would be very different and probably not in 
browsers.  Sets and numbers would never be mutable, but variables and 
documents are, and it goes much further.  I do not think we are going 
too far in directly giving developers (and naming it well) what some use 
cases call for, just because the XPath model does not address such 
issues.  Allowing easy access to a single result is purely a practical 
consideration.  Restricting results to only completely-computed result 
sets so that the user can ignore all but the first does not seem like a 
real win for anyone.  The user could likewise select a single element of 
the result in the XPath expression if that is what his application calls 
for.  The API just makes it more convenient and economical to use XPath 
for this common use case, which shouldn't be a crime, since the 
developer decides which he wants when he formulates the call, just as 
surely as if he hard-coded "resultSet[1]" into his application.  You 
cannot force the user to consider more than one result significant when 
it is not, whatever the entire result set may contain.  That is why the 
feature clearly states "FIRST_ORDERED_NODE" or "ANY_UNORDERED_NODE", 
which to me clearly says, just by looking at the name, that a partial 
result is being requested, and there is a description as well.  It is 
not called "ONLY_NODE".
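
A small sketch of that equivalence, again assuming Python's 
xml.etree.ElementTree as a stand-in for the DOM XPath API (the sample 
document and variable names are hypothetical).  The developer can write 
the selection into the expression with a positional predicate, or make 
it in the call by asking only for the first match:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<list><item>one</item><item>two</item></list>")

# Selection written into the expression: a 1-based positional predicate.
in_expression = root.findall("item[1]")

# Selection made by the call: ask only for the first match of the path.
in_call = root.find("item")
```

Both forms yield the same first item here; the second just states the 
choice in the call rather than in the expression.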

Ray Whitmer
rayw@netscape.com
Received on Saturday, 16 February 2002 05:40:10 UTC