Re: Behaviour of fn:string on sequences

Thank you very much for the time you are taking to explain this to me, I 
do appreciate it but am finding it quite hard to get my head around this 
subject.

Kay, Michael wrote:
>  > Is there an
>  > overview document that explains what all the different specifications
>  > are for and is a good place to start learning about it. At
>  > the moment I
>  > am blindly jumping between them looking for the bits and
>  > pieces that I need.
> 
> The XPath language spec is the best place to start. Read that until you 
> find something you don't understand, then look in the other documents to 
> see if they help.
> 

That is what I started to do, however in many places, such as fn:string 
function it is not clear where the missing (how it handles sequences) 
information is. I know that it was not written to be dipped into, 
however it would be nice if the functions paper had a few more pointers 
to the other papers. I tried to track it down by looking at the 
Notations section to see what item? meant, however while that section 
does have a reference to the Formal Semantics paper that is to do with 
return types, not how function calls are handled so I missed it.

>  >
>  > So let me get this straight (ignoring XPath 1.0 compatability mode).
>  >
>  >   o  An empty sequence results in an empty string.
>  >   o  A sequence with one item is the same as passing in a single item.
>  >   o  A sequence with more than one item is an error.
>  >
>  > That seems very inconsistent to me. Why does fn:string not take a
>  > sequence ? It could return a string consisting of the
>  > concatenation of
>  > the results of applying fn:string to each item in the sequence with
>  > possibly a space separator between them, although the
>  > separator could be
>  > supplied as an optional second parameter.
> 
> That specification might feel like a good one to you, but it would be 
> completely incompatible with XPath 1.0. The fact that compatibility mode 
> is off doesn't mean we can throw backwards compatibility out of the 
> window: we have to make it feasible for people to move forwards.
> 

Why is my suggestion any worse than the existing behaviour in non 
compatability mode ? All that I am doing is changing an operation that 
would throw an exception into one that does something reasonably useful.

>  >
>  > How do I get the string representation of a sequence ?
>  > Is there another function to do this ?
> 
> Yes, the string-join function is there for this purpose.

That is exactly what I want.

>  >
>  > On a similar note, the specification for fn:boolean says ".... If
>  > $srcval is an atomic value ....". However, fn:boolean takes a
>  > sequence
>  > so I presume that $srcval is an 'atomic value' if the
>  > sequence contains
>  > a single item. However, this behaviour does not appear to
>  > have any real
>  > meaning when applied to a sequence.
> 
> I'm not sure what you mean by "real meaning".
> 

Useful.

Take its use within XSLT, I can do the following to check that the node 
set referred to by $var is empty.
     <xsl:if test="$var">....</xsl:if>

If however I wanted to write a template that could handle sequences and 
nodes then I could not use the above pattern as if I am given a sequence 
of plain types then the above will either return true if it has more 
than one item, return true or false if it is one item that can be 
converted to a boolean, throw an exception if it has one item that can 
not be converted to a boolean, or return false if it is empty.

That behaviour will of course not break any existing XSLTs (assuming 
that other functions or operations have not been changed to return 
sequences of atomic values as opposed to sequences of nodes). However, 
the fact that the above pattern which is a very commonly used one has 
this strange behaviour will cause problems.

Of course, as you explained there is an empty() function that should be 
used instead but people will still use the above pattern because it is 
easier and works in most cases. As they move to take advantage of 
sequences in XPath 2.0 they will get bitten by this problem.

Compatability is more than just the way that functions work, it also 
involves the patterns and idioms as well. The fact that empty() does not 
exist in XPath 1.0 forces me to use the above pattern

>  > e.g. I often want to be able to tell whether a sequence is empty. In
>  > XPath 1.0 if I gave it a node set then it would return false
>  > if it was
>  > empty and true otherwise. This is also how it behaves in XPath 2.0.
>  > However, if I give it a sequence of atomic types (as opposed to a
>  > sequence of nodes) then it does not tell me whether the sequence is
>  > empty as a non empty sequence containing a single atomic value could
>  > evaluate to false.
> 
> Correct. In XPath 2.0 the values that return false when supplied as 
> arguments to the boolean function are exactly the same as the values 
> that returned false in XPath 1.0: an empty node-set, the boolean false, 
> a numeric zero, and the string "".
> 
> If you want to test whether a sequence is empty, don't use the boolean() 
> function, use the empty() function.

But a node-set is just a sequence in XPath 2.0 and fn:boolean works for 
that.

>  >
>  > I understand that this is a consequence of treating a sequence
>  > containing a single item and that item the same. However, is
>  > that really
>  > a good idea given the strange behaviour that it causes in relatively
>  > simple functions such as these two.
> 
> I think the strangeness in these two functions is because they were both 
> carried forward from XPath 1.0 which had a different data model. This 

Actually I think that any function that can only take a single item will 
  have similar strange behaviour.

This would not be a problem if I was using that function explicitly but 
I do not have any real control over whether it is used or not.


> doesn't make the data model wrong. The reason that a single item is the 
> same as a sequence of length one in the XPath model is because that's 
> the way it is in XML Schema.
> 

I don't understand what you mean in the last sentence about XML Schema, 
could you explain that a bit more as I would like to understand the 
benefits of doing this as at the moment I can only see disadvantages. To 
me the sequence in XML Schema and sequence in XPath are not the same 
although they are similar. I could use XPath to retrieve the child nodes 
of an element as a sequence of nodes irrespective of whether the XML 
Schema for that document defines the content of that element as a 
sequence, choice, or combination of both. Just because a node could be 
treated as a sequence of one node does not mean that it is.

The definition of a sequence says that it cannot contain other sequences 
and yet as it contains individual items and those items are identical to 
a sequence containing just that one item then in effect a sequence can 
contain sequences. In fact to paraphrase it is sequences all the way down.

I think that it would be much more logical if sequences were distinct 
from singleton items. I don't know what all the consequences would be of 
doing this are. You would of course need a way (function) to extract 
items from sequences and create a sequence from a single item but I am 
sure that there would be more. This could also allow better static type 
checking as it would not be possible to pass a sequence to a function 
that can only take an item. At the moment that check has to be done at 
runtime.

Operators and functions that work on sequences and want to work on items 
could be easily overloaded to work on a single item (convert that item 
into a sequence and then call the sequence version.)

The effect on fn:boolean would be it could distinguish between a 
sequence and a single item and return true if the sequence was not empty 
irrespective of what was inside it and false if it was not. If it was an 
item then it could cast it to an xs:boolean.

As far as I can see this model will not introduce any backwards 
compatability issues of itself as long as the appropriate functions and 
operators were updated

Received on Friday, 24 October 2003 05:44:33 UTC