[xsl] Re: XPath incompatibilities from Jeni Tennison on 2001-12-23 (www-xpath-comments@w3.org from October to December 2001)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 23 Dec 2001 08:20:08 +0000
To: xsl-list@lists.mulberrytech.com
CC: www-xpath-comments@w3.org
Message-ID: <67148112494.20011223082008@jenitennison.com>
Miloslav Nic wrote:
> ========================================
> 2. The rules for converting a string to a boolean have changed, so
> they are now aligned with XML Schema. In XPath 2.0, the strings "0"
> and "false" are treated as false, while in XPath 1.0, they were
> treated as true. All other strings are converted in the same way as
> XPath 1.0.
> ----------------------------------------
>
> I would prefer the 1.0 way. It is a rule which can become dangerous
> doing text processing.

I agree that it's dangerous when doing text processing, certainly if
you test whether there's any remaining string in the normal way
(test="$string"). If this change were made, we would need to change
all those recursive string manipulation algorithms so that they used
test="string-length($string)" instead, or something similar.
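
A minimal sketch of the problem, in Python as a stand-in for a recursive
XSLT template that peels off one character per call. The function names
are hypothetical, and boolean_draft2 only models the rule as quoted
above, not any final specification.

```python
def boolean_xpath1(s: str) -> bool:
    # XPath 1.0: a string is true if and only if its length is non-zero.
    return len(s) > 0


def boolean_draft2(s: str) -> bool:
    # Rule as quoted above (an assumption about the draft, not a final spec):
    # "0" and "false" are false; other strings convert as in XPath 1.0.
    return s not in ("", "0", "false")


def reverse(s: str, test) -> str:
    # Mimics a recursive template guarded by test="$string":
    # stop when the remaining string tests false, else recurse on the tail.
    if not test(s):
        return ""
    return reverse(s[1:], test) + s[0]


print(reverse("120", boolean_xpath1))        # 021
print(reverse("120", boolean_draft2))        # 21 (the final "0" is lost)
print(reverse("120", lambda s: len(s) > 0))  # 021, the string-length() guard
```

The recursion stops early as soon as the remaining string is exactly "0"
or "false", which is why the guard would have to test the length instead.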

> ========================================
> 3. Additional numeric types have been introduced, with the effect
> that arithmetic may now be done as an integer, decimal, or
> single-precision floating point calculation where previously it was
> always performed as double-precision floating point. The most
> notable difference (subject to resolution of an open issue) is that
> 10 div 4 is now an integer division, with the result 2, rather than
> a floating point division yielding 2.5.
> ----------------------------------------
>
> I would prefer very strongly 1.0. Integer division as the default
> behavior is unnatural.

I know that this used to be a problem with the F&O WD, but the new
version of the F&O WD seems to have removed integer as a basic numeric
type. According to the F&O WD, there are only three basic numeric
types (decimal, float and double) and the operators are defined in
terms of those basic numeric types. In fact there is an example that
shows what happens with integers explicitly:

  As another example, a user may define height as a derived type of
  integer with a minimum value of 20 and a maximum value of 100. He
  may then derive oddHeight using a pattern to restrict the value to
  odd integers.

  op:operation(oddHeight, 2) => op:operation(decimal, decimal)

  oddHeight is first promoted to its base type height. height is
  promoted to its base type integer and integer to its base type
  decimal.

So I think that 10 div 4 is translated into:

  op:numeric-divide(10, 4) =>
    op:numeric-divide(10.0, 4.0) =>
      2.5 (decimal)

And similarly 2 + 2 = 4.0.
      
Unless the XPath WD is overriding this behaviour? (This is the problem
with having so many specs defining one thing - it's very easy to get
inconsistent definitions across them.)
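
The promotion reading above can be sketched as follows. This is Python
rather than XPath, the helper names are hypothetical, and Python's
Decimal is only a rough stand-in for the decimal type in the F&O WD.

```python
from decimal import Decimal

# Derivation chain taken from the F&O example quoted above.
BASE_TYPE = {"oddHeight": "height", "height": "integer", "integer": "decimal"}


def promotion_chain(type_name: str) -> list:
    # Follow base types until one with no recorded base type is reached.
    chain = [type_name]
    while chain[-1] in BASE_TYPE:
        chain.append(BASE_TYPE[chain[-1]])
    return chain


def numeric_divide(a: int, b: int) -> Decimal:
    # Promote both integer operands to decimal, then divide.
    return Decimal(a) / Decimal(b)


print(promotion_chain("oddHeight"))  # ['oddHeight', 'height', 'integer', 'decimal']
print(numeric_divide(10, 4))         # 2.5
print(10 // 4)                       # 2, the integer division being objected to
```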

> ========================================
> 7. Many operations in XPath 2.0 produce an empty sequence as their
> result when one of the arguments or operands is an empty sequence.
> With XPath 1.0, the result of such an operation was typically an
> empty string or the numeric value NaN. Examples include the numeric
> operators, and functions such as substring and name. Functions also
> produce an empty sequence when applied to an argument for which no
> other value is defined; for example, applying the name function to a
> text node now produces the empty sequence. This means, for example,
> that with XPath 1.0 the expression node()[name()!='item'] would
> return all the children of an element, except for elements named
> item; with XPath 2.0 it will exclude text and comment nodes, because
> the condition ()!='item' is treated as false.
> ----------------------------------------
>
> WHY !???????????????????????!!!!!!!!!!!!!!!!!!!!!!!!!
>
> I WANT 1.0 !!!!!!!!!!!!!!!!!

Given that the name() function returns a QName now (rather than a
string), this kind of makes sense, because I don't think there's such
a thing as an 'empty QName'. The same applies to numeric operators in
some circumstances, since NaN is not a valid decimal value, and to
date/times.

So the empty sequence is really acting as a universal null value.
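
To see what this does to node()[name()!='item'], here is a small Python
model. Nodes are plain (kind, name) pairs and None plays the part of the
empty sequence; both are illustrative assumptions, not how a processor
represents anything.

```python
# Children of some element as (kind, name) pairs; None models "no name".
children = [("element", "item"), ("text", None), ("element", "para"), ("comment", None)]


def filter_xpath1(nodes):
    # XPath 1.0: name() of a text or comment node is the empty string,
    # and '' != 'item' is true, so those nodes are kept.
    return [n for n in nodes if (n[1] or "") != "item"]


def filter_draft2(nodes):
    # Draft 2.0 as described above: name() gives the empty sequence for such
    # nodes, and () != 'item' is treated as false, so they are dropped.
    return [n for n in nodes if n[1] is not None and n[1] != "item"]


print([kind for kind, _ in filter_xpath1(children)])  # ['text', 'element', 'comment']
print([kind for kind, _ in filter_draft2(children)])  # ['element']
```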

> ========================================
> 10. In XPath 1.0, the < and > operators, when applied to two
> strings, attempted to convert both the strings to numbers and then
> made a numeric comparison between the results. In XPath 2.0, subject
> to resolution of an open issue, it is proposed that these operators
> should perform a lexicographic comparison using the default
> collating sequence.
> ----------------------------------------
>
> I would prefer 1.0 and to have another operator for a lexicographic
> comparison.

I think that I agree. I'm having trouble working out what happens if,
in a schema-less document, you have something like:

  <foo min="3" max="15" />

and want to check:

  @min < @max

I think that the values of the two nodes are converted to strings,
although it's not exactly clear. The typed value of an attribute in a
schema-less document is an empty sequence, and Section 2.6.1 of the
XPath 2.0 WD (point 3) states that if either operand is a node, its
typed value is taken, and that if this is an empty sequence then the
comparison returns an empty sequence. But point 4 then says that
untyped simple values are converted to strings, so I guess that's the
case.

Anyway, if they *are* converted to strings, then obviously
lexicographic comparison would give the result false, since '15' is
lexicographically less than '3'.
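
The difference is easy to reproduce outside XPath. In this Python
sketch the attribute values are held in a dictionary, and Python's
codepoint-based string comparison stands in for whatever the default
collating sequence would be (an assumption).

```python
attrs = {"min": "3", "max": "15"}

# XPath 1.0 behaviour: both strings are converted to numbers first.
numeric = float(attrs["min"]) < float(attrs["max"])

# String comparison. Python compares by codepoint, used here as a stand-in
# for the default collation (an assumption).
lexicographic = attrs["min"] < attrs["max"]

print(numeric)        # True
print(lexicographic)  # False, because "1" sorts before "3"
```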

I think that there are lots and lots and lots of stylesheets that use
the implicit conversion to numbers when doing comparisons, and that it
would be very confusing to have lexicographic comparison.

I also think that it doesn't make sense for the greater-than and
less-than operators to do lexicographic comparison if the other
comparisons (= and !=) don't do lexicographic comparison (which they
shouldn't).

> ========================================
> 11. In XPath 1.0, functions and operators that compared strings (for
> example, the = operator and the contains function) worked on the
> basis of character-by-character equality of Unicode codepoints,
> allowing Unicode normalization at the discretion of the implementor.
> In XPath 2.0 (subject to resolution of open issues), these
> comparisons are done using the default collating sequence. The
> working group may define mechanisms allowing codepoint comparison to
> be selected as the default collating sequence, but there is no such
> mechanism in the current draft.
> ----------------------------------------

I think it would be interesting to have XSLT set up collations in a
similar way to the way it sets up decimal formats - any number of
named top-level collations, with the default collation being unnamed.
I'm not familiar with everything that a collation needs to do, but it
might be that some of the tailoring could be controlled by attributes
(e.g. normalization). This could be backed up with a reference to or
the internal definition of a user-defined or implementation-defined
extension function for comparing two strings, a bit like the method
that Dimitre uses with his generic sort templates.
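
As a very rough sketch of the shape this could take, here is a Python
registry of named collations with an unnamed default. Everything in it
is hypothetical: the names, the idea of a collation as a key function,
and the choice of NFC and case folding as the tailorings.

```python
import unicodedata

# Hypothetical registry: None is the unnamed default collation.
COLLATIONS = {
    None: lambda s: s,                                        # plain codepoints
    "normalized": lambda s: unicodedata.normalize("NFC", s),  # Unicode NFC
    "caseless": lambda s: s.casefold(),                       # ignore case
}


def compare(a: str, b: str, collation=None) -> int:
    # Returns -1, 0 or 1 after mapping both strings through the collation key.
    key = COLLATIONS[collation]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


composed, decomposed = "\u00e9", "e\u0301"  # two spellings of e-acute
print(compare(composed, decomposed))                # 1
print(compare(composed, decomposed, "normalized"))  # 0
print(compare("Foo", "foo", "caseless"))            # 0
```

A real collation does far more than this (ordering rules, strength
levels and so on); the point is only the named-plus-default lookup.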

> ========================================
> 12. If an arithmetic operator is applied to an operand that is a
> sequence of two or more nodes, at XPath 1.0 the numeric value of the
> first node in the sequence was used. At XPath 2.0, this is an error.
> (The current XPath 2.0 specification does not invoke fallback
> conversion in this case).
> ----------------------------------------
> WHY!!!!!!!!!!!!!!!!??????????????????????????????????/
>
> Default conversions are very valuable for prototyping.

I do think that this is going to cause problems. We XSLT authors often
take advantage of the default conversions - it's rare that we'll add a
[1] at the end of a path to ensure that only one node is selected and
used.

I could understand the way that XPath 2.0 is much more insistent on
single nodes being used for everything if the operators did mapping
over the sequences, such that:

  (1, 2) + (3, 4) => (4, 6)  or
  (1, 2) + 3 => (4, 5)

But given that this isn't (or doesn't seem to be - I haven't had the
chance to digest all of the WDs thoroughly) the case, I don't
understand why there's a requirement to break backwards compatibility.
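
For comparison, the two behaviours can be sketched in Python with tuples
standing in for sequences. add_mapped is a hypothetical illustration of
the mapping idea above, not something taken from the WDs, and its
handling of mismatched lengths is an arbitrary choice.

```python
def add_first_item(a, b):
    # XPath 1.0 style: only the first item of each operand is used.
    return (a[0] + b[0],)


def add_mapped(a, b):
    # Hypothetical mapping semantics: a singleton is paired with every item
    # of the other operand; otherwise items are added pairwise.
    if len(a) == 1:
        a = a * len(b)
    if len(b) == 1:
        b = b * len(a)
    if len(a) != len(b):
        # Nothing above says what should happen here; raising is an assumption.
        raise ValueError("sequences of different lengths")
    return tuple(x + y for x, y in zip(a, b))


print(add_first_item((1, 2), (3, 4)))  # (4,)
print(add_mapped((1, 2), (3, 4)))      # (4, 6)
print(add_mapped((1, 2), (3,)))        # (4, 5)
```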
  
> ========================================
> 13. In the XPath 1.0 data model, an element node had a namespace
> node for each in-scope namespace. The parent of the namespace node
> was the element node, and the namespace nodes for one element were
> distinct from those of any other element (as revealed, for example,
> using the union operator |). In XPath 2.0 (subject to resolution of
> open issues) element nodes will still have namespace nodes for all
> the in-scope namespaces, but these namespace nodes will be shared by
> different elements in the same document: that is, there will be a
> many-to-many relationship between element nodes and namespace nodes.
> This will affect any code that attempts to find the parent or
> ancestors of a namespace node, or that tries to count namespace
> nodes or to form a union between two sets of namespace nodes.
> ----------------------------------------
>
> If it means that namespace::xxx/parent::* returns all elements from
> the document sharing the namespace, it sounds interesting

It doesn't - it means that namespace nodes don't have a parent. I do
have stylesheets that use the parents of namespace nodes, but only in
the context of trying to work out the namespace URI of a qualified
name, which can now be handled much more easily using
xf:get-namespace-uri().
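
For reference, the lookup that such stylesheets perform through
namespace nodes amounts to the following. This Python sketch is not the
definition of xf:get-namespace-uri(); the function name and the
dictionary of in-scope namespaces are hypothetical.

```python
def namespace_uri_of(qname: str, in_scope: dict) -> str:
    # Split "prefix:local" and look the prefix up among the in-scope
    # namespaces. An unprefixed name falls back to the default namespace
    # (the "" key), which suits element names but not every use of QNames.
    prefix, _, _local = qname.rpartition(":")
    return in_scope.get(prefix, "")


in_scope = {
    "xsl": "http://www.w3.org/1999/XSL/Transform",
    "": "http://www.w3.org/1999/xhtml",
}
print(namespace_uri_of("xsl:template", in_scope))  # http://www.w3.org/1999/XSL/Transform
print(namespace_uri_of("p", in_scope))             # http://www.w3.org/1999/xhtml
```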

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
Received on Wednesday, 26 December 2001 12:51:59 UTC