RE: Comments on Functions and Operators (30th April) from Kay, Michael on 2002-05-21 (public-qt-comments@w3.org from May 2002)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Tue, 21 May 2002 13:43:27 +0200
To: David Carlisle <davidc@nag.co.uk>, public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E6210602679DDC@daemsg02.software-ag.de>
Thanks, David. Just to keep you up to date:

(a) we are close to accepting a proposal that unifies casts and constructor
functions, so hopefully this stuff about constructors requiring a string
literal will disappear.

(b) we have started work on a more precise definition of arithmetic,
covering the promotion rules, the returned types, scale and precision of
results, effect of overflow and underflow etc - but there is a lot of work
still to be done.

(c) we have decided to drop some of the more obscure functions such as
xf:shallow() and xf:filter()

I did look into the question of defining contains() and startsWith() etc
with respect to a collation. It works provided the collation has certain
properties, in effect it must be a "character by character" collation - it
won't work on a "semantic" collation that sorts January before February or
iso646 before iso2022. We have to decide how much detail we want to specify
on this.

I quite like the idea of distinguishing core functions from utility
functions, my approach to this would be leave the utility functions in the
same spec (we have too many documents already) but specify their effect
using an actual code implementation.

Michael Kay

> -----Original Message-----
> From: David Carlisle [mailto:davidc@nag.co.uk] 
> Sent: 21 May 2002 12:11
> To: public-qt-comments@w3.org
> Subject: Comments on Functions and Operators (30th April)
> 
> 
> 
> [using public-qt-comments  as some WG members have indicated 
> that was the intended comment list for F&O in common with the 
> other documents in this round]
> 
> 
> 
> 2.6 xf:unique-ID
> 
>    I agree with others that there seems little point in this 
> function.  If
>    for some reason you needed the functionality, it is just 
> something like
>    @*[. instance of xs:ID] isn't it?
> 
> 
> 3.2 Numeric Constructors
> 
>    Having xs:byte and friends is almost (but not quite) reasonable in
>    schema where it provides a shorthand to constrain integer values to
>    set ranges (if they are the ranges you happen to want) but I see no
>    reason not to map all these to a single numeric type (or
>    integer/float distinction at most) within Xpath. Standard Xpath
>    constructs can be used to ensure that any calculated value 
> is within
>    range. (However I'll assume in the comments below that all 
> the types
>    are kept)
> 
> 
> 3.2.1 Returns the decimal value that is represented by the characters
>     contained in the value of $srcval. For this constructor, $srcval
>     must be a string literal. 
> 
>    I commented on this before, as did others I believe. 
> restricting the
>    constructors to literals, especially if they use a 
> function syntax is
>    unreasonable. Arguments involving optimisation don't really hold as
>    any optimising compiler must be able to spot when a 
> function argument
>    is literal and so optimise away the function call.  (This is a
>    generic comment for all constructors, I only comment on this one.)
> 
> 
> 3.3 Operators on Numeric Values
> The arguments and return types for the arithmetic operators 
> are the basic numeric types: decimal, float, and double and 
> types derived from them. For simplicity, each operator is 
> defined to operate on operands of the same datatype and to 
> return the same datatype 
> 
> 
> The type promotion scheme includes only two rules: 
> 
> A derived type may be promoted its base type.
> 
> decimal may be promoted to float, and float may be promoted 
> to double. 
> 
> 
> 
>    It is not clear to me from that wording if the arguments 
> are _always_
>    promoted to their base type. The use of "may" suggests not but the
>    examples involving integer being promoted to decimal suggest yes.
>    What happens if the two arguments are the same (non-base) type?
> 
>    In particular I can not tell if 5 div 3 is first converted
>    to 5.0 div 3.0 (and so compatible with XPath 1) or, as in earlier
>    drafts of this document does integer division returning an integer.
> 
> 
>    similarly
> 
> 3.3.5.1 Examples
> op:numeric-mod(10,3) returns 1.
> 
>     If 10 and 3 there are integers (which I think they are?) then it
>     should be made clear that 1 is not of integer type (if it isn't).
> 
> 3.3.7 op:numeric-unary-minus
>     should this specify what happens on NaN (presumably returns NaN?)
> 
> 
> 
> 3.4.1 op:numeric-equal
> op:numeric-equal(numeric $operand1, numeric $operand2) => 
> boolean Returns true if and only if $operand1 is exactly 
> equal to $operand2. This function backs up the "eq" and "ne" 
> operators on numeric values. 
> 
>    As Jeni commented in here comments on this draft, it's not 
> clear that
>    eq is particularly useful at the surface Xpath syntax (as 
> opposed to
>    the semantics), if it were dropped the last phrase would need
>    replacing with one referring to = and !=.
> 
> 
> 
> 4.1 String Types
> The operators described in this section are defined on the 
> following string types.
> 
> string 
>  normalizedString 
> 
>    As I commented on the last draft and above for the numeric 
> types the
>    justification for copying over all the schema derived types into
>    XPath is rather weak. 
> 
>    normalizedString is a particularly bad example it is just 
> a subset of
>    string, in many uses of Xpath (XSLT for example) it's 
> rather hard to
>    generate a string that _isn't_ in this subset.
> 
>    The purpose of the datatype in schema is to instruct the parser on
>    the mapping from the input characters to the value space, 
> but this is
>    just a parsing issue it has no use at all in XPath/XQuery.
> 
> 
> 
> 4.2.3 xf:token
>    Unless there are some useful functions defined on this 
> datatype that
>    are not available to strings, why would anyone use this? 
> 
> 
> 
> 4.2.8 xf:ID
> The semantic correctness of ID values (that they must be 
> unique within a given document) is not enforced by the xf:ID function.
>    
> 
>     Since this property is the only useful property that IDs 
> have, there
>     seems no point in this function, if values of type ID are not
>     guaranteed to be unique to a document  (as they are
>     effectively in XPath 1, even for invalid documents) then 
> you may as
>     well use string.
> 
> 
> 
> [Issue 144: Should the concat function accept sequences as 
> arguments? ] 
> 
>     Maybe you don't want to overload concat but a mapconcat that
>     takes the string value of every item in a sequence and 
> concatenates
>     the result would be useful (of course if you'd added proper higher
>     order function  support this could have been done with some
>     combination of fold and concat, but....
> 
> 
> 
> 4.4.5 xf:contains
>    How do collations work with respect to substring matching 
> as opposed
>    to ordering, if ss and sharp-s collate equal, is "s" a substring of
>    sharp-s? (I suspect this is ignorance on my part, but I'm probably
>    not the only one, some examples might help, as is done in 
> some detail
>    for collations and the comparison operators).
> 
> 
> 4.4.10.1 Examples
> xf:normalize-space(" hello world ") returns "hello world". 
> 
>    As came up recently on xsl-list it would be good to have an example
>    of   
>    xf:normalize-space("     ") returns "". 
> 
>   As some readers (surprisingly) managed to construe the XPath 1
>   description as allowing a return value of a space.
> 
> 
> 
> 
> > A "lower-case letter" is a character whose Unicode General Category 
> > class includes "Ll". The corresponding upper-case letter is 
> determined 
> > using [Unicode Case Mappings].
> 
>    In the unicode tables each character is unambiguously 
> upper or lower
>    case, but the mappings are locale-dependent.
>    so i is lowercase and
>    I and LATIN CAPITAL LETTER I WITH DOT ABOVE are both uppercase
>    but which of these is the the uppercase of i depends on 
> who you ask.
>    (probably it just depends on whether you are in Turkey, 
> but still...)
>    
> 
> 
>  4.4.15 xf:string-pad
> 
>     Would have been more useful in XPath 1 than it is in 2, where it
>     can trivially be defined as a user function.
> 
>     It would be better to take out all of these "shortcut functions"
>     from the core language and perhaps add them again as a "standard
>     library" of functions that are defined within Xquery (or/and
>     XSLT). Many other languages follow this model, have a 
> small core set
>     of functions but a larger collection of useful functions that are
>     provided to the end users but are (specified as being) 
> defined using
>     the language rather than being a core part of the language.  Of
>     course an implementation may or may not use the "obvious" 
> definition
>     in the language or a built in optimisation.
> 
> 
> 
> 4.4.17 xf:replace
> 
>     This doesn't even come close to what is needed, but I think
>     a future draft will have more on this...
> 
> 
> 
> 6.1 Duration, Date and Time Types
> 
>      This is all so unspeakably horrible. It's not all your 
> fault mind, 
>      starting from the mass of date related constructs in W3C 
> schema it
>      may be that nothing better could be done. (Sorry this isn't that
>      constructive a comment, I'm going to skip this section, 
> and didn't
>      want lack of comment to be taken as agreement that it was all
>      fine...)
> 
> 
> 
> 
> 9.1.2 op:base64-binary-equal
> 
>       Wouldn't equality on this type be more useful if it ignored
>       whitespace? That way equality would imply that you get the same
>       binary data if you decode the base64 encoding. If you 
> really want
>       to check the _encoded_ string is equal you can cast to 
> string, but
>       why would you want that?
> 
> 
> 
> 
> 11.1.6 xf:deep-equal
> 
>    Should not be provided as a core fuction, you always need 
> to write a
>    recursive function for the exact notion of equality that you need
>    (do attributes count?, namespaces? white space text nodes?)
>   
> 11.1.11 xf:copy
> 11.1.12 xf:shallow
> 
>    Should (at most) be in XQuery and not in the core XPath.
> 
> 
> 11.2.1 xf:if-absent
> 11.2.2 xf:if-empty
> 
>    Should not be provided in the core (at most it could be 
> provided in a
>    standard library) but since presumably the second argument 
> is evaluated
>    whether or not the first argument is true, a user would always (?)
>    be better advised to use  the conditional if expression.
> 
> 
> 12.2.5 xf:empty
> 12.2.6 xf:exists
> 
>    these are simply shorthands for simple expressions, should 
> not be in
>    the core.
> 
> 12.3.1 xf:sequence-deep-equal
> 
>    see previous comments on deep-equal
> 
> 
> [Issue 67: Should duplicates be eliminated for count() and sum()?] 
> 
>    No.
> 
> 12.4.2 xf:avg
> Values that equal the empty sequence are discarded.
> 
>   I think that that is just a special case of the (implied) operation
>   that what is constructed is the sequence of values of the data()
>   function.
>   the fact that empty sequences vanish is then just a 
> consequence of the
>   flattening rules for sequences, and this would then also 
> specify what
>   happens if some values are non-empty sequences, which is not fully
>   specified at present.
> 
> 
> 12.4.3 xf:max
>   As Jeni commented it would be useful if max could return the _item_
>   that has a max value rather than just its value.
> 
> 
> 12.4.5 xf:sum
>   As for count I think that the description on empty 
> sequences should be
>   generalised to any sequence. (Also I'm pleased to see that sum() now
>   returns 0 on () unlike the last draft)
> 
> 
> 12.5.1 xf:id
>   Have to have this of course for compatibility although 
> key() turns out
>   to be a lot more useful. (Which is why it is rather odd 
> that XPath2 has => op
>   which is even more restricted than id().)
> 
> 
> 12.5.2 xf:idref
>   Not clear that this is really useful.
> 
> 
> 12.5.3 xf:filter
> [Issue 167: Semantics of xf:filter are underspecified and, 
> perhaps, incorrect.] 
> 
>    This function probably should be removed.
> 
> David
> 
> _____________________________________________________________________
> This message has been checked for all known viruses by Star 
> Internet delivered through the MessageLabs Virus Scanning 
> Service. For further information visit 
> http://www.star.net.uk/stats.asp or > alternatively call Star 
> Internet for details on the Virus Scanning Service.
>
Received on Tuesday, 21 May 2002 07:43:37 UTC