RE: F&O WD from Ashok Malhotra on 2002-05-14 (public-qt-comments@w3.org from May 2002)

From: Ashok Malhotra <ashokma@microsoft.com>
Date: Tue, 14 May 2002 08:58:42 -0700
To: "Jeni Tennison" <jeni@jenitennison.com>, "Jonathan Robie" <jonathan.robie@datadirect-technologies.com>
Cc: <public-qt-comments@w3.org>
Message-ID: <E5B814702B65CB4DA51644580E4853FB014887EA@red-msg-12.redmond.corp.microsoft.com>
Jeni:
Thanks for your comments.  We are all at a W3C meeting this week.  Will get back to you next week.
Ashok

 -----Original Message----- 
 From: Jeni Tennison [mailto:jeni@jenitennison.com] 
 Sent: Mon 5/13/2002 4:01 AM 
 To: Jonathan Robie 
 Cc: public-qt-comments@w3.org 
 Subject: F&O WD
 
 

 Hi Jonathan,
 
 I promised some more detailed comments on the F&O WD, so here you are.
 As usual, these come from my perspective as an XSLT user rather than
 anything else. I've ignored the constructors and casting sections,
 since I know they're under review anyway.
 
 I guess my guiding principle is that if a function is just a shorthand
 for something that can be implemented without a recursive function,
 then it shouldn't be included in the core set of XPath 2.0 functions.
 Both XQuery and XSLT have methods of defining extension functions, so
 I think that it's more important to focus on the functions that are
 impossible or difficult to implement in XQuery/XSLT rather than those
 that are simply convenience functions.
 
 Cheers,
 
 Jeni
 
 ---
 
 The new functions, added on to XPath 1.0, are the following. I've put
 * by the ones that I think should stay, - by those that I think should
 go, and + by those on which I'm equivocal:
 
   - node-kind() -- I've hardly ever seen a problem that's required
     this functionality. I think it would be more flexible to use the
     "instance of" operator to work out what kind of node you're
     looking at; it would be easy enough to define your own function to
     give you the name of the node type in the rare cases where that's
     required. In other words, you should be able to get at the type of
     a node and the type of an atomic value in the same way.
  
   + node-name() -- This used to be the name() function; I wonder
     whether it would be possible to merge this with the name()
     function. It would be great if that could be done so that the
     name() function works in the way that people think it works, such
     that "name() = 'pfx:name'" is equivalent to "self::pfx:name"; this
     would be backwards-incompatible with XPath 1.0, but would be more
     intuitive for users.
 
   * data() -- Certainly required now, but as with a lot of these
     functions, I wonder whether it would be helpful to have it follow
     the pattern of existing functions, like name() and string(), and
     have it return the typed value of the context node if it doesn't
     have an argument passed to it. I know that the F&O document
     purposefully tries to avoid overloaded functions, but for users,
     both those used to XPath 1.0 and those coming new to XPath 2.0, it
     will be confusing that different functions work in different ways
     depending on which version they were introduced in.
 
   * base-uri() -- Certainly very useful; we often get questions asking
     how to get the URL of the file that's being used as the source of
     the transformation.
 
   - unique-ID() -- I've never known anyone to have to get hold of the
     value of the ID attribute on a given element. If they do, they
     know the name of the attribute and can get its value through
     normal mechanisms. I'm also worried that this function will get
     confused with the generate-id() function.
 
   * compare() -- We do need this facility although not as much as
     you might think, in my opinion. I have to say that personally I
     find a return value of -1, 0 or 1 difficult to work with: I always
     get confused about which way round the arguments are related. It
     would be great if there was an alternative design, but I doubt
     that there is and since we'll rarely have to use different
     collations, I don't think that's too much of a problem.
 
   - normalize-unicode() -- As far as I understand the character
     model for the WWW, all text on the Internet should be normalized,
     and specifications should require Unicode normalized (NFC) text. I
     can't recall ever seeing someone need to do Unicode normalization;
     I suspect that such operations would be better done at a lower
     level in the application (normalize early) and that the data model
     should dictate that text is normalized.
 
   * upper-case() and lower-case() -- There's definitely a strong
     requirement for these, although allowing case-insensitive
     comparisons (which I think is supported with collations?) will go
     most of the way towards supporting the usual reason for
     case-changing. As I think I might have mentioned before, I believe
     that technically there should be a title-case() function as well,
     since the title case version of a letter is not always the same as
     the upper case version of a letter (ref.
     http://www.unicode.org/unicode/reports/tr21/)
 
   + string-pad() -- Repeating the same string is a fairly common
     operation, although it is one that's particularly easy to
     accomplish now with a user-defined function and a simple
     iteration. I therefore don't think that this function is vital,
     and if you want to save space, I think it should be dropped.
 
   * match() and replace() -- I think that you know that we need more
     regular expression support than this; I believe that you're
     working on that and that I've already commented on it.
 
   + duration/dateTime functions -- I've already commented on these in
     a separate thread. I think that this is the poorest section of the
     spec. The kinds of things that people want to do with dates are:
 
       - reformat them (which I believe is being supported separately
         in XSLT 2.0, though it's not there yet)
 
       - get a date from the common "seconds since
         1970-01-01T00:00:00Z" representation (for all its faults)
 
       - perform calculations between them
 
     Dates have a fixed format, so it's not hard to extract individual
     components from a date; I don't think that the set of functions to
     do so are necessary. It's harder to extract information from a
     duration because it doesn't have a fixed format, but not
     drastically so, and I think it's really very rare that you need to
     get that kind of information from a duration.
 
     One thing that *is* difficult, and is useful, is to get values
     like "the number of seconds represented by this duration" (i.e.
     the reverse of dayTimeDuration-from-seconds()) -- it's useful
     because that enables you to perform calculations with durations
     (adding them, dividing them) that you can't do otherwise.
 
   * get-local-name() and get-namespace-uri() -- Makes me wish that
     the structured data types such as QNames, dates, durations and so
     on could be treated as virtual elements, so you could do
     $qname/local-name or $date/year. These are certainly handy
     functions, though.
 
   * resolve-URI() -- I imagine this will be very handy.
 
     URI manipulation is, I think, the primary reason for the
     requirement for string manipulation functions like
     substring-after-last() or index-of-last(). Perhaps a
     get-file-name() method would be useful; I'm not sure.
 
   + deep-equal() -- I wouldn't personally say that this was a
     high-priority function. My guess would be that people would use it
     for the common task of moving through two documents to see where
     differences lie between them, and in that context I think it would
     be very expensive. But others might have use cases that I'm
     unaware of.
 
   - root() -- I think that root($node) does the same thing as
     $node/ancestor-or-self::node()[last()]. Given that the function is
     possible with very little effort, and that you rarely need to get
     from a node to the root node of that document, I don't really see
     the point of this function.
 
   - if-absent() and if-empty() are shorthands for:
     if (not($node)) then $default else $node and
     if (not($node) or not($node/node())) then $default else $node
     I don't find these expressions so burdensome that they require
     shorthand functions, especially not compared to some of the other
     functionality that's currently missing from the spec.
 
   * index-of() -- definitely required, though I have no doubt that
     people will use it like:
 
     $nodes[index-of(for $n in $nodes return string($n), 'foo')]
 
   - empty() -- empty($seq) seems to be equivalent to
     not(boolean($seq)); as with other shorthands for easy expressions,
     I don't think this one's necessary, although it's true that the
     casting of empty sequences to boolean false can be non-obvious for
     beginners.
 
   - exists() -- seems to be equivalent to not(empty($seq)) or exactly
     equivalent to boolean($seq). I don't think this is necessary;
     empty() is more useful if you didn't want to use boolean() in the
     way that it's been used in XPath 1.0.
 
   + distinct-nodes() -- This obviously doesn't arise in XSLT 1.0
     because it's impossible to create a node set that contains more
     than one of a particular node. Given that node sequences are (or
     should/can be) created with duplicates automatically removed, I
     doubt that this will come into play very often; there aren't any
     use cases for it in the XQuery use case document either. On the
     other hand, the equivalent expression (distinct-nodes($nodes) is
     the same as union(() | $nodes)) is a bit of a hack and might not
     get you precisely what you want (since it also reorders into
     document order), so it's probably best to be on the safe side.
 
   * distinct-values() -- This functionality is required (and
     lacking) in XSLT 1.0, but the grouping facilities in XSLT 2.0 mean
     that it wouldn't be nearly as important there. I can see places
     where it would be handy, though (for example to write things like
     "there are 4 groups...", and to allow me to apply templates to
     distinct nodes in order to get more flexibility in my stylesheet).
     Since this function is likely to be much more heavily used than
     distinct-nodes(), I think it should be shortened to distinct().
 
   - insert() -- I can't really see the point, given that there's a
     concat(), a subsequence() and an index-of() and I don't think that
     there will often be times when you need to insert items into the
     middle of a sequence.
 
   - remove() -- Again, I don't see why this is needed, given that you
     can use a predicate to do the same thing: $target[position() !=
     $position].
 
   * subsequence() -- I imagine this would be useful.
 
   + sequence-deep-equal() and sequence-node-equal() -- I'm not sure
     about sequence-deep-equal(), for the same reason I'm not sure
     about deep-equal(). The most useful, I would imagine, would be a
     plain sequence-equal() that compared the two sequences to see if
     they were the same on an item-by-item basis, with nodes being
     assessed based on identity, and values being assessed on their
     value.
 
   - avg() -- I'm not personally convinced (since the equivalent
     expression of sum() div count() really isn't difficult).
 
   * max() and min() -- Definitely. This is a requirement that's
     probably even greater than date formatting or regular expressions.
     It would be even more helpful if there was a quick way of getting
     to the node(s) that has the min/max value, rather than just
     getting the value itself. I imagine we're going to see rather a
     lot of $nodes[. = max($nodes)] otherwise, although I guess that
     could be optimised.
 
   - idref() -- As I've said elsewhere, id() turns out to be hardly
     used in XSLT because of the issues to do with requiring a DTD be
     present for the link to be any use. Where you need a reverse link,
     you can generally set up a key instead. I'd rather see keys from
     XML Schema supported than a specific idref() function introduced.
 
   - filter() -- I think this is potentially very useful, but, like
     copy() and shallow(), it has to do with creating nodes, which
     means that it shouldn't live at the XPath level.
 
   - collection() -- I don't really understand how this is different
     from the document() function.
 
   * input() -- Sounds reasonable.
 
   - context-item() -- I assume that this is not a real function, but
     actually just a backup for the shorthand '.'? It should say so.
 
   * current-dateTime() -- Definitely required; XForms calls this
     function now(), which has the advantage of being short and
     avoiding the mixed case convention difficulties.
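
As a rough illustration of the shorthand argument running through the list above, the sketch below restates a few of the listed functions in Python, with sequences modelled as plain lists. The helper names (avg, remove, items_with_max, if_absent) are hypothetical stand-ins chosen to mirror the XPath expressions quoted in the list. This is not the F&O definition of any of them, and it ignores XPath details such as NaN handling and type promotion.

```python
# Sequences are modelled as plain Python lists.
# Every helper name here is hypothetical and only mirrors the
# XPath expressions quoted in the comments.

def avg(seq):
    """avg($seq) written as sum($seq) div count($seq); assumes a non-empty list."""
    return sum(seq) / len(seq)


def remove(seq, position):
    """remove() as the predicate $target[position() != $position] (1-based)."""
    return [item for i, item in enumerate(seq, start=1) if i != position]


def items_with_max(seq):
    """$nodes[. = max($nodes)]: every item whose value equals the maximum."""
    top = max(seq)
    return [item for item in seq if item == top]


def if_absent(seq, default):
    """if (not($node)) then $default else $node, for list-valued arguments."""
    return seq if seq else default
```

Each helper is a single expression, which is the sense in which the corresponding functions are treated as convenience shorthands rather than core requirements.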
 
 Aside from those mentioned above, functions that are missing are:
 
   * tokenize(), which people ask for all the time, particularly for
     splitting strings into lines or words
 
   + possibly sqrt(), sin() and cos(), which are particularly useful
     when creating graphic formats such as SVG and aren't that easy to
     implement in XSLT
 
   * random() (create random numbers) and more usefully, I think,
     randomize() (randomly alter the order of items in a sequence),
     both with obvious side-effect issues; again these are impossible
     to implement using XSLT
 
   * function-available() to support the idea that XPath function
     libraries could be provided by particular implementations.
 
   * system-property() to support getting information about the XPath
     implementation version and so on.
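
To make the first and third requests above a little more concrete, here is a minimal Python sketch of what a tokenize() that defaults to splitting on whitespace, and a randomize() that reorders a sequence, might look like. The signatures are assumptions made for illustration only; in particular the explicit seed parameter is one hypothetical way of containing the side-effect issue mentioned above, not something taken from the working draft.

```python
import random
import re


def tokenize(text, pattern=None):
    """Split text into tokens.

    With no pattern, split on runs of whitespace and drop empty tokens;
    otherwise split on every match of the regular expression.
    """
    if pattern is None:
        return text.split()
    return re.split(pattern, text)


def randomize(seq, seed):
    """Return a reordered copy of seq; the same seed gives the same order."""
    rng = random.Random(seed)
    shuffled = list(seq)
    rng.shuffle(shuffled)
    return shuffled
```

Passing the seed explicitly keeps randomize() repeatable for a given input, which is one way a function like this could coexist with a side-effect-free expression language.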
 
 FWIW, on the issues front:
 
   14: (operator-function-signatures) I agree, some of the
       signatures are confusing; I read the spec as indicating the
       required types for the functions, such that if you're using
       XPath the casting to those types is done automatically.
 
   20: (operator-codepoint-vs-character) I agree that the spec
       should be clear about whether it's talking about code points or
       characters, but I think that the character model spec recommends
       talking about character strings rather than code unit strings
       (ref. http://www.w3.org/TR/charmod/#sec-Strings)
 
   21: (operator-function-return-types) In my opinion, the return
       type of a function should be fixed, and not change based on the
       actual type passed as the argument of a function.
 
   37: (semantic-contains) I think that adding linguistic/semantic
       contains is a huge effort for very little benefit, at least for
       XPath 2.0. I can see that XQuery might want it, but I wouldn't
       want XSLT to be burdened, as the primary task of XSLT is
       transformation rather than querying.
 
   44: (operator-collation-specification) I think that XPath 2.0 should
       follow the pattern of XPath/XSLT 1.0 and use qualified names
       rather than URIs, for consistency and because it makes them
       easier to use.
 
   63: (operator-augment-index-of) I find the distinction between
       performing operations on nodes vs. performing operations on
       their values fiddly. In the case of index-of(), it strikes me
       that it wouldn't be difficult to perform index-of-value() if you
       had support for an index-of() that matched by node identity or
       simple type value (by creating a sequence of the node values and
       getting the index of the value you were after).
 
   66: (operator-docorder-function) Like distinct-nodes(), the
       requirement (or lack of it) for this function isn't yet apparent
       because it's not an issue in XSLT 1.0. Personally, I don't think
       that it will be used that often, but it may be best to be on the
       safe side as it wouldn't be particularly easy to replicate this
       functionality without removing duplicate nodes at the same time.
 
   67: (operator-remove-dupes) Since location paths do remove
       duplicates, and there thus isn't any backwards incompatibility
       with XPath 1.0, I don't think there's any reason for count() or
       sum() to remove duplicates.
 
   73: (operator-compare-between) I don't think that a
       compare-between() function is required.
 
   77: (operator-string-from-char) chars aren't data types in XML
       Schema -- are they in XPath? If not, then this issue isn't
       relevant.
 
   94: (operator-within-window) As with (semantic-contains), I don't
       think this is a high priority for XPath 2.0.
 
  108: (operators-always-normalize) I don't think that we should need
       to worry about Unicode normalization within XPath 2.0.
 
  136: (function-datetime-timezone-conversion) In XML Schema, the
       timezone isn't part of the value space of a dateTime. Adding a
       timezone to a dateTime is essentially a formatting function.
 
  139: (need-fuller-definition-of-error-behavior-and-handling) Yes. We
       need to be able to test if an item is an error, and then be able
       to get information about that error, most importantly an error
       message that describes it and probably some information about
       the context in which the error occurred (e.g. what the context
       node was). I'm sure that you already have something on the cards
       here. Another point of confusion is that the empty sequence is
       sometimes used as a kind of error value, but at other times an
       error object is returned. I haven't yet worked out what the
       underlying heuristic is there, assuming that there is one.
 
  141: (does-string-equality-use-codepoint-or-default-collation) I
       think it should use the default collation, like the other string
       manipulation functions.
 
  142: (what-should-floor-ceiling-round-return) For compatibility, this
       should really return an xs:double (I believe). However, I think
       that returning an xs:integer, with an empty sequence used
       instead of NaN, would also be reasonable.
 
  143: (need-tokenize-function) As above, we definitely need a
       tokenize() function, preferably one that defaults to breaking on
       whitespace.
 
  144: (should-concat-accept-sequence-arguments) It would be useful,
       but highly incompatible. Perhaps a separate concat-sequence()
       function should be invented. (In XSLT 2.0, you can achieve
       the same effect with an xsl:value-of and an empty separator
       attribute, but since XSLT shouldn't be used for general sequence
       construction (apparently), this isn't ideal.)
 
  150: (should-comparison-that-return-indeterminate-results-be-supported)
       As I've said before, yes. This is far more important than
       supporting matching of 'nearby' strings and so on, in my
       opinion.
 
  151: (comparison-functions-for-other-date-and-time-types) Yes, there
       should be comparison functions for other date and time types,
       although a basic rule about how the comparisons are carried out
       would be better than listing every possible combination of
       comparisons.
 
  152: (parameterized-extraction-functions-for-date-and-times) I view
       the extraction functions as superfluous, in the face of
       substring() and the prospect of a format-date() function. If you
       have them here, then I do think that they should be
       parameterised.
 
  154: (second-order-distinct-function) Like the other second-order
       functions, it would be great, but I don't think it's worth
       entering that territory at this stage.
 
  157: (boolean-from-string-legal-literals) Absolutely.
 
  162: (can-the-node-parameter-to-root-be-omitted) As I mentioned
       above, I think that having single-argument functions default to
       using the context item is a very useful tactic, and one that
       XPath 1.0 users are used to exploiting. It would be good, for
       consistency, if the new functions supported this shorthand.
 
  164: (for-complex-types-what-should-data-return) I don't have a
       strong opinion either way, but it should be consistent with the
       description of the typed value accessor in the data model. Since
       the string value is readily accessible in other ways, I think
       data() should probably not return the string value of the
       element if it has a complex type with complex content.
 
  166: (current-dateTime-convenience-functions) On the principle of
       having as few functions as possible, I don't think these
       convenience functions are necessary. They are easy to define for
       people who want them.
 
  168: (should-id-take-a-list-of-strings) id() definitely should be
       compatible with id() in XPath 1.0, and therefore accept a list
       of IDs.
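
The point made under issue 63 can be sketched in Python as follows: given a single index-of() that matches by identity or by simple value, a value-based lookup over nodes is just a matter of first building the sequence of node values. The function names and the use of xml.etree.ElementTree elements as stand-ins for nodes are assumptions for illustration, not part of any proposal.

```python
import xml.etree.ElementTree as ET


def index_of(seq, target, same=lambda a, b: a == b):
    """Return the 1-based positions of every item that matches target."""
    return [i for i, item in enumerate(seq, start=1) if same(item, target)]


def index_of_value(nodes, value):
    """Build the sequence of node values, then reuse index_of() on it."""
    return index_of([node.text for node in nodes], value)


doc = ET.fromstring(
    "<list><item>foo</item><item>bar</item><item>foo</item></list>"
)
items = list(doc)

# Matching on values finds both 'foo' items.
by_value = index_of_value(items, "foo")

# Matching on node identity finds only the one node that was passed in.
by_identity = index_of(items, items[2], same=lambda a, b: a is b)
```

Here by_value lists every position whose text is 'foo', while by_identity lists only the position of the specific node supplied.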
 
 ---
 Jeni Tennison
 http://www.jenitennison.com/
Received on Tuesday, 14 May 2002 11:59:15 UTC