- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Mon, 13 May 2002 12:01:57 +0100
- To: Jonathan Robie <jonathan.robie@datadirect-technologies.com>
- CC: public-qt-comments@w3.org
Hi Jonathan,
I promised some more detailed comments on the F&O WD, so here you are.
As usual, these come from my perspective as an XSLT user rather than
anything else. I've ignored the constructors and casting sections,
since I know they're under review anyway.
I guess my guiding principle is that if a function is just a shorthand
for something that can be implemented without a recursive function,
then it shouldn't be included in the core set of XPath 2.0 functions.
Both XQuery and XSLT have methods of defining extension functions, so
I think that it's more important to focus on the functions that are
impossible or difficult to implement in XQuery/XSLT rather than those
that are simply convenience functions.
Cheers,
Jeni
---
The new functions, added on to XPath 1.0, are the following. I've put
* by the ones that I think should stay, - by those that I think should
go, and + by those on which I'm equivocal:
- node-kind() -- I've hardly ever seen a problem that's required
this functionality. I think it would be more flexible to use the
"instance of" operator to work out what kind of node you're
looking at; it would be easy enough to define your own function to
give you the name of the node type in the rare cases where that's
required. In other words, you should be able to get at the type of
a node and the type of an atomic value in the same way.
+ node-name() -- This used to be the name() function; I wonder
whether it would be possible to merge this with the name()
function. It would be great if that could be done so that the
name() function works in the way that people think it works, such
that "name() = 'pfx:name'" is equivalent to "self::pfx:name"; this
would be backwards-incompatible with XPath 1.0, but would be more
intuitive for users.
* data() -- Certainly required now, but as with a lot of these
functions, I wonder whether it would be helpful to have it follow
the pattern of existing functions, like name() and string(), and
have it return the typed value of the context node if it doesn't
have an argument passed to it. I know that the F&O document
purposefully tries to avoid overloaded functions, but for users,
both those used to XPath 1.0 and those coming new to XPath 2.0, it
will be confusing that different functions work in different ways
depending on which version they were introduced in.
* base-uri() -- Certainly very useful; we often get questions asking
how to get the URL of the file that's being used as the source of
the transformation.
- unique-ID() -- I've never known anyone to have to get hold of the
value of the ID attribute on a given element. If they do, they
know the name of the attribute and can get its value through
normal mechanisms. I'm also worried that this function will get
confused with the generate-id() function.
* compare() -- We do need this facility although not as much as
you might think, in my opinion. I have to say that personally I
find a return value of -1, 0 or 1 difficult to work with: I always
get confused about which way round the arguments are related. It
would be great if there was an alternative design, but I doubt
that there is and since we'll rarely have to use different
collations, I don't think that's too much of a problem.
- normalize-unicode() -- As far as I understand the character
model for the WWW, all text on the Internet should be normalized,
and specifications should require Unicode-normalized (NFC) text. I
can't recall ever seeing someone need to do Unicode normalization;
I suspect that such operations would be better done at a lower
level in the application (normalize early) and that the data model
should dictate that text is normalized.
* upper-case() and lower-case() -- There's definitely a strong
requirement for these, although allowing case-insensitive
comparisons (which I think is supported with collations?) will go
most of the way towards supporting the usual reason for
case-changing. As I think I might have mentioned before, I believe
that technically there should be a title-case() function as well,
since the title case version of a letter is not always the same as
the upper case version of a letter (ref.
http://www.unicode.org/unicode/reports/tr21/)
+ string-pad() -- Repeating the same string is a fairly common
operation, although it is one that's particularly easy to
accomplish now with a user-defined function and a simple
iteration. I therefore don't think that this function is vital,
and if you want to save space, I think it should be dropped.
* match() and replace() -- I think that you know that we need more
regular expression support than this; I believe that you're
working on that and that I've already commented on it.
+ duration/dateTime functions -- I've already commented on these in
a separate thread. I think that this is the poorest section of the
spec. The kinds of things that people want to do with dates are:
- reformat them (which I believe is being supported separately
in XSLT 2.0, though it's not there yet)
- get a date from the common "seconds since
1970-01-01T00:00:00Z" representation (for all its faults)
- perform calculations between them
Dates have a fixed format, so it's not hard to extract individual
components from a date; I don't think that the set of functions to
do so are necessary. It's harder to extract information from a
duration because it doesn't have a fixed format, but not
drastically so, and I think it's really very rare that you need to
get that kind of information from a duration.
One thing that *is* difficult, and is useful, is to get values
like "the number of seconds represented by this duration" (i.e.
the reverse of dayTimeDuration-from-seconds()) -- it's useful
because that enables you to perform calculations with durations
(adding them, dividing them) that you can't do otherwise.
* get-local-name() and get-namespace-uri() -- Makes me wish that
the structured data types such as QNames, dates, durations and so
on could be treated as virtual elements, so you could do
$qname/local-name or $date/year. These are certainly handy
functions, though.
* resolve-URI() -- I imagine this will be very handy.
URI manipulation is, I think, the primary reason for the
requirement for string manipulation functions like
substring-after-last() or index-of-last(). Perhaps a
get-file-name() method would be useful; I'm not sure.
+ deep-equal() -- I wouldn't personally say that this was a
high-priority function. My guess would be that people would use it
for the common task of moving through two documents to see where
differences lie between them, and in that context I think it would
be very expensive. But others might have use cases that I'm
unaware of.
- root() -- I think that root($node) does the same thing as
$node/ancestor::node()[last()]. Given that the function is
possible with very little effort, and that you rarely need to get
from a node to the root node of that document, I don't really see
the point of this function.
- if-absent() and if-empty() are shorthands for:
if (not($node)) then $default else $node and
if (not($node) or not($node/node())) then $default else $node
I don't find these expressions so burdensome that they require
shorthand functions, especially not compared to some of the other
functionality that's currently missing from the spec.
* index-of() -- definitely required, though I have no doubt that
people will use it like:
$nodes[index-of(for $n in $nodes return string($n), 'foo')]
- empty() -- empty($seq) seems to be equivalent to
not(boolean($seq)); as with other shorthands for easy expressions,
I don't think this one's necessary, although it's true that the
casting of empty sequences to boolean false can be non-obvious for
beginners.
- exists() -- seems to be equivalent to not(empty($seq)) or exactly
equivalent to boolean($seq). I don't think this is necessary;
empty() is more useful if you didn't want to use boolean() in the
way that it's been used in XPath 1.0.
+ distinct-nodes() -- This obviously doesn't arise in XSLT 1.0
because it's impossible to create a node set that contains more
than one of a particular node. Given that node sequences are (or
should/can be) created with duplicates automatically removed, I
doubt that this will come into play very often; there aren't any
use cases for it in the XQuery use case document either. On the
other hand, the equivalent expression (distinct-nodes($nodes) is
the same as union(() | $nodes)) is a bit of a hack and might not
get you precisely what you want (since it also reorders into
document order), so it's probably best to be on the safe side.
* distinct-values() -- This functionality is required (and
lacking) in XSLT 1.0, but the grouping facilities in XSLT 2.0 mean
that it wouldn't be nearly as important there. I can see places
where it would be handy, though (for example to write things like
"there are 4 groups...", and to allow me to apply templates to
distinct nodes in order to get more flexibility in my stylesheet).
Since this function is likely to be much more heavily used than
distinct-nodes(), I think it should be shortened to distinct().
- insert() -- I can't really see the point, given that there's a
concat(), a subsequence() and an index-of() and I don't think that
there will often be times when you need to insert items into the
middle of a sequence.
- remove() -- Again, I don't see why this is needed, given that you
can use a predicate to do the same thing: $target[position() !=
$position].
* subsequence() -- I imagine this would be useful.
+ sequence-deep-equal() and sequence-node-equal() -- I'm not sure
about sequence-deep-equal(), for the same reason I'm not sure
about deep-equal(). The most useful, I would imagine, would be a
plain sequence-equal() that compared the two sequences to see if
they were the same on an item-by-item basis, with nodes being
assessed based on identity, and values being assessed on their
value.
- avg() -- I'm not personally convinced (since the equivalent
expression of sum() div count() really isn't difficult).
* max() and min() -- Definitely. This is a requirement that's
probably even greater than date formatting or regular expressions.
It would be even more helpful if there was a quick way of getting
to the node(s) that has the min/max value, rather than just
getting the value itself. I imagine we're going to see rather a
lot of $nodes[. = max($nodes)] otherwise, although I guess that
could be optimised.
- idref() -- As I've said elsewhere, id() turns out to be hardly
used in XSLT because of the issues to do with requiring a DTD be
present for the link to be any use. Where you need a reverse link,
you can generally set up a key instead. I'd rather see keys from
XML Schema supported than a specific idref() function introduced.
- filter() -- I think this is potentially very useful, but, like
copy() and shallow(), it has to do with creating nodes, which
means that it shouldn't live at the XPath level.
- collection() -- I don't really understand how this is different
from the document() function.
* input() -- Sounds reasonable.
- context-item() -- I assume that this is not a real function, but
actually just a backup for the shorthand '.'? It should say so.
* current-dateTime() -- Definitely required; XForms calls this
function now(), which has the advantage of being short and
avoiding the mixed case convention difficulties.
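As a rough illustration of the "number of seconds represented by this
duration" point made under the duration/dateTime functions above, here
is a minimal sketch in Python rather than XPath. The function name
seconds_from_day_time_duration() is hypothetical and is not taken from
the F&O draft; the sketch assumes the lexical form of a day/time
duration only (days, hours, minutes, seconds) and ignores years and
months, whose length in seconds is not fixed.

```python
import re
from decimal import Decimal

# Lexical form assumed here: optional "-", "P", optional days, then an
# optional "T" part with hours, minutes and (possibly fractional) seconds.
_DAY_TIME = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def seconds_from_day_time_duration(text):
    """Hypothetical helper: total seconds in a day/time duration string."""
    match = _DAY_TIME.match(text)
    # A bare "P" or a trailing "T" carries no components, so reject it.
    if match is None or text.endswith(("P", "T")):
        raise ValueError("not a day/time duration: %r" % text)
    parts = match.groupdict()
    total = (
        Decimal(parts["days"] or 0) * 86400
        + Decimal(parts["hours"] or 0) * 3600
        + Decimal(parts["minutes"] or 0) * 60
        + Decimal(parts["seconds"] or 0)
    )
    return -total if parts["sign"] else total
```

Once durations are reduced to plain numbers like this, adding or
dividing them is ordinary arithmetic, which is the calculation the
comment above says is otherwise awkward.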
Aside from those mentioned above, functions that are missing are:
* tokenize(), which people ask for all the time, particularly for
splitting strings into lines or words
+ possibly sqrt(), sin() and cos(), which are particularly useful
when creating graphic formats such as SVG and aren't that easy to
implement in XSLT
* random() (create random numbers) and more usefully, I think,
randomize() (randomly alter the order of items in a sequence),
both with obvious side-effect issues; again these are impossible
to implement using XSLT
* function-available() to support the idea that XPath function
libraries could be provided by particular implementations.
* system-property() to support getting information about the XPath
implementation version and so on.
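To make the tokenize() request in the list above concrete, here is a
minimal sketch in Python rather than XPath. The signature is an
assumption made for illustration, not a proposal from the F&O draft:
with no pattern the string is split on runs of whitespace, and with a
pattern it is split wherever the regular expression matches.

```python
import re


def tokenize(text, pattern=None):
    """Hypothetical tokenize(): split text into a list of tokens."""
    if pattern is None:
        # Default assumed here: break on whitespace, dropping empty tokens.
        return text.split()
    # Otherwise split on each match of the (non-capturing) pattern.
    return re.split(pattern, text)
```

Splitting into words is then tokenize(" the quick  fox "), and splitting
into lines is tokenize(text, r"\r?\n").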
FWIW, on the issues front:
14: (operator-function-signatures) I agree, some of the
signatures are confusing; I read the spec as indicating the
required types for the functions, such that if you're using
XPath the casting to those types is done automatically.
20: (operator-codepoint-vs-character) I agree that the spec
should be clear about whether it's talking about code points or
characters, but I think that the character model spec recommends
talking about character strings rather than code unit strings
(ref. http://www.w3.org/TR/charmod/#sec-Strings)
21: (operator-function-return-types) In my opinion, the return
type of a function should be fixed, and not change based on the
actual type passed as the argument of a function.
37: (semantic-contains) I think that adding linguistic/semantic
contains is a huge effort for very little benefit, at least for
XPath 2.0. I can see that XQuery might want it, but I wouldn't
want XSLT to be burdened, as the primary task of XSLT is
transformation rather than querying.
44: (operator-collation-specification) I think that XPath 2.0 should
follow the pattern of XPath/XSLT 1.0 and use qualified names
rather than URIs, for consistency and because it makes them
easier to use.
63: (operator-augment-index-of) I find the distinction between
performing operations on nodes vs. performing operations on
their values fiddly. In the case of index-of(), it strikes me
that it wouldn't be difficult to perform index-of-value() if you
had support for an index-of() that matched by node identity or
simple type value (by creating a sequence of the node values and
getting the index of the value you were after).
66: (operator-docorder-function) Like distinct-nodes(), the
requirement (or lack of it) for this function isn't yet apparent
because it's not an issue in XSLT 1.0. Personally, I don't think
that it will be used that often, but it may be best to be on the
safe side as it wouldn't be particularly easy to replicate this
functionality without removing duplicate nodes at the same time.
67: (operator-remove-dupes) Since location paths do remove
duplicates, and there thus isn't any backwards incompatibility
with XPath 1.0, I don't think there's any reason for count() or
sum() to remove duplicates.
73: (operator-compare-between) I don't think that a
compare-between() function is required.
77: (operator-string-from-char) chars aren't data types in XML
Schema -- are they in XPath? If not, then this issue isn't
relevant.
94: (operator-within-window) As with (semantic-contains), I don't
think this is a high priority for XPath 2.0.
108: (operators-always-normalize) I don't think that we should need
to worry about Unicode normalization within XPath 2.0.
136: (function-datetime-timezone-conversion) In XML Schema, the
timezone isn't part of the value space of a dateTime. Adding a
timezone to a dateTime is essentially a formatting function.
139: (need-fuller-definition-of-error-behavior-and-handling) Yes. We
need to be able to test if an item is an error, and then be able
to get information about that error, most importantly an error
message that describes it and probably some information about
the context in which the error occurred (e.g. what the context
node was). I'm sure that you already have something on the cards
here. Another point of confusion is that the empty sequence is
sometimes used as a kind of error value, but at other times an
error object is returned. I haven't yet worked out what the
underlying heuristic is there, assuming that there is one.
141: (does-string-equality-use-codepoint-or-default-collation) I
think it should use the default collation, like the other string
manipulation functions.
142: (what-should-floor-ceiling-round-return) For compatibility, this
should really return an xs:double (I believe). However, I think
that returning an xs:integer, with an empty sequence used
instead of NaN, would also be reasonable.
143: (need-tokenize-function) As above, we definitely need a
tokenize() function, preferably one that defaults to breaking on
whitespace.
144: (should-concat-accept-sequence-arguments) It would be useful,
but highly incompatible. Perhaps a separate concat-sequence()
function should be invented. (In XSLT 2.0, you can achieve
the same effect with an xsl:value-of and an empty separator
attribute, but since XSLT shouldn't be used for general sequence
construction (apparently), this isn't ideal.)
150: (should-comparison-that-return-indeterminate-results-be-supported)
As I've said before, yes. This is far more important than
supporting matching of 'nearby' strings and so on, in my
opinion.
151: (comparison-functions-for-other-date-and-time-types) Yes, there
should be comparison functions for other date and time types,
although a basic rule about how the comparisons are carried out
would be better than listing every possible combination of
comparisons.
152: (parameterized-extraction-functions-for-date-and-times) I view
the extraction functions as superfluous, in the face of
substring() and the prospect of a format-date() function. If you
have them here, then I do think that they should be
parameterised.
154: (second-order-distinct-function) Like the other second-order
functions, it would be great, but I don't think it's worth
entering that territory at this stage.
157: (boolean-from-string-legal-literals) Absolutely.
162: (can-the-node-parameter-to-root-be-omitted) As I mentioned
above, I think that having single-argument functions default to
using the context item is a very useful tactic, and one that
XPath 1.0 users are used to exploiting. It would be good, for
consistency, if the new functions supported this shorthand.
164: (for-complex-types-what-should-data-return) I don't have a
strong opinion either way, but it should be consistent with the
description of the typed value accessor in the data model. Since
the string value is readily accessible in other ways, I think
data() should probably not return the string value of the
element if it has a complex type with complex content.
166: (current-dateTime-convenience-functions) On the principle of
having as few functions as possible, I don't think these
convenience functions are necessary. They are easy to define for
people who want them.
168: (should-id-take-a-list-of-strings) id() definitely should be
compatible with id() in XPath 1.0, and therefore accept a list
of IDs.
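The point under issue 63, that a value-based lookup can be built from
an index-of() that matches by node identity or simple value, can be
sketched as follows. This is Python rather than XPath, and the Node
class and the index_of_value() name are hypothetical stand-ins used
only for illustration; positions are 1-based, as in XPath.

```python
class Node:
    """Stand-in for an XML node; instances compare by identity."""

    def __init__(self, value):
        self.value = value


def index_of(seq, target):
    """Return the 1-based positions at which target occurs in seq."""
    return [i for i, item in enumerate(seq, start=1) if item == target]


def index_of_value(nodes, value):
    """Build a sequence of the node values, then look the value up in it."""
    return index_of([node.value for node in nodes], value)
```

Because Node does not define equality, index_of(nodes, some_node)
matches only that node itself, while index_of_value(nodes, 'foo')
matches every node whose value is 'foo'.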
---
Jeni Tennison
http://www.jenitennison.com/
Received on Monday, 13 May 2002 07:02:00 UTC