RE: F&O WD from Kay, Michael on 2002-05-20 (public-qt-comments@w3.org from May 2002)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Mon, 20 May 2002 13:37:15 +0200
To: Jeni Tennison <jeni@jenitennison.com>, Jonathan Robie <jonathan.robie@datadirect-technologies.com>
Cc: public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E6210602679DCB@daemsg02.software-ag.de>
> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@jenitennison.com] 
> 
> I promised some more detailed comments on the F&O WD, so here 
> you are. 

Some personal responses from me below...
> 
>   - node-kind() -- I've hardly ever seen a problem that's required
>     this functionality. I think it would be more flexible to use the
>     "instance of" operator to work out what kind of node you're
>     looking at

I agree, "instance of" makes this redundant.
>   
>   + node-name() -- This used to be the name() function; I wonder
>     whether it would be possible to merge this with the name()
>     function.

We've tried this. If name() is changed to return a QName rather than a
string, then it has to return () rather than "" for an unnamed node, and
that stops things like child::node()[name()!='note'] from working in a
backwards-compatible way. So we decided we couldn't change name(); and it
does seem worth introducing a function that returns a QName so that equality
comparisons become namespace-sensitive.
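
The compatibility problem can be sketched in Python. The sketch assumes that a general comparison is existential over its operand sequences, so a comparison against the empty sequence is always false. The list-based model of sequences and the helper names (`general_ne`, `name_as_string`, `name_as_qname`, `keep_non_notes`) are illustrative only and come from no draft.

```python
# Model an XPath value as a Python list (a sequence of atomic values).
# Assumption: a general comparison such as != is true if ANY pair of
# items drawn from the two operands satisfies it, hence false for ().

def general_ne(left, right):
    """Existential '!=' over two sequences."""
    return any(a != b for a in left for b in right)

# Children of some element: two elements and one unnamed text node.
children = [("element", "para"), ("element", "note"), ("text", None)]

def name_as_string(node):
    # Existing behaviour: an unnamed node yields the zero-length string.
    return [node[1] if node[1] is not None else ""]

def name_as_qname(node):
    # Hypothetical changed behaviour: an unnamed node yields ().
    return [node[1]] if node[1] is not None else []

def keep_non_notes(name_fn):
    # Mirrors child::node()[name() != 'note']
    return [n for n in children if general_ne(name_fn(n), ["note"])]

with_string = keep_non_notes(name_as_string)
with_qname = keep_non_notes(name_as_qname)
```

Under these assumptions `with_string` still contains the text node, while `with_qname` silently drops it, which is the behaviour change described above.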
> 
>   * data() -- Certainly required now, but as with a lot of these
>     functions, I wonder whether it would be helpful to have it follow
>     the pattern of existing functions, like name() and string(), and
>     have it return the typed value of the context node if it doesn't
>     have an argument passed to it.

I'm fairly neutral on this. I wish we had required the explicit argument on
existing functions.
> 
>   - unique-ID() -- I've never known anyone to have to get hold of the
>     value of the ID attribute on a given element.

I agree.
> 
>   * compare() -- We do need this facility although not as much as
>     you might think, in my opinion. 

I don't think we actually expect it to be needed that often, which is why a
slightly awkward return value is acceptable, but it is important for
internationalization.
> 
>   - normalize-unicode() -- As far as I understand the character
>     model for the WWW, all text on the Internet should be normalized,
>     and specifications should require unicode normalized (NFC) text.

The trouble is that word "should". And even if the source text is
normalized, that doesn't guarantee that the result of operations like
concat() and substring() is normalized. I don't have enough experience of
processing non-English text to know how important this function is; I think
we have to take advice from the I18N people, who tell us that it is.
> 
>   * upper-case() and lower-case() -- There's definitely a strong
>     requirement for these, although allowing case-insensitive
>     comparisons (which I think is supported with collations?) will go
>     most of the way towards supporting the usual reason for
>     case-changing.

I think there are enough other applications to justify their retention.

>     As I think I might have mentioned before, I believe
>     that technically there should be a title-case() function as well,

I Have Always Thought Title Case Was A Style Used Only By American
Newspapers.
> 
>   + string-pad() -- Repeating the same string is a fairly common
>     operation, although it is one that's particularly easy to
>     accomplish now with a user-defined function and a simple
>     iteration. I therefore don't think that this function is vital,
>     and if you want to save space, I think it should be dropped.

I think it's sufficiently handy to be worth keeping.
> 
>   * match() and replace() -- I think that you know that we need more
>     regular expression support than this; I believe that you're
>     working on that and that I've already commented on it.

Yes.
> 
>   + duration/dateTime functions -- I've already commented on these in
>     a separate thread. I think that this is the poorest section of the
>     spec. 

There is still ongoing work on these.
> 
>   * get-local-name() and get-namespace-uri() -- Makes me wish that
>     the structured data types such as QNames, dates, durations and so
>     on could be treated as virtual elements, so you could do
>     $qname/local-name or $date/year. 

Yes; I'd like to see a consistent approach to component extraction too. I
have proposed naming the functions, for example, QName.local-name(),
DateTime.month(), but I seem to be the only one who likes the idea.
> 
>   + deep-equal() -- I wouldn't personally say that this was a
>     high-priority function.

I agree with you, especially as it seems difficult to come up with a single
definition of deep equality that pleases everyone.
> 
>   - root() -- I think that root($node) does the same thing as
>     $node/ancestor::node()[last()]. Given that the function is
>     possible with very little effort, and that you rarely need to get
>     from a node to the root node of that document, I don't really see
>     the point of this function.

XQuery still has no ancestor axis. This function is actually there because
there is still a debate about the meaning of a leading "/": does it mean
root(), or does it mean input()? The traditional XSLT meaning is root(), the
traditional XQuery meaning is input(). Providing both functions ensures that
both capabilities are available unambiguously while we resolve what "/"
should mean.
> 
>   - if-absent() and if-empty() are shorthands ...

I'm personally inclined to agree with you; I also worry that people will
misunderstand what if-empty() does.
> 
>   - empty() -- empty($seq) seems to be equivalent to
>     not(boolean($seq)); as with other shorthands for easy expressions,
>     I don't think this one's necessary
> 
>   - exists() -- seems to be equivalent to not(empty($seq))

I quite like the improved readability that you get from empty() and
exists().
> 
>   + distinct-nodes() -- This obviously doesn't arise in XSLT 1.0
>     because it's impossible to create a node set that contains more
>     than one of a particular node. Given that node sequences are (or
>     should/can be) created with duplicates automatically removed, I
>     doubt that this will come into play very often...

I agree that this will probably not be used very often, and that writing
$N|() or $N/. might be adequate, even though both are non-obvious.
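
What an idiom like $N|() is relied on to do can be sketched in Python, assuming that a union removes duplicates by node identity and returns the result in document order. The `Node` class, its `position` field, and the function name `union_with_empty` are hypothetical stand-ins, not anything defined in the drafts.

```python
class Node:
    """Hypothetical node: a name plus its index in document order."""
    def __init__(self, name, position):
        self.name = name
        self.position = position

def union_with_empty(seq):
    # Drop repeated occurrences of the SAME node (identity, not value),
    # then sort the survivors into document order.
    seen = set()
    unique = []
    for node in seq:
        if id(node) not in seen:
            seen.add(id(node))
            unique.append(node)
    return sorted(unique, key=lambda n: n.position)

a = Node("a", 0)
b = Node("b", 1)
other_b = Node("b", 2)  # same name as b, but a different node

result = union_with_empty([b, a, b, other_b])
names = [n.name for n in result]
```

Here the second occurrence of `b` disappears, but `other_b` is kept because it is a distinct node that merely has the same name.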
> 
>   * distinct-values() -- This functionality is required (and
>     lacking) in XSLT 1.0, but the grouping facilities in XSLT 2.0 mean
>     that it wouldn't be nearly as important there.

I agree, but I think it's worth keeping, if only for non-XSLT uses of XPath.
> 
>   - insert()
> 
>   - remove() -- Again, I don't see why this is needed

I can't say I've found a use case myself, but the alternative constructs
seem to be rather hard work and perhaps more difficult to optimize.
> 
>   * subsequence() -- I imagine this would be useful.
> 
>   + sequence-deep-equal() and sequence-node-equal() -- I'm not sure
>     about sequence-deep-equal(), for the same reason I'm not sure
>     about deep-equal().

I'm not happy about these either, but it's hard to come up with the right
answer.
> 
>   - avg() -- I'm not personally convinced (since the equivalent
>     expression of sum() div count() really isn't difficult).

But is it equivalent, e.g. if the argument is an empty sequence? Also, it's
both easier to write and easier to optimize in cases where the argument is a
complex path expression.
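
The empty-sequence case can be sketched in Python. The sketch assumes that avg() of an empty sequence yields the empty sequence (modelled here as `None`); that is an assumption about the intended definition, and the function names are illustrative.

```python
def avg(seq):
    # Assumption: an empty input yields the empty sequence (None here).
    if not seq:
        return None
    return sum(seq) / len(seq)

def sum_div_count(seq):
    # The "equivalent" expression: sum() div count().
    return sum(seq) / len(seq)

# For a non-empty argument the two agree.
same = avg([2, 4]) == sum_div_count([2, 4])

# For an empty argument the naive form divides zero by zero.
try:
    naive_empty = sum_div_count([])
except ZeroDivisionError:
    naive_empty = "division by zero"
```

Python raises an exception for the empty case; an XPath processor might instead produce NaN or an error depending on the numeric types involved, but either way the result differs from an empty sequence.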
> 
>   * max() and min() -- Definitely. This is a requirement that's
>     probably even greater than date formatting or regular expressions.
>     It would be even more helpful if there was a quick way of getting
>     to the node(s) that has the min/max value, rather than just
>     getting the value itself. I imagine we're going to see rather a
>     lot of $nodes[. = max($nodes)] otherwise, although I guess that
>     could be optimised.

I agree.
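
The $nodes[. = max($nodes)] pattern, which returns the items holding the maximum rather than the maximum itself, might look like this in Python. The helper name `items_with_max` and the sample data are made up for illustration.

```python
def items_with_max(items, key):
    """Return every item whose key equals the maximum key."""
    if not items:
        return []
    best = max(key(item) for item in items)
    return [item for item in items if key(item) == best]

# Hypothetical data: (name, price) pairs, with a tie for the maximum.
prices = [("pen", 3), ("ink", 7), ("pad", 7)]
top = items_with_max(prices, key=lambda item: item[1])
```

Written this way the maximum is computed once and then compared against each item, which is the optimisation a processor would presumably want to apply to the predicate form.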
> 
>   - idref() -- As I've said elsewhere, id() turns out to be hardly
>     used in XSLT because of the issues to do with requiring a DTD be
>     present for the link to be any use. Where you need a reverse link,
>     you can generally set up a key instead. I'd rather see keys from
>     XML Schema supported than a specific idref() function introduced.

I agree.
> 
>   - filter() -- I think this is potentially very useful, but, like
>     copy() and shallow(), it has to do with creating nodes, which
>     means that it shouldn't live at the XPath level.

We're likely to drop this one.
> 
>   - collection() -- I don't really understand how this is different
>     from the document() function.

Think databases. The notion is that a collection is a set of documents that
you might want to search, the entire collection being identified by a single
URI. We don't know exactly how different implementations will model
collections of documents, so this gives us a suitable abstraction. We did
think of overloading document() so a single URI could return multiple
documents, but it makes an already-complex function even harder to
understand.
> 
>   * input() -- Sounds reasonable.
> 
>   - context-item() -- I assume that this is not a real function, but
>     actually just a backup for the shorthand '.'? It should say so.

Actually, there's still a conflict in the published drafts, which at one
point say that "." means "self::node()". We expect to resolve this.
> 
>   * current-dateTime() -- Definitely required; XForms calls this
>     function now()

The bigger the working group, the longer the names it comes up with...
> 
>   * tokenize(), which people ask for all the time, particularly for
>     splitting strings into lines or words

I hope that this will be very easy to do with regular expressions.
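
As a rough illustration of regex-based splitting, here is how the two cases mentioned (lines and words) look with Python's `re` module. This shows Python's regex functions, not the syntax of any function in the draft, and the sample text is made up.

```python
import re

text = "first line\nsecond  line\r\nthird line"

# Split into lines on LF or CRLF.
lines = re.split(r"\r?\n", text)

# Split into words on runs of whitespace.
words = re.split(r"\s+", text.strip())
```

Both cases reduce to a single pattern describing the separator, which is why a general regular-expression facility could cover them.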
> 
>   + possibly sqrt(), sin() and cos(), which are particularly useful
>     when creating graphic formats such as SVG and aren't that easy to
>     implement in XSLT

I think we made a conscious decision to exclude trig functions; they can be
easily provided by a third-party library.
> 
>   * random() (create random numbers) and more usefully, I think,
>     randomize() (randomly alter the order of items in a sequence),
>     both with obvious side-effect issues; again these are impossible
>     to implement using XSLT

Yes, this does come up surprisingly frequently. Needs looking at.
> 
>   * function-available() to support the idea that XPath function
>     libraries could be provided by particular implementations.

XQuery isn't prepared to accept dynamic binding of function names, without
which function-available() is meaningless. So this will remain an XSLT-only
function.
> 
>   * system-property() to support getting information about the XPath
>     implementation version and so on.

Yes, that would be an interesting one to migrate. The current XSLT spec has
the disadvantage that it relies on run-time resolution of namespace
prefixes, which XPath otherwise does not require.

Michael Kay
Received on Monday, 20 May 2002 07:37:48 UTC