- From: Ashok Malhotra <ashokma@microsoft.com>
- Date: Tue, 14 May 2002 08:58:42 -0700
- To: "Jeni Tennison" <jeni@jenitennison.com>, "Jonathan Robie" <jonathan.robie@datadirect-technologies.com>
- Cc: <public-qt-comments@w3.org>
Jeni:

Thanks for your comments. We are all at a W3C meeting this week. Will get back to you next week.

Ashok

-----Original Message-----
From: Jeni Tennison [mailto:jeni@jenitennison.com]
Sent: Mon 5/13/2002 4:01 AM
To: Jonathan Robie
Cc: public-qt-comments@w3.org
Subject: F&O WD

Hi Jonathan,

I promised some more detailed comments on the F&O WD, so here you are. As usual, these come from my perspective as an XSLT user rather than anything else. I've ignored the constructors and casting sections, since I know they're under review anyway.

I guess my guiding principle is that if a function is just a shorthand for something that can be implemented without a recursive function, then it shouldn't be included in the core set of XPath 2.0 functions. Both XQuery and XSLT have methods of defining extension functions, so I think that it's more important to focus on the functions that are impossible or difficult to implement in XQuery/XSLT rather than those that are simply convenience functions.

Cheers,

Jeni

---

The new functions, added on to XPath 1.0, are the following. I've put * by the ones that I think should stay, - by those that I think should go, and + by those on which I'm equivocal:

- node-kind() -- I've hardly ever seen a problem that's required this functionality. I think it would be more flexible to use the "instance of" operator to work out what kind of node you're looking at; it would be easy enough to define your own function to give you the name of the node type on the rare cases that's required. In other words, you should be able to get at the type of a node and the type of an atomic value in the same way.

+ node-name() -- This used to be the name() function; I wonder whether it would be possible to merge this with the name() function.
It would be great if that could be done so that the name() function works in the way that people think it works, such that "name() = 'pfx:name'" is equivalent to "self::pfx:name"; this would be backwards-incompatible with XPath 1.0, but would be more intuitive for users.

* data() -- Certainly required now, but as with a lot of these functions, I wonder whether it would be helpful to have it follow the pattern of existing functions, like name() and string(), and have it return the typed value of the context node if it doesn't have an argument passed to it. I know that the F&O document purposefully tries to avoid overloaded functions, but for users, both those used to XPath 1.0 and those coming new to XPath 2.0, it will be confusing that different functions work in different ways depending on which version they were introduced in.

* base-uri() -- Certainly very useful; we often get questions asking how to get the URL of the file that's being used as the source of the transformation.

- unique-ID() -- I've never known anyone to have to get hold of the value of the ID attribute on a given element. If they do, they know the name of the attribute and can get its value through normal mechanisms. I'm also worried that this function will get confused with the generate-id() function.

* compare() -- We do need this facility although not as much as you might think, in my opinion. I have to say that personally I find a return value of -1, 0 or 1 difficult to work with: I always get confused about which way round the arguments are related. It would be great if there was an alternative design, but I doubt that there is and since we'll rarely have to use different collations, I don't think that's too much of a problem.

- normalize-unicode() -- As far as I understand the character model for the WWW, all text on the Internet should be normalized, and specifications should require unicode normalized (NFC) text.
I can't recall ever seeing someone need to do unicode normalization; I suspect that such operations would be better done at a lower level in the application (normalize early) and that the data model should dictate that text is normalized.

* upper-case() and lower-case() -- There's definitely a strong requirement for these, although allowing case-insensitive comparisons (which I think is supported with collations?) will go most of the way towards supporting the usual reason for case-changing. As I think I might have mentioned before, I believe that technically there should be a title-case() function as well, since the title case version of a letter is not always the same as the upper case version of a letter (ref. http://www.unicode.org/unicode/reports/tr21/).

+ string-pad() -- Repeating the same string is a fairly common operation, although it is one that's particularly easy to accomplish now with a user-defined function and a simple iteration. I therefore don't think that this function is vital, and if you want to save space, I think it should be dropped.

* match() and replace() -- I think that you know that we need more regular expression support than this; I believe that you're working on that and that I've already commented on it.

+ duration/dateTime functions -- I've already commented on these in a separate thread. I think that this is the poorest section of the spec. The kinds of things that people want to do with dates are:

    - reformat them (which I believe is being supported separately in XSLT 2.0, though it's not there yet)
    - get a date from the common "seconds since 1970-01-01T00:00:00Z" representation (for all its faults)
    - perform calculations between them

  Dates have a fixed format, so it's not hard to extract individual components from a date; I don't think that the set of functions to do so are necessary.
It's harder to extract information from a duration because it doesn't have a fixed format, but not drastically so, and I think it's really very rare that you need to get that kind of information from a duration. One thing that *is* difficult, and is useful, is to get values like "the number of seconds represented by this duration" (i.e. the reverse of dayTimeDuration-from-seconds()) -- it's useful because that enables you to perform calculations with durations (adding them, dividing them) that you can't do otherwise.

* get-local-name() and get-namespace-uri() -- Makes me wish that the structured data types such as QNames, dates, durations and so on could be treated as virtual elements, so you could do $qname/local-name or $date/year. These are certainly handy functions, though.

* resolve-URI() -- I imagine this will be very handy. URI manipulation is, I think, the primary reason for the requirement for string manipulation functions like substring-after-last() or index-of-last(). Perhaps a get-file-name() method would be useful; I'm not sure.

+ deep-equal() -- I wouldn't personally say that this was a high-priority function. My guess would be that people would use it for the common task of moving through two documents to see where differences lie between them, and in that context I think it would be very expensive. But others might have use cases that I'm unaware of.

- root() -- I think that root($node) does the same thing as $node/ancestor::node()[last()]. Given that the function is possible with very little effort, and that you rarely need to get from a node to the root node of that document, I don't really see the point of this function.
- if-absent() and if-empty() are shorthands for:

    if (not($node)) then $default else $node

  and

    if (not($node) or not($node/node())) then $default else $node

  I don't find these expressions so burdensome that they require shorthand functions, especially not compared to some of the other functionality that's currently missing from the spec.

* index-of() -- definitely required, though I have no doubt that people will use it like:

    $nodes[index-of(for $n in $nodes return string($n), 'foo')]

- empty() -- empty($seq) seems to be equivalent to not(boolean($seq)); as with other shorthands for easy expressions, I don't think this one's necessary, although it's true that the casting of empty sequences to boolean false can be non-obvious for beginners.

- exists() -- seems to be equivalent to not(empty($seq)) or exactly equivalent to boolean($seq). I don't think this is necessary; empty() is more useful if you didn't want to use boolean() in the way that it's been used in XPath 1.0.

+ distinct-nodes() -- This obviously doesn't arise in XSLT 1.0 because it's impossible to create a node set that contains more than one of a particular node. Given that node sequences are (or should/can be) created with duplicates automatically removed, I doubt that this will come into play very often; there aren't any use cases for it in the XQuery use case document either. On the other hand, the equivalent expression (distinct-nodes($nodes) is the same as union(() | $nodes)) is a bit of a hack and might not get you precisely what you want (since it also reorders into document order), so it's probably best to be on the safe side.

* distinct-values() -- This functionality is required (and lacking) in XSLT 1.0, but the grouping facilities in XSLT 2.0 mean that it wouldn't be nearly as important there.
I can see places where it would be handy, though (for example to write things like "there are 4 groups...", and to allow me to apply templates to distinct nodes in order to get more flexibility in my stylesheet). Since this function is likely to be much more heavily used than distinct-nodes(), I think it should be shortened to distinct().

- insert() -- I can't really see the point, given that there's a concat(), a subsequence() and an index-of() and I don't think that there will often be times when you need to insert items into the middle of a sequence.

- remove() -- Again, I don't see why this is needed, given that you can use a predicate to do the same thing: $target[position() != $position].

* subsequence() -- I imagine would be useful.

+ sequence-deep-equal() and sequence-node-equal() -- I'm not sure about sequence-deep-equal(), for the same reason I'm not sure about deep-equal(). The most useful, I would imagine, would be a plain sequence-equal() that compared the two sequences to see if they were the same on an item-by-item basis, with nodes being assessed based on identity, and values being assessed on their value.

- avg() -- I'm not personally convinced (since the equivalent expression of sum() div count() really isn't difficult).

* max() and min() -- Definitely. This is a requirement that's probably even greater than date formatting or regular expressions. It would be even more helpful if there was a quick way of getting to the node(s) that has the min/max value, rather than just getting the value itself. I imagine we're going to see rather a lot of $nodes[. = max($nodes)] otherwise, although I guess that could be optimised.

- idref() -- As I've said elsewhere, id() turns out to be hardly used in XSLT because of the issues to do with requiring a DTD be present for the link to be any use. Where you need a reverse link, you can generally set up a key instead. I'd rather see keys from XML Schema supported than a specific idref() function introduced.
- filter() -- I think this is potentially very useful, but, like copy() and shallow(), it has to do with creating nodes, which means that it shouldn't live at the XPath level.

- collection() -- I don't really understand how this is different from the document() function.

* input() -- Sounds reasonable.

- context-item() -- I assume that this is not a real function, but actually just a backup for the shorthand '.'? It should say so.

* current-dateTime() -- Definitely required; XForms calls this function now(), which has the advantage of being short and avoiding the mixed case convention difficulties.

Aside from those mentioned above, functions that are missing are:

* tokenize(), which people ask for all the time, particularly for splitting strings into lines or words

+ possibly sqrt(), sin() and cos(), which are particularly useful when creating graphic formats such as SVG and aren't that easy to implement in XSLT

* random() (create random numbers) and more usefully, I think, randomize() (randomly alter the order of items in a sequence), both with obvious side-effect issues; again these are impossible to implement using XSLT

* function-available() to support the idea that XPath function libraries could be provided by particular implementations.

* system-property() to support getting information about the XPath implementation version and so on.

FWIW, on the issues front:

14: (operator-function-signatures) I agree, some of the signatures are confusing; I read the spec as indicating the required types for the functions, such that if you're using XPath the casting to those types is done automatically.

20: (operator-codepoint-vs-character) I agree that the spec should be clear about whether it's talking about code points or characters, but I think that the character model spec recommends talking about character strings rather than code unit strings (ref.
http://www.w3.org/TR/charmod/#sec-Strings)

21: (operator-function-return-types) In my opinion, the return type of a function should be fixed, and not change based on the actual type passed as the argument of a function.

37: (semantic-contains) I think that adding linguistic/semantic contains is a huge effort for very little benefit, at least for XPath 2.0. I can see that XQuery might want it, but I wouldn't want XSLT to be burdened, as the primary task of XSLT is transformation rather than querying.

44: (operator-collation-specification) I think that XPath 2.0 should follow the pattern of XPath/XSLT 1.0 and use qualified names rather than URIs, for consistency and because it makes them easier to use.

63: (operator-augment-index-of) I find the distinction between performing operations on nodes vs. performing operations on their values fiddly. In the case of index-of(), it strikes me that it wouldn't be difficult to perform index-of-value() if you had support for an index-of() that matched by node identity or simple type value (by creating a sequence of the node values and getting the index of the value you were after).

66: (operator-docorder-function) Like distinct-nodes(), the requirement (or lack of it) for this function isn't yet apparent because it's not an issue in XSLT 1.0. Personally, I don't think that it will be used that often, but it may be best to be on the safe side as it wouldn't be particularly easy to replicate this functionality without removing duplicate nodes at the same time.

67: (operator-remove-dupes) Since location paths do remove duplicates, and there thus isn't any backwards incompatibility with XPath 1.0, I don't think there's any reason for count() or sum() to remove duplicates.

73: (operator-compare-between) I don't think that a compare-between() function is required.

77: (operator-string-from-char) chars aren't data types in XML Schema -- are they in XPath? If not, then this issue isn't relevant.
94: (operator-within-window) As with (semantic-contains), I don't think this is a high priority for XPath 2.0.

108: (operators-always-normalize) I don't think that we should need to worry about unicode normalization within XPath 2.0.

136: (function-datetime-timezone-conversion) In XML Schema, the timezone isn't part of the value space of a dateTime. Adding a timezone to a dateTime is essentially a formatting function.

139: (need-fuller-definition-of-error-behavior-and-handling) Yes. We need to be able to test if an item is an error, and then be able to get information about that error, most importantly an error message that describes it and probably some information about the context in which the error occurred (e.g. what the context node was). I'm sure that you already have something on the cards here. Another point of confusion is that the empty sequence is sometimes used as a kind of error value, but at other times an error object is returned. I haven't yet worked out what the underlying heuristic is there, assuming that there is one.

141: (does-string-equality-use-codepoint-or-default-collation) I think it should use the default collation, like the other string manipulation functions.

142: (what-should-floor-ceiling-round-return) For compatibility, this should really return an xs:double (I believe). However, I think that returning an xs:integer, with an empty sequence used instead of NaN, would also be reasonable.

143: (need-tokenize-function) As above, we definitely need a tokenize() function, preferably one that defaults to breaking on whitespace.

144: (should-concat-accept-sequence-arguments) It would be useful, but highly incompatible. Perhaps a separate concat-sequence() function should be invented. (In XSLT 2.0, you can achieve the same effect with an xsl:value-of and an empty separator attribute, but since XSLT shouldn't be used for general sequence construction (apparently), this isn't ideal.)
150: (should-comparison-that-return-indeterminate-results-be-supported) As I've said before, yes. This is far more important than supporting matching of 'nearby' strings and so on, in my opinion.

151: (comparison-functions-for-other-date-and-time-types) Yes, there should be comparison functions for other date and time types, although a basic rule about how the comparisons are carried out would be better than listing every possible combination of comparisons.

152: (parameterized-extraction-functions-for-date-and-times) I view the extraction functions as superfluous, in the face of substring() and the prospect of a format-date() function. If you have them here, then I do think that they should be parameterised.

154: (second-order-distinct-function) Like the other second-order functions, it would be great, but I don't think it's worth entering that territory at this stage.

157: (boolean-from-string-legal-literals) Absolutely.

162: (can-the-node-parameter-to-root-be-omitted) As I mentioned above, I think that having single-argument functions default to using the context item is a very useful tactic, and one that XPath 1.0 users are used to exploiting. It would be good, for consistency, if the new functions supported this shorthand.

164: (for-complex-types-what-should-data-return) I don't have a strong opinion either way, but it should be consistent with the description of the typed value accessor in the data model. Since the string value is readily accessible in other ways, I think data() should probably not return the string value of the element if it has a complex type with complex content.

166: (current-dateTime-convenience-functions) On the principle of having as few functions as possible, I don't think these convenience functions are necessary. They are easy to define for people who want them.

168: (should-id-take-a-list-of-strings) id() definitely should be compatible with id() in XPath 1.0, and therefore accept a list of IDs.
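
To illustrate the duration point made earlier (getting "the number of seconds represented by this duration", the reverse of dayTimeDuration-from-seconds()), here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name seconds_from_day_time_duration is hypothetical and not part of the F&O draft, and the pattern covers only the day/time lexical form (days, hours, minutes, seconds), not years or months.

```python
import re
from decimal import Decimal

# Assumed lexical form: an optional "-", "P", optional days, then an
# optional "T" part with hours, minutes and (possibly fractional) seconds,
# e.g. "P1DT2H30M" or "-PT0.5S". Years and months are deliberately excluded.
_DAY_TIME = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def seconds_from_day_time_duration(text: str) -> Decimal:
    """Hypothetical helper: total seconds represented by a day/time duration."""
    m = _DAY_TIME.match(text)
    # A bare "P", or a trailing "T" with no time fields, is not a valid duration.
    if m is None or text.endswith(("P", "T")):
        raise ValueError(f"not a day/time duration: {text!r}")
    total = (
        Decimal(m.group("days") or "0") * 86400
        + Decimal(m.group("hours") or "0") * 3600
        + Decimal(m.group("minutes") or "0") * 60
        + Decimal(m.group("seconds") or "0")
    )
    return -total if m.group("sign") else total
```

Once durations are plain numbers of seconds, adding or dividing them is ordinary arithmetic, which is the kind of calculation the comment above says is otherwise hard to do.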
--- Jeni Tennison http://www.jenitennison.com/
Received on Tuesday, 14 May 2002 11:59:15 UTC