RE: F&O WD

From: Ashok Malhotra <ashokma@microsoft.com>
Date: Tue, 14 May 2002 08:58:42 -0700
Message-ID: <E5B814702B65CB4DA51644580E4853FB014887EA@red-msg-12.redmond.corp.microsoft.com>
To: "Jeni Tennison" <jeni@jenitennison.com>, "Jonathan Robie" <jonathan.robie@datadirect-technologies.com>
Cc: <public-qt-comments@w3.org>
Jeni:
Thanks for your comments.  We are all at a W3C meeting this week.  Will get back to you next week.
Ashok

	-----Original Message----- 
	From: Jeni Tennison [mailto:jeni@jenitennison.com] 
	Sent: Mon 5/13/2002 4:01 AM 
	To: Jonathan Robie 
	Cc: public-qt-comments@w3.org 
	Subject: F&O WD
	
	

	Hi Jonathan,
	
	I promised some more detailed comments on the F&O WD, so here you are.
	As usual, these come from my perspective as an XSLT user rather than
	anything else. I've ignored the constructors and casting sections,
	since I know they're under review anyway.
	
	I guess my guiding principle is that if a function is just a shorthand
	for something that can be implemented without a recursive function,
	then it shouldn't be included in the core set of XPath 2.0 functions.
	Both XQuery and XSLT have methods of defining extension functions, so
	I think that it's more important to focus on the functions that are
	impossible or difficult to implement in XQuery/XSLT rather than those
	that are simply convenience functions.
	
	Cheers,
	
	Jeni
	
	---
	
	The new functions, added on to XPath 1.0, are the following. I've put
	* by the ones that I think should stay, - by those that I think should
	go, and + by those on which I'm equivocal:
	
	  - node-kind() -- I've hardly ever seen a problem that's required
	    this functionality. I think it would be more flexible to use the
	    "instance of" operator to work out what kind of node you're
	    looking at; it would be easy enough to define your own function to
	    give you the name of the node type in the rare cases where that's
	    required. In other words, you should be able to get at the type of
	    a node and the type of an atomic value in the same way.
	 
	  + node-name() -- This used to be the name() function; I wonder
	    whether it would be possible to merge this with the name()
	    function. It would be great if that could be done so that the
	    name() function works in the way that people think it works, such
	    that "name() = 'pfx:name'" is equivalent to "self::pfx:name"; this
	    would be backwards-incompatible with XPath 1.0, but would be more
	    intuitive for users.
	
	  * data() -- Certainly required now, but as with a lot of these
	    functions, I wonder whether it would be helpful to have it follow
	    the pattern of existing functions, like name() and string(), and
	    have it return the typed value of the context node if it doesn't
	    have an argument passed to it. I know that the F&O document
	    purposefully tries to avoid overloaded functions, but for users,
	    both those used to XPath 1.0 and those coming new to XPath 2.0, it
	    will be confusing that different functions work in different ways
	    depending on which version they were introduced in.
	
	  * base-uri() -- Certainly very useful; we often get questions asking
	    how to get the URL of the file that's being used as the source of
	    the transformation.
	
	  - unique-ID() -- I've never known anyone to have to get hold of the
	    value of the ID attribute on a given element. If they do, they
	    know the name of the attribute and can get its value through
	    normal mechanisms. I'm also worried that this function will get
	    confused with the generate-id() function.
	
	  * compare() -- We do need this facility although not as much as
	    you might think, in my opinion. I have to say that personally I
	    find a return value of -1, 0 or 1 difficult to work with: I always
	    get confused about which way round the arguments are related. It
	    would be great if there was an alternative design, but I doubt
	    that there is and since we'll rarely have to use different
	    collations, I don't think that's too much of a problem.
	
	  - normalize-unicode() -- As far as I understand the character
	    model for the WWW, all text on the Internet should be normalized,
	    and specifications should require Unicode-normalized (NFC) text. I
	    can't recall ever seeing someone need to do Unicode normalization;
	    I suspect that such operations would be better done at a lower
	    level in the application (normalize early) and that the data model
	    should dictate that text is normalized.
	
	  * upper-case() and lower-case() -- There's definitely a strong
	    requirement for these, although allowing case-insensitive
	    comparisons (which I think is supported with collations?) will go
	    most of the way towards supporting the usual reason for
	    case-changing. As I think I might have mentioned before, I believe
	    that technically there should be a title-case() function as well,
	    since the title case version of a letter is not always the same as
	    the upper case version of a letter (ref.
	    http://www.unicode.org/unicode/reports/tr21/)
	
	  + string-pad() -- Repeating the same string is a fairly common
	    operation, although it is one that's particularly easy to
	    accomplish now with a user-defined function and a simple
	    iteration. I therefore don't think that this function is vital,
	    and if you want to save space, I think it should be dropped.
	
	  * match() and replace() -- I think that you know that we need more
	    regular expression support than this; I believe that you're
	    working on that and that I've already commented on it.
	
	  + duration/dateTime functions -- I've already commented on these in
	    a separate thread. I think that this is the poorest section of the
	    spec. The kinds of things that people want to do with dates are:
	
	      - reformat them (which I believe is being supported separately
	        in XSLT 2.0, though it's not there yet)
	
	      - get a date from the common "seconds since
	        1970-01-01T00:00:00Z" representation (for all its faults)
	
	      - perform calculations between them
	
	    Dates have a fixed format, so it's not hard to extract individual
	    components from a date; I don't think that the set of functions to
	    do so are necessary. It's harder to extract information from a
	    duration because it doesn't have a fixed format, but not
	    drastically so, and I think it's really very rare that you need to
	    get that kind of information from a duration.
	
	    One thing that *is* difficult, and is useful, is to get values
	    like "the number of seconds represented by this duration" (i.e.
	    the reverse of dayTimeDuration-from-seconds()) -- it's useful
	    because that enables you to perform calculations with durations
	    (adding them, dividing them) that you can't do otherwise.
	
	  * get-local-name() and get-namespace-uri() -- Makes me wish that
	    the structured data types such as QNames, dates, durations and so
	    on could be treated as virtual elements, so you could do
	    $qname/local-name or $date/year. These are certainly handy
	    functions, though.
	
	  * resolve-URI() -- I imagine this will be very handy.
	
	    URI manipulation is, I think, the primary reason for the
	    requirement for string manipulation functions like
	    substring-after-last() or index-of-last(). Perhaps a
	    get-file-name() method would be useful; I'm not sure.
	
	  + deep-equal() -- I wouldn't personally say that this was a
	    high-priority function. My guess would be that people would use it
	    for the common task of moving through two documents to see where
	    differences lie between them, and in that context I think it would
	    be very expensive. But others might have use cases that I'm
	    unaware of.
	
	  - root() -- I think that root($node) does the same thing as
	    $node/ancestor-or-self::node()[last()]. Given that the function is
	    possible with very little effort, and that you rarely need to get
	    from a node to the root node of that document, I don't really see
	    the point of this function.
	
	  - if-absent() and if-empty() are shorthands for:
	    if (not($node)) then $default else $node and
	    if (not($node) or not($node/node())) then $default else $node
	    I don't find these expressions so burdensome that they require
	    shorthand functions, especially not compared to some of the other
	    functionality that's currently missing from the spec.
	
	  * index-of() -- definitely required, though I have no doubt that
	    people will use it like:
	
	    $nodes[index-of(for $n in $nodes return string($n), 'foo')]
	
	  - empty() -- empty($seq) seems to be equivalent to
	    not(boolean($seq)); as with other shorthands for easy expressions,
	    I don't think this one's necessary, although it's true that the
	    casting of empty sequences to boolean false can be non-obvious for
	    beginners.
	
	  - exists() -- seems to be equivalent to not(empty($seq)) or exactly
	    equivalent to boolean($seq). I don't think this is necessary;
	    empty() is more useful if you didn't want to use boolean() in the
	    way that it's been used in XPath 1.0.
	
	  + distinct-nodes() -- This obviously doesn't arise in XSLT 1.0
	    because it's impossible to create a node set that contains more
	    than one of a particular node. Given that node sequences are (or
	    should/can be) created with duplicates automatically removed, I
	    doubt that this will come into play very often; there aren't any
	    use cases for it in the XQuery use case document either. On the
	    other hand, the equivalent expression (distinct-nodes($nodes) is
	    the same as union(() | $nodes)) is a bit of a hack and might not
	    get you precisely what you want (since it also reorders into
	    document order), so it's probably best to be on the safe side.
	
	  * distinct-values() -- This functionality is required (and
	    lacking) in XSLT 1.0, but the grouping facilities in XSLT 2.0 mean
	    that it wouldn't be nearly as important there. I can see places
	    where it would be handy, though (for example to write things like
	    "there are 4 groups...", and to allow me to apply templates to
	    distinct nodes in order to get more flexibility in my stylesheet).
	    Since this function is likely to be much more heavily used than
	    distinct-nodes(), I think it should be shortened to distinct().
	
	  - insert() -- I can't really see the point, given that there's a
	    concat(), a subsequence() and an index-of() and I don't think that
	    there will often be times when you need to insert items into the
	    middle of a sequence.
	
	  - remove() -- Again, I don't see why this is needed, given that you
	    can use a predicate to do the same thing: $target[position() !=
	    $position].
	
	  * subsequence() -- I imagine this would be useful.
	
	  + sequence-deep-equal() and sequence-node-equal() -- I'm not sure
	    about sequence-deep-equal(), for the same reason I'm not sure
	    about deep-equal(). The most useful, I would imagine, would be a
	    plain sequence-equal() that compared the two sequences to see if
	    they were the same on an item-by-item basis, with nodes being
	    assessed based on identity, and values being assessed on their
	    value.
	
	  - avg() -- I'm not personally convinced (since the equivalent
	    expression of sum() div count() really isn't difficult).
	
	  * max() and min() -- Definitely. This is a requirement that's
	    probably even greater than date formatting or regular expressions.
	    It would be even more helpful if there was a quick way of getting
	    to the node(s) that has the min/max value, rather than just
	    getting the value itself. I imagine we're going to see rather a
	    lot of $nodes[. = max($nodes)] otherwise, although I guess that
	    could be optimised.
	
	  - idref() -- As I've said elsewhere, id() turns out to be hardly
	    used in XSLT because of the issues to do with requiring a DTD be
	    present for the link to be any use. Where you need a reverse link,
	    you can generally set up a key instead. I'd rather see keys from
	    XML Schema supported than a specific idref() function introduced.
	
	  - filter() -- I think this is potentially very useful, but, like
	    copy() and shallow(), it has to do with creating nodes, which
	    means that it shouldn't live at the XPath level.
	
	  - collection() -- I don't really understand how this is different
	    from the document() function.
	
	  * input() -- Sounds reasonable.
	
	  - context-item() -- I assume that this is not a real function, but
	    actually just a backup for the shorthand '.'? It should say so.
	
	  * current-dateTime() -- Definitely required; XForms calls this
	    function now(), which has the advantage of being short and
	    avoiding the mixed case convention difficulties.
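
As a rough illustration of the shorthand argument made in the list above, the sketch below models three of the entries in Python rather than XPath, purely so that it can be run: remove() as the predicate $target[position() != $position], avg() as sum() div count(), and the $nodes[. = max($nodes)] idiom. Sequences are modelled as plain Python lists, and the function names and signatures are assumptions made for this example, not the signatures in the F&O draft.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def remove(target: Sequence[T], position: int) -> List[T]:
    """Model of $target[position() != $position]; positions are 1-based as in XPath."""
    return [item for pos, item in enumerate(target, start=1) if pos != position]


def avg(values: Sequence[float]) -> float:
    """Model of sum($values) div count($values); assumes a non-empty sequence."""
    return sum(values) / len(values)


def items_with_max(values: Sequence[float]) -> List[float]:
    """Model of $nodes[. = max($nodes)]: every item equal to the maximum."""
    if not values:
        return []
    top = max(values)
    return [v for v in values if v == top]


if __name__ == "__main__":
    print(remove(["a", "b", "c"], 2))
    print(avg([1, 2, 3, 6]))
    print(items_with_max([3, 7, 7, 1]))
```

Each model is a one-line comprehension or expression, which is the point being made about these functions; the max() case returns all matching items, not just the first.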
	
	Aside from those mentioned above, functions that are missing are:
	
	  * tokenize(), which people ask for all the time, particularly for
	    splitting strings into lines or words
	
	  + possibly sqrt(), sin() and cos(), which are particularly useful
	    when creating graphic formats such as SVG and aren't that easy to
	    implement in XSLT
	
	  * random() (create random numbers) and more usefully, I think,
	    randomize() (randomly alter the order of items in a sequence),
	    both with obvious side-effect issues; again these are impossible
	    to implement using XSLT
	
	  * function-available() to support the idea that XPath function
	    libraries could be provided by particular implementations.
	
	  * system-property() to support getting information about the XPath
	    implementation version and so on.
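
To make the tokenize() request above concrete, here is a minimal sketch in Python of the behaviour being asked for: split on a regular expression, and fall back to splitting on whitespace when no pattern is given. The signature is hypothetical, and Python's re syntax stands in for whatever regular expression dialect XPath 2.0 ends up using.

```python
import re
from typing import List, Optional


def tokenize(text: str, pattern: Optional[str] = None) -> List[str]:
    """Split text on a regular expression; with no pattern, split on whitespace runs."""
    if pattern is None:
        # str.split() with no argument splits on runs of whitespace
        # and never produces empty tokens.
        return text.split()
    return re.split(pattern, text)


if __name__ == "__main__":
    print(tokenize("the quick  brown\tfox"))   # words
    print(tokenize("a\nb\r\nc", r"\r?\n"))      # lines
```

The two calls correspond to the two uses mentioned above, splitting a string into words and into lines.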
	
	FWIW, on the issues front:
	
	  14: (operator-function-signatures) I agree, some of the
	      signatures are confusing; I read the spec as indicating the
	      required types for the functions, such that if you're using
	      XPath the casting to those types is done automatically.
	
	  20: (operator-codepoint-vs-character) I agree that the spec
	      should be clear about whether it's talking about code points or
	      characters, but I think that the character model spec recommends
	      talking about character strings rather than code unit strings
	      (ref. http://www.w3.org/TR/charmod/#sec-Strings)
	
	  21: (operator-function-return-types) In my opinion, the return
	      type of a function should be fixed, and not change based on the
	      actual type passed as the argument of a function.
	
	  37: (semantic-contains) I think that adding linguistic/semantic
	      contains is a huge effort for very little benefit, at least for
	      XPath 2.0. I can see that XQuery might want it, but I wouldn't
	      want XSLT to be burdened, as the primary task of XSLT is
	      transformation rather than querying.
	
	  44: (operator-collation-specification) I think that XPath 2.0 should
	      follow the pattern of XPath/XSLT 1.0 and use qualified names
	      rather than URIs, for consistency and because it makes them
	      easier to use.
	
	  63: (operator-augment-index-of) I find the distinction between
	      performing operations on nodes vs. performing operations on
	      their values fiddly. In the case of index-of(), it strikes me
	      that it wouldn't be difficult to perform index-of-value() if you
	      had support for an index-of() that matched by node identity or
	      simple type value (by creating a sequence of the node values and
	      getting the index of the value you were after).
	
	  66: (operator-docorder-function) Like distinct-nodes(), the
	      requirement (or lack of it) for this function isn't yet apparent
	      because it's not an issue in XSLT 1.0. Personally, I don't think
	      that it will be used that often, but it may be best to be on the
	      safe side as it wouldn't be particularly easy to replicate this
	      functionality without removing duplicate nodes at the same time.
	
	  67: (operator-remove-dupes) Since location paths do remove
	      duplicates, and there thus isn't any backwards incompatibility
	      with XPath 1.0, I don't think there's any reason for count() or
	      sum() to remove duplicates.
	
	  73: (operator-compare-between) I don't think that a
	      compare-between() function is required.
	
	  77: (operator-string-from-char) chars aren't data types in XML
	      Schema -- are they in XPath? If not, then this issue isn't
	      relevant.
	
	  94: (operator-within-window) As with (semantic-contains), I don't
	      think this is a high priority for XPath 2.0.
	
	 108: (operators-always-normalize) I don't think that we should need
	      to worry about Unicode normalization within XPath 2.0.
	
	 136: (function-datetime-timezone-conversion) In XML Schema, the
	      timezone isn't part of the value space of a dateTime. Adding a
	      timezone to a dateTime is essentially a formatting function.
	
	 139: (need-fuller-definition-of-error-behavior-and-handling) Yes. We
	      need to be able to test if an item is an error, and then be able
	      to get information about that error, most importantly an error
	      message that describes it and probably some information about
	      the context in which the error occurred (e.g. what the context
	      node was). I'm sure that you already have something on the cards
	      here. Another point of confusion is that the empty sequence is
	      sometimes used as a kind of error value, but at other times an
	      error object is returned. I haven't yet worked out what the
	      underlying heuristic is there, assuming that there is one.
	
	 141: (does-string-equality-use-codepoint-or-default-collation) I
	      think it should use the default collation, like the other string
	      manipulation functions.
	
	 142: (what-should-floor-ceiling-round-return) For compatibility, these
	      should really return an xs:double (I believe). However, I think
	      that returning an xs:integer, with an empty sequence used
	      instead of NaN, would also be reasonable.
	
	 143: (need-tokenize-function) As above, we definitely need a
	      tokenize() function, preferably one that defaults to breaking on
	      whitespace.
	
	 144: (should-concat-accept-sequence-arguments) It would be useful,
	      but highly incompatible. Perhaps a separate concat-sequence()
	      function should be invented. (In XSLT 2.0, you can achieve
	      the same effect with an xsl:value-of and an empty separator
	      attribute, but since XSLT shouldn't be used for general sequence
	      construction (apparently), this isn't ideal.)
	
	 150: (should-comparison-that-return-indeterminate-results-be-supported)
	      As I've said before, yes. This is far more important than
	      supporting matching of 'nearby' strings and so on, in my
	      opinion.
	
	 151: (comparison-functions-for-other-date-and-time-types) Yes, there
	      should be comparison functions for other date and time types,
	      although a basic rule about how the comparisons are carried out
	      would be better than listing every possible combination of
	      comparisons.
	
	 152: (parameterized-extraction-functions-for-date-and-times) I view
	      the extraction functions as superfluous, in the face of
	      substring() and the prospect of a format-date() function. If you
	      have them here, then I do think that they should be
	      parameterised.
	
	 154: (second-order-distinct-function) Like the other second-order
	      functions, it would be great, but I don't think it's worth
	      entering that territory at this stage.
	
	 157: (boolean-from-string-legal-literals) Absolutely.
	
	 162: (can-the-node-parameter-to-root-be-omitted) As I mentioned
	      above, I think that having single-argument functions default to
	      using the context item is a very useful tactic, and one that
	      XPath 1.0 users are used to exploiting. It would be good, for
	      consistency, if the new functions supported this shorthand.
	
	 164: (for-complex-types-what-should-data-return) I don't have a
	      strong opinion either way, but it should be consistent with the
	      description of the typed value accessor in the data model. Since
	      the string value is readily accessible in other ways, I think
	      data() should probably not return the string value of the
	      element if it has a complex type with complex content.
	
	 166: (current-dateTime-convenience-functions) On the principle of
	      having as few functions as possible, I don't think these
	      convenience functions are necessary. They are easy to define for
	      people who want them.
	
	 168: (should-id-take-a-list-of-strings) id() definitely should be
	      compatible with id() in XPath 1.0, and therefore accept a list
	      of IDs.
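
The point made under issue 63 can be sketched as follows, again in Python so that it runs. The assumption is an index-of() that matches nodes by identity and atomic values by equality; index-of-value() then falls out by building the sequence of node values first. The Node class and both function names are hypothetical stand-ins, not part of any XPath API.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(eq=False)  # eq=False keeps identity-based comparison, like node identity
class Node:
    value: str


def index_of(seq: Sequence[Any], target: Any) -> List[int]:
    """1-based positions of target: nodes match by identity, atomic values by equality."""
    def same(item: Any) -> bool:
        if isinstance(item, Node) or isinstance(target, Node):
            return item is target
        return item == target

    return [pos for pos, item in enumerate(seq, start=1) if same(item)]


def index_of_value(nodes: Sequence[Node], value: str) -> List[int]:
    """Create the sequence of node values, then reuse index_of on those values."""
    return index_of([n.value for n in nodes], value)


if __name__ == "__main__":
    a, b, c = Node("foo"), Node("bar"), Node("foo")
    print(index_of([a, b, c], c))            # identity match only
    print(index_of_value([a, b, c], "foo"))  # value match
```

Two nodes with the same value are distinct under index_of but both are found by index_of_value, which is the node-versus-value distinction the issue is about.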
	
	---
	Jeni Tennison
	http://www.jenitennison.com/
	
	
Received on Tuesday, 14 May 2002 11:59:15 GMT