
RE: Re: I18N last call comments on XQuery/XPath Fun/Op (2nd part)

From: Ashok Malhotra <ashokma@microsoft.com>
Date: Tue, 2 Sep 2003 04:59:16 -0700
Message-ID: <E5B814702B65CB4DA51644580E4853FB0A700A68@red-msg-12.redmond.corp.microsoft.com>
To: <public-qt-comments@w3.org>, <w3c-i18n-ig@w3.org>, "Martin Duerst" <duerst@w3.org>
Great comments!  I have responded to some by clarifying wording and
others by creating issues for the XML Query WG or the F&O taskforce to
On two issues I have started discussion threads.  See comments inline.

This is not a formal response from the Query WG.  Feel free to comment
if you think something has not been addressed adequately.

All the best, Ashok


Message-Id: <>
Date: Tue, 08 Jul 2003 15:30:38 -0400
To: public-qt-comments@w3.org
From: Martin Duerst <duerst@w3.org>
Cc: w3c-i18n-ig@w3.org
Subject: I18N last call comments on XQuery/XPath Fun/Op (2nd part)

Dear XML Query WG and XSL WG,

Below please find the second (and final) part of the I18N WG's comments on
your last call document "XQuery 1.0 and XPath 2.0 Functions and Operators".

Please note the following:
- Please address all replies to these comments to the I18N IG mailing
   list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general
   comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0
   or XSLT 2.0, and so we might be unaware of i18n issues that appear
   in these specs but may have to be traced back to functions and
   There are also cases where we have identified an i18n issue here,
   but we are not sure exactly what the best solution will be, and which
   document it will have to be addressed in. Also, there are issues that
   have been raised in comments to you about a different document but
   that apply to this document, too. Sometimes, this is mentioned below,
   but not always.
- Our comments are numbered in square brackets [nn]; the numbering
   continues from the first part.
- Please note that this mail contains a few additional comments on
   sections already commented on in our first part.
- We again apologize for our delay.

We look forward to further discussion with you to find the best
solution on these issues.

[78] 1.7 namespace prefix: op:xxx backs up operators, not directly user 
     shouldn't it be the choice of the language using functions and
     operators of whether to expose these operators or not
     (XQuery and XSLT have made their choice to the negative, but
      there might be other languages)
[AM] Changed the language.

[79] 2.3 "cast as xs:string": there should be a forward reference to
[AM] Done!

[80] 7.1 (this may partly supersede our issue [33]):
    "This document uses the term "code point" as a synonym for "Unicode
    scalar value". [The Unicode Standard] sometimes spells this term
    "codepoint". Code points range from #x0000 to #x10FFFF inclusive,
    except for the range #xD800 to #xDFFF inclusive, which is the range
    reserved for surrogate pairs. The use of the word 'character' in this
    document is in the sense of production [2] of [XML 1.0
    (Second Edition)]."

     The relationship between code point and scalar value was fuzzy in the
     past. Unicode 4.0 makes it clear that code points range from #x0000 to
     #x10FFFF inclusive, and scalar values are the subset #x0000 to #xD7FF
     plus #xE000 to #x10FFFF inclusive. XML can't represent all code points
     anyway (example: #x0000), so it is probably best to just use code point.
     A minor wording issue is that #xD800 to #xDFFF is the range reserved for
     surrogate code points, used for surrogate pairs. So, suggested wording:

     "This document uses the term "code point" as defined in [The Unicode
     Standard], ranging from #x0000 to #x10FFFF inclusive. The use of the
     word 'character' in this document is in the sense of production [2] of
     [XML 1.0 Recommendation (Second Edition)], so it may include code
     points which have not yet been assigned to characters."

     The spec should be checked so that it does not use the word
     anymore when surrogate codepoints are excluded.
[AM] Done!
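
The distinction drawn in [80] can be sketched in a few lines. This is only an illustration in Python, not spec text; the function names are hypothetical, and the ranges are the ones quoted above plus production [2] (Char) of XML 1.0.

```python
# Illustrative helpers only; the names are not taken from any specification.
MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # #xD800 to #xDFFF inclusive


def is_code_point(n: int) -> bool:
    """Code points: #x0000 to #x10FFFF inclusive."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """Scalar values: code points minus the surrogate code points."""
    return is_code_point(n) and n not in SURROGATES


def is_xml10_char(n: int) -> bool:
    """Production [2] (Char) of XML 1.0."""
    return (
        n in (0x9, 0xA, 0xD)
        or 0x20 <= n <= 0xD7FF
        or 0xE000 <= n <= 0xFFFD
        or 0x10000 <= n <= 0x10FFFF
    )
```

For example, #xD800 is a code point but not a scalar value, and #x0000 is a code point that XML 1.0 cannot represent.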

[81] 7.4.11 normalize-unicode: As of
<http://www.w3.org/TR/2002/WD-charmod-20020430/> ,
     what is called 'W3C normalized' here has been renamed to
     'fully normalized' in the character model.
[AM]  Fixed!  Thanks!

[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of
     relevant constructs. For strings, the string itself is most
     conveniently the relevant construct, but this should be said
[AM] Requested clarification.

[83] 7.4.11 normalize-unicode: Maybe not as a function, but in any case
     somehow, normalization checking on input and normalization on
     output should be available in both XQuery and XSLT, on full
     XML constructs (with the relevant definitions from XML 1.1)
[AM] Added issue.

[84] 7.4.13/14: maybe there should be an attribute for Turkish
[AM] Too language specific!

[85] 7.4.13/14: an example should use German sharp-s
[AM] Please suggest example.  Would appreciate as many examples as you
can suggest for any part of the spec.
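
One possible sharp-s example for [85], sketched in Python rather than XQuery. It relies on Python's built-in str.upper() and str.lower(), which apply the Unicode full case mappings; whether fn:upper-case should behave identically is for the WG to decide.

```python
# German sharp s (U+00DF) has no single-character upper-case form;
# the full Unicode case mapping expands it to "SS".
word = "Stra\u00dfe"          # "Straße"
upper = word.upper()          # length grows from 6 to 7 characters
round_trip = upper.lower()    # the sharp s is not recovered

print(upper, round_trip)
```

The example shows that case conversion can change the string length and is not reversible.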

[86] 7.4.14: two paragraphs are the same (1st and 5th)
[AM] Fixed!  Thanks!

[87] 7.4.16: flag to escape non-ascii or not (default should be not to
     unescape them)
[AM] Added to issue.

[88] 7.4.12 "Otherwise, returns the value of $srcval after translating
every lower-case letter to its upper-case correspondent. Every lower-case
letter that does not have an upper-case correspondent, and every character
that is not a lower-case letter, is included in the returned value in its
original form. A "lower-case letter" is a character whose Unicode General
Category class includes "Ll". The corresponding upper-case letter is
determined using [Unicode Case Mappings]."
     There is a problem here. The set of characters that have uppercases
     and the set of characters that are Ll are disjoint. This should

     "Otherwise, returns the value of $srcval after translating every
     character to its upper-case correspondent. Every character that does
     not have an upper-case correspondent is included in the returned value
     in its original form. The precise mapping is determined using [Unicode
     Case Mappings]."

     and mutatis mutandis for 7.4.13
[AM] Done!  Thanks!  Jim Melton confirms.
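
The mismatch described in [88] can be checked with Python's unicodedata module. This is a sketch, not spec text; the two sample characters are chosen here for illustration.

```python
import unicodedata

kra = "\u0138"        # LATIN SMALL LETTER KRA
roman_one = "\u2170"  # SMALL ROMAN NUMERAL ONE

# A character in category Ll that has no upper-case correspondent.
kra_category = unicodedata.category(kra)
kra_unchanged = kra.upper() == kra

# A character outside Ll that nevertheless has an upper-case mapping.
roman_category = unicodedata.category(roman_one)
roman_upper = roman_one.upper()

print(kra_category, kra_unchanged, roman_category, hex(ord(roman_upper)))
```

A definition based on "Ll" alone would therefore both include characters with nothing to map to and miss characters that do have a mapping.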

[89] 7.4 There should also be an fn:title-case function, because
     (also called initial caps) is *not* the same as uppercasing the
     letter in a word and titlecasing.
[AM] This has been suggested and the feeling was that this is not an
important enough usecase.
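
To illustrate why titlecasing is a separate mapping (comment [89]), here is a small Python sketch using a digraph character. It uses Python's str.title() and str.upper(); it is not a proposal for how an fn:title-case function would be specified.

```python
dz_small = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON (a single character)

upper = dz_small.upper()  # U+01C4, both parts of the digraph capitalized
title = dz_small.title()  # U+01C5, only the first part capitalized

print(hex(ord(upper)), hex(ord(title)))
```

For this character the upper-case and title-case forms are different code points, so uppercasing the first character of a word does not give the title-case result.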

[90] 9.1 *** There should be a note about the inadvisability of all the
     with a 'g' prefix.
[AM]  How about saying that only equality is supported for these types?

[91] 9.2: *** "Note: The W3C XML Query Working Group has requested the
XML Schema Working Group that these two subtypes of xs:duration be
in the built-in datatypes described in [XML Schema Part 2: Datatypes]."
     We support this request. This is very much needed!
[AM] You are welcome!  The XML Schema WG seems favorably disposed
towards this request.

[92] 9.2.1: <xs:pattern value="[\-]?P[0-9]+(Y([0-9]+M)?|M)"/>
     why not simply <xs:pattern value="-?P[0-9]+(Y([0-9]+M)?|M)"/> ?
     this is legal in Perl, is it not legal in XQuery/XPath?
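
A quick check of the two patterns in [92], using Python's re module as a stand-in. This is an assumption-laden sketch: Python's regular-expression syntax is not the XML Schema one, so it shows only that the two spellings accept the same strings in a Perl-like dialect.

```python
import re

ESCAPED = re.compile(r"[\-]?P[0-9]+(Y([0-9]+M)?|M)")
PLAIN = re.compile(r"-?P[0-9]+(Y([0-9]+M)?|M)")

samples = ["P1Y", "-P1Y2M", "P14M", "P1Y2", "--P1Y", "PY"]

# Map each sample to (matches escaped form, matches plain form).
results = {
    s: (bool(ESCAPED.fullmatch(s)), bool(PLAIN.fullmatch(s))) for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
```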

[93] 9.2.2: Is white space allowed in regular expressions?
[AM]  I believe not!

[94] 9.2.2: "The designator 'T' must be absent if and only if all of the
time items are absent."
     This seems to conflict with examples -P35.89S and P4D251M,
     which are said to be allowed.
[AM] The correct forms are -PT35.89S and P4D251M.  Thanks!

[95] *** Durations should not allow leap seconds in their
     canonical representation.
[AM] The current canonical form for dayTimeDuration allows leap seconds.
This appears incorrect as duration is an untethered duration of time.
Created issue.

[96] 9.3: For many types, there is op:foo-equal, op:foo-less-than, and
    op:foo-greater-than. As these are only defined for backing up
    operators, it would be much better to define only a single
    comparison function (similar to string), or at least only
    op:foo-equal and op:foo-less-than. Backup then works easily as
    a eq b <==> foo-equal(a,b)
    a ne b <==> !foo-equal(a,b)
    a lt b <==> foo-less-than(a,b)
    a ge b <==> !foo-less-than(a,b)
    a gt b <==> foo-less-than(b,a)
    a le b <==> !foo-less-than(b,a)

[AM] Noted!
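
The backup scheme listed in [96] can be sketched as follows. This is Python used purely as illustration; derive_operators is a hypothetical helper, and foo-equal / foo-less-than are represented by two callables passed in.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def derive_operators(
    equal: Callable[[T, T], bool],
    less_than: Callable[[T, T], bool],
) -> Dict[str, Callable[[T, T], bool]]:
    """Build all six comparison operators from foo-equal and foo-less-than."""
    return {
        "eq": lambda a, b: equal(a, b),
        "ne": lambda a, b: not equal(a, b),
        "lt": lambda a, b: less_than(a, b),
        "ge": lambda a, b: not less_than(a, b),
        "gt": lambda a, b: less_than(b, a),
        "le": lambda a, b: not less_than(b, a),
    }


# Usage with integers standing in for some type "foo".
ops = derive_operators(lambda a, b: a == b, lambda a, b: a < b)
print(ops["ge"](3, 3), ops["gt"](3, 3))
```

Note that the ge and le derivations assume a total order; they would not hold for types where two values can be incomparable.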

[97] 9.3: *** "If either operand to a comparison function on date or
values does not have an explicit timezone then, for the purpose of the 
operation, an implicit timezone, provided by the evaluation context, is 
assumed to be present as part of the value."
     This is used here and in many other places, but we think that it
     is totally inadequate and dangerous. Reasonable use of these types
     should be either using timezoned data only, or data without a
     (or actually with a user-managed, separate indication of the
     only. The best way to achieve this would be to separate the
     types into with-timezone and without-timezone. Anything else will
     cause more confusion than it will help.
     At a minimum, there should be very clear warnings everywhere
     a 'default timezone' is mentioned.
[AM] There is an existing issue covering this.  Also discussing with

[98] 9.4: Having separate functions for extraction from dateTime, date,
    seems to be unnecessarily tedious. Also, the description of the
    functions should be shortened.
[AM] There is an existing issue covering this.  

[99] 9.4.10: *** 
"fn:get-hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) returns
     This strange result is just plain weird, and won't help anybody.
     12 is what the user will expect, and what she should get.
[AM] Added explanation.

[100] 9.4.8 and other time divisions: What is the expected precision?
     Some systems may be able to provide very high precision, but should
[AM] See conformance note in 9.1.1.

[101] 9.6: *** "For purposes of timezone adjustment, an xs:date is
as an xs:dateTime with time 00:00:00."
     It is unclear what this means, and it doesn't seem to have been
     'implemented' in the actual function definitions.
[AM] Moved note.  Added text.  Fixed example.

[102] 9.6: ***There should be a clear explanation of what 'adjustment'
     namely (in general, at least) to change the time notation so that
     it still denotes the same physical time, but uses a different
     timezone to do so.
[AM] Added explanation.
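
A sketch of 'adjustment' in the sense of [102], using Python's datetime.astimezone() as an analogue. The sample value is invented; this is not the spec's function.

```python
from datetime import datetime, timedelta, timezone

original = datetime(
    2003, 7, 8, 15, 30, 38, tzinfo=timezone(timedelta(hours=-4))
)

# Same physical time, expressed in a different timezone.
adjusted = original.astimezone(timezone(timedelta(hours=9)))

print(original.isoformat(), adjusted.isoformat())
```

The notation changes (including the calendar date), but both values denote the same instant and compare equal.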

[103] 9.6: ***The treatment of implicit timezones,... leads to very
    strange discontinuities for these timezone adjustments. In
    while physical time is kept constant for adjustments with time
    it is not when a timezone is missing. This is very dangerous.
    Also there are differences in behavior between adjustments of
    and date.
[AM] I'm not sure how to respond to this.  Would a warning suffice?  

[104] 9.6: ***In connection with daylight saving time adjustment, it is
     necessary to shift times by keeping the same nominal value but
     changing the time zone, in effect shifting the physical time
     with the timezone shift. But there is no operation to do this
[AM] Daylight savings time varies from state to state and even county to
county.  There is no rational algorithm.  These functions provide a
mechanism for adjusting dates/times using an arbitrarily specified

[105] 9.6: *** "op:subtract-yearMonthDuration-from-dateTime"
    The order of the operands is the wrong way round in the function
    name. This will cause problems for non native English speakers.
    There should at least be a warning, but ideally a fix.
[AM] Created issue.

[106] 9.7.1: This should say that the duration is always rounded down
    to full months. There should be an example with more rounding
    (almost a full month).
[AM] Added example.
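
One way to read "rounded down to full months" in [106] is sketched below. This is an assumption about the intended behaviour, not the algorithm in the spec; whole_months_between is a hypothetical helper and expects later >= earlier, both without timezones.

```python
from datetime import datetime


def whole_months_between(later: datetime, earlier: datetime) -> int:
    """Count completed months from earlier to later, rounding down."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If the day/time within the month has not yet been reached,
    # the last month is incomplete and is dropped.
    if (later.day, later.time()) < (earlier.day, earlier.time()):
        months -= 1
    return months


start = datetime(2003, 1, 15, 12, 0, 0)
almost_two = whole_months_between(datetime(2003, 3, 15, 11, 59, 59), start)
exactly_two = whole_months_between(datetime(2003, 3, 15, 12, 0, 0), start)

print(almost_two, exactly_two)
```

One second short of two months still yields one month under this reading.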

[107] 9.7.2: *** Examples with part of the operands having implicit
    may be important to document the current design, but are very bad
    usage examples.
[AM] Noted.

[108] 9.7.13 and similar: "This value is added to the normalized value
$srcval1 and the result returned."
     It seems that it is important to normalize after the calculation.
[AM] The normalized value refers to the first component of the two-part
value.  The result is normalized as well.

[109] 9.7.13: The slack available due to time zones is not used.
     e.g. it might be possible to say that 23:00:00+09:00 + PT5H
     is 23:00:00+04:00 or some such.
[AM] The result needs to be in the standard form i.e. {normalized-value,

[110] 10: *** The general comment about anyURIs and URIs applies here
[AM] Noted.

[111] 10.1.1: What about allowing other nodes (e.g. attribute) in second

     for fn:resolve-QName?
[AM] Added issue.

[112] 11.1: fn:resolve-uri: This terminology should be cross-checked
     the new terminology in RFC2396bis.
[AM] I took a look.  The function uses only base and relative URI.  This
seems fine.  Besides it's only a draft.

[113] 11.1: It may be helpful to have fn:resolve-uri(string, node),
     i.e. get the base implicitly.
[AM] Added issue.

[114] 11.1: "The second form of this function expects $base to be an 
absolute URI and $relative to be a relative URI."
     The second part of the sentence is misleading, because it can also
     be absolute.
[AM] Fixed, thanks!

[115] 11.1: *** The 'how to compare URIs' reference is outdated. In
     form, it should point to the relevant section of the IRI spec.
[AM] Fixed as per Paul Cotton.

[116] 12.1.1/2: This is virtually useless. At least a function to
     hex with base64 should be available. This would cover the current
     two functions and provide more functionality.
[AM] We now allow casting from hexBinary to base64Binary and vice-versa.
This allows values of these two different binary types to be compared.
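
A sketch of what hex/base64 conversion amounts to, using Python's standard library. The helper names are hypothetical and this is not the casting mechanism of the spec, only the underlying byte-level equivalence.

```python
import base64


def hex_to_base64(hex_text: str) -> str:
    """Re-encode the bytes written in hex as base64."""
    return base64.b64encode(bytes.fromhex(hex_text)).decode("ascii")


def base64_to_hex(b64_text: str) -> str:
    """Re-encode the bytes written in base64 as upper-case hex."""
    return base64.b64decode(b64_text).hex().upper()


encoded = hex_to_base64("0FB7")
decoded = base64_to_hex(encoded)
print(encoded, decoded)
```

Two values of the different binary types are equal exactly when they decode to the same byte sequence.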

[117] 14. It would be good to have the example doc in actual XML, rather
     than just described.
[AM] Added.

[118] 14.1: This subsection seems totally pointless. There may be others
     like this.
[AM] You mean the table summarizing the functions?  I rather like it but
it does make the document longer.  I'll ask the WG and remove if people
want it removed.

[119] 14.1.4: ***casting to numeric types: This would cast
     <a>1<b>2</b>3</a> to 123, yes? There should be functionality
     that allows to e.g. ignore/remove the <b> element with content,
     or convert each text node,...
[AM] This is legacy function from XPath 1.0.  The functionality you
request is, in my opinion, difficult to define and too much for version

[120] 14.1.5: ***fn:lang: There should be a function providing the
result of
     This is a step towards better support of language tagging, but
     we think that other steps will be needed.
     Ideally, this function should be called lang, and the current
     function should be called lang-match, but that may be against
     backwards compatibility.
[AM] Noted.

[121] 14.1.5: *** fn:lang should return true also if $testlang is ""
     (i.e. matching for any language)
[AM] Added issue.

[122] 14.1.5: *** there are only four examples, not five. There should
     be some examples with false results.
[AM] Added.
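
Some true and false cases for language matching, sketched in Python. The helper lang_matches is hypothetical and follows the XPath 1.0 lang() rule (case-insensitive equality, or a match up to a "-" suffix); it does not implement the empty-string proposal in [121].

```python
from typing import Optional


def lang_matches(testlang: str, xml_lang: Optional[str]) -> bool:
    """True if xml_lang equals testlang, or is a sublanguage of it."""
    if xml_lang is None:  # no xml:lang in scope
        return False
    test = testlang.lower()
    value = xml_lang.lower()
    return value == test or value.startswith(test + "-")


cases = {
    ("en", "en"): lang_matches("en", "en"),
    ("en", "EN-us"): lang_matches("en", "EN-us"),
    ("en", "eng"): lang_matches("en", "eng"),
    ("fr", "en"): lang_matches("fr", "en"),
    ("en-US", "en"): lang_matches("en-US", "en"),
}
print(cases)
```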

[123] 14.1.5: please say explicitly that xml:lang can be taken from an
[AM] Done.

[124] 14.1.7/8: again, only one of node-before and node-after is needed
     for backup.
[AM] Noted.

[125] 15.1.1/2/3: The names of these functions should express the
     (rather than constructive) nature of these functions.
[AM] Yes.

[126] 15.1.4: There seems to be some hiccup in: "The singleton xs:string

value "". (the zero-length string). The expression cast as xs:boolean 
($srcval) returns false if $srcval is "0" and true if $srcval is "1"."
[AM]  Thanks! The wording has been changed.  There is an open issue on

[126] ***15.1.7 and others: It would be a good idea to list all the
    that potentially take a collation or are affected by collations
    in the collation section.
[AM]  It's a good idea but needs some thought.  Michael Kay has made a

[127] 15.1.12: Changing this from insert-before to insert-after will at
    bring this function in line with usual indexing practice (i.e. the
    position before the first item is 0, after the first is 1, and so
[AM]  I think this is minor.  If you wish we will open an issue.

[128] 15.1.15: "This function takes a sequence or more typically, an 
expression, that evaluates to a sequence, and indicates that the result 
sequence may be returned in any order."
    This should explicitly say that the same sequence (except for order)
    as the argument is returned.
[AM] Clarified.

[129] 15.2: 'union', 'intersect', and 'except' are badly aligned
     (a noun, a verb, and a preposition)
[AM] Not linguistically pure, I agree, but common usage.

[130] 15.2.1 *** is there any collation default for deep-equal?
[AM]  Yes, same as for all the other functions that take an optional
collation.  There is ongoing discussion that some functions should take
the Unicode codepoint collation as default.  Ongoing.

[131] 15.2.1: *** "If the type of the items in $parameter1 and
is not xs:string  and $collationLiteral is specified, the collation is 
     what about text nodes?
[AM] The collation is used.  This is specified in

[132] *** Why are namespace nodes compared with a collation?
     namespaces should be compared codepoint-by-codepoint.
[AM] Created issue.

[133] "Note: The result of deep-equal(1, current-dateTime())
is false; it does not raise an error."
     What does this want to say? That even the weirdest type combination
     is not an error?
[AM] The signature is item()* so it accepts a sequence of any kind of
node or value.  

[134], code segments: These code segments use stuff that is not
     defined in this spec, such as 'eq'. There should be a pointer
     to a definition of 'eq'. This is one instance of a general problem
     already pointed out.
[AM] Noted.

[135] 15.3.3 and others: *** the examples seem to imply that collation
     an optional argument, but the signature shows it as mandatory.
[AM] There are two signatures.  One with and one without a collation.

[136] 15.3: rather than the very few aggregation functions provided
     it seems to be crucial to have adequate and easy-to-use second-
     order functions.
[AM] We decided not to venture into second-order functions for v1.
Several other areas, such as sorting could also use second-order

[137] 15.3.4/5: *** for strings, collations are used. What about
     of strings? what about anySimple and anyAtomic?
[AM] Clarified "string and types derived by restriction from string".
anySimpleType is converted to untypedAtomic at validation time.
untypedAtomic values are cast to the type of the other items.  If all
items are untyped atomic they are cast to string.  This is explained in
the second para of fn:max and fn:min.  The casting rules for
untypedAtomic are still under discussion.

[138] 15.4.2: what 'substitution'?
[AM] Made wording clearer.

[139] 15.4.2/3: *** that collations are not used for ID/IDREF is good

[140] 15.4.4: *** the text speaks about URIs, but this should be
[AM] Done.

[141] 15.4.4: guaranteeing 'doc("foo.xml") is doc("foo.xml")' may lead
     to problems for queries or transformations that run for a very
     long time (e.g. days,...) [not that they necessarily take that
     much time to compute, but that they are e.g. tuned to return
     a series of elements or documents at a certain pace.

[142] 15.4.4: "If two calls on this function supply different absolute 
URIs, the same document node may be returned if the implementation can 
determine that the two URIs refer to the same resource."
     A short explanation of how an implementation would do that may

[143] 15.4.5: fn:collection: How can a single URI return a collection of

     Is this e.g. the result of a multiple-choice reply? or what?
     Again, a short explanation listing a few possibilities may help.
[AM] It is not clear we need such a function in the standard.  Created

[144] 15.4.6: What is the 'input sequence'?
[AM] This function has been removed.

[145] dates/times in general: The examples should vary the default time 
zone, not
     always use the same one, so that people get more aware of the
     arbitrariness of the calculations.
[AM] OK.  I'll work on this.

[146] 16.4.1/5.1: The example should be more realistic, with seconds and
     fractional parts.
[AM] Added fractional seconds to time examples.

[147] 16.7: *** Would it not be better to return the codepoint collation
     if no default is set?
[AM] Created issue.

[148] 17: In the first three lines of this section, five different terms
are used:
     "casting function", "cast function", "cast operator", "constructor
     function", "cast expression". Please clean up terminology and
     explain the terms that you use.
[AM] Wording cleaned up.

[149] 17: Why are there two syntaxes for casts, one being a substring of
the other?
     One syntax should be enough (probably the shorter one)
[AM] Good question!  I'll add an issue.

[150] 17.1: "and "M" indicates that a conversion from values of the type
which the row applies to the type to which the column applies *may* be 
supported, subject to restrictions discussed in this section."
     Does the 'may' mean 'implementations *may* support this kind of
     Or 'this cast *will* work for a subset of the values of the source
     in all implementations'?
     Please clarify.
[AM] Clarified to mean that the cast may or may not succeed.

[151] 17.1: abbreviations: We suggest using them only for the columns,
to use
     the rows with the full name and the abbreviation at the same time.
     That way, everything is contained in a single printout.
[AM] I think using different abbreviations for the rows and columns is
confusing.  Also, there have been complaints about the width of the

[152] 17.1: *** Any type that starts with a 'g' for 'Gregorian' should
this 'g'
     in the shortcut.
[AM] Good suggestion!  Done!

[153] 17.1: *** Why is there an 'M' for anySimple to untypedAtomic? If
this is
     because untypedAtomic cannot contain spaces, then the 'M' would
     apply to str->untypedAtomic, because strings obviously can contain
[AM] anySimpleType has been removed from the table.

[154] 17.1: "In the following table, the notation "S\T" indicates that
source ("S") of the conversion is indicated in the column below the 
notation and that the target ("T") is indicated in the row to the right
the notation."
     This sounds utterly helpless. Better change to Source\Target and be
     done with it, or use special long-range row and column outside the
     table to indicate source and target. (by separating words into
     vertically elongated table cells can easily be filled with text;
     browsers may even support adequate styling properties for vertical
[AM] OK.  We'll work on this.

[155] 17.4: Again a monetary type restricted to two digits after the 
decimal point.
     Here, please add a warning that this won't cover all currencies.
[AM] Changed example so as not to cause confusion.

[156] 17.7: *** Casting from string to anyURI: Why is space replaced by
     Please note that the newest IRI draft does not allow space, nor the
     other characters in ascii but not allowed in URIs.
[AM] This rule has been removed.

[157] 17.7: *** "To cast to xs:anySimpleType or xdt:untypedAtomic the
is cast to xs:string, as described above, and the type annotation
to xs:anySimpleType or xdt:untypedAtomic, respectively."
     These types are so extremely close that we think actual casts
     should not actually be needed (i.e. wherever a string goes, so goes
     an anySimple or an untypedAtomic, and vice versa.
[AM] True, but something needs to be said.  Otherwise people will think
it's not possible.

[158] 17.7: *** casting from strings should include casting from text
[AM] Not necessary.  If you pass a text node where a string is expected,
it is atomized and its string-value is extracted.

[159] 17.8: "the xs:float value TV" -> "the xs:float TV"
[AM] Fixed.

[160] 17.8: "if SV is 1 or true": The value is only one of these, there
     (unfortunately) happen to be two notations. Same for "0 or false".
[AM] Right!

[161] 17.9: The semi-formal description in terms of castings to strings
     back is difficult to follow. An informal description in terms of
     components, followed maybe by a fully formulaic description of
     each conversion in a single formula, would be clearer.
[AM] Reorganized text.

[162] 17.9.5-9: The instructions for both dateTime and date are exactly
     the same. Please just say "If ST is dateTime or date, then..."
[AM] Not so, unless I misunderstand your intent.

[163] 17.13: *** Again, please make sure you use anyURI, not URI.
[AM] OK.

[164] 17.15: Is xs:Notation($notation) allowed, or not? The table seems
     to suggest yes, the text seems to suggest no.
[AM] Clarified text.

[165] References: "The Unicode Standard" should be:
     "The Unicode Standard
        The Unicode Consortium. The Unicode Standard, Version 4.0
     (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1)
[AM] Issue created.

[166] References: [Unicode Case Mappings] should be:
     Defined in The Unicode Standard, Section 3.13."
[AM] This is related to the above open issue.

[167] References: You may want to add a reference to the ISO equivalent
     of the Unicode Collation Algorithm, ISO 14651. Like in the case
     of Unicode and 10646, the UCA is an extension of ISO 14651 --
     but a very substantial extension. This should be said in an
     explanatory note.
[AM] We can consider this but it does not seem necessary.

[168] C: It should be possible to have an XSLT implementation use
     defined with XQuery and vice versa, or that the WGs provide some
     at least proof-of-concept quality test software that can do the
     conversion. Also, the fact that there need
     to be two different ways to define these suggests that some
     work on an extensive function library may be highly adequate.
[AM] This is a general comment on the family of specs, not just the F&O.

[169] F.2: This should be called Functions and Operators Index.
[AM] Sorry, but I like the name "Quick Reference"!

Regards,    Martin.

All the best, Ashok
Received on Tuesday, 2 September 2003 08:00:01 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:56:49 UTC