- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 08 Jul 2003 15:30:38 -0400
- To: public-qt-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Query WG and XSL WG,
Below please find the second (and final) part of the I18N WG's comments on
your last call document "XQuery 1.0 and XPath 2.0 Functions and Operators"
(http://www.w3.org/TR/2003/WD-xpath-functions-20030502/).
Please note the following:
- Please address all replies to these comments to the I18N IG mailing
list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general
comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0
or XSLT 2.0, and so we might be unaware of i18n issues that appear
in these specs but may have to be traced back to functions and operators.
There are also cases where we have identified an i18n issue here,
but we are not sure exactly what the best solution will be, and which
document it will have to be addressed in. Also, there are issues that
have been raised in comments to you about a different document but
that apply to this document, too. Sometimes, this is mentioned below,
but not always.
- Our comments are numbered in square brackets [nn]; the numbering
continues from the first part.
- Please note that this mail contains a few additional comments on
sections already commented on in our first part.
- We again apologize for our delay.
We look forward to further discussion with you to find the best
solution on these issues.
[78] 1.7 namespace prefix: the op:xxx functions back up operators and are
not directly accessible to users.
Shouldn't it be up to the language using Functions and Operators to
decide whether or not to expose these operators? (XQuery and XSLT have
both chosen not to, but there might be other languages.)
[79] 2.3 "cast as xs:string": there should be a forward reference to this
notation.
[80] 7.1 (this may partly supersede our issue [33]):
"This document uses the term "code point" as a synonym for "Unicode
scalar value". [The Unicode Standard] sometimes spells this term
"codepoint". Code points range from #x0000 to #x10FFFF inclusive,
except for the range #xD800 to #xDFFF inclusive, which is the range
reserved for surrogate pairs. The use of the word 'character' in this
document is in the sense of production [2] of [XML 1.0 Recommendation
(Second Edition)]."
The relationship between code point and scalar value was fuzzy in the
past. Unicode 4.0 makes it clear that code points are #x0000 to
#x10FFFF inclusive, and scalar values are the subset #x0000 to #xD7FF
plus #xE000 to #x10FFFF inclusive. XML can't represent all code points
anyway (example: #x0000), so it is probably best to just use "code point".
A minor wording issue is that #xD800 to #xDFFF is the range reserved for
surrogate code points, which are used for surrogate pairs. So, the
suggested wording is:
"This document uses the term "code point" as defined in [The Unicode
Standard], ranging from #x0000 to #x10FFFF inclusive. The use of the
word 'character' in this document is in the sense of production [2] of
[XML 1.0 Recommendation (Second Edition)], so it may include code
points which have not yet been assigned to characters."
The spec should be checked to make sure that it no longer uses the word
"codepoint" in places where surrogate code points are meant to be excluded.
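For illustration only, the two Unicode 4.0 ranges can be written down as
predicates; the Python function names below are hypothetical and not part
of any spec.

```python
# Hypothetical helpers illustrating the Unicode 4.0 terminology.
SURROGATES = range(0xD800, 0xDFFF + 1)  # surrogate code points


def is_code_point(cp: int) -> bool:
    """Any integer in U+0000..U+10FFFF is a code point."""
    return 0x0000 <= cp <= 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    """Scalar values are the code points that are not surrogate code points."""
    return is_code_point(cp) and cp not in SURROGATES


# U+D800 is a code point but not a scalar value; U+0000 is both,
# even though XML 1.0 cannot represent it as a character.
assert is_code_point(0xD800) and not is_scalar_value(0xD800)
assert is_scalar_value(0x0000)
```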
[81] 7.4.11 normalize-unicode: As of
http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-FullyNormalized,
what is called 'W3C normalized' here has been renamed to
'fully normalized' in the character model.
[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of
the relevant constructs. For strings, the string itself is most
conveniently the relevant construct, but this should be said
explicitly.
[83] 7.4.11 normalize-unicode: Perhaps not as a function, but in some way,
normalization checking on input and normalization on output should be
available in both XQuery and XSLT, on full XML constructs (with the
relevant definitions from XML 1.1).
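For strings, a minimal Python sketch of such a check could look as follows.
It assumes that the string is the relevant construct, and it approximates
"composing character" by a non-zero canonical combining class, which is
narrower than the Character Model's definition; the function names are
hypothetical.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    # A string is in NFC if normalizing it to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s


def looks_fully_normalized(s: str) -> bool:
    # Approximation (assumption): a "composing character" is taken to be a
    # character with a non-zero canonical combining class.
    if not is_nfc(s):
        return False
    return s == "" or unicodedata.combining(s[0]) == 0


# Normalization on output is then simply unicodedata.normalize("NFC", s).
```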
[84] 7.4.13/14: maybe there should be an option for Turkish (dotted and
dotless i).
[85] 7.4.13/14: an example should use the German sharp s (ß).
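A small Python sketch of why both points matter: the default
(language-independent) full case mappings turn "ß" into "SS", so the
result can be longer than the input, and they map "i" to "I", which is
wrong for Turkish. The `turkish` flag and the function names are
hypothetical, and the tailoring shown only covers the dotted and dotless i.

```python
def upper_case(s: str, turkish: bool = False) -> str:
    # Hypothetical Turkish option: map dotted i to U+0130 before applying
    # the default full case mapping.
    if turkish:
        s = s.replace("i", "\u0130")
    return s.upper()


def lower_case(s: str, turkish: bool = False) -> str:
    # Hypothetical Turkish option: U+0130 -> i, I -> dotless i (U+0131).
    if turkish:
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    return s.lower()


print(upper_case("stra\u00dfe"))               # STRASSE (7 characters from 6)
print(upper_case("istanbul", turkish=True))    # İSTANBUL
```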
[86] 7.4.14: two paragraphs are identical (the 1st and the 5th).
[87] 7.4.16: there should be a flag controlling whether non-ASCII
characters are escaped or not (the default should be to leave them
unescaped).
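To make the proposed flag concrete, here is a Python sketch. The
allowed-character set is a simplified assumption loosely based on
RFC 2396, and the function and flag names are hypothetical.

```python
import string

# Simplified assumption: unreserved + reserved characters of RFC 2396,
# plus '#' and '%' (so existing escapes are kept).
ALLOWED = set(string.ascii_letters + string.digits
              + "-_.!~*'()" + ";/?:@&=+$," + "#%")


def escape_uri(s: str, escape_non_ascii: bool = False) -> str:
    out = []
    for ch in s:
        if ch in ALLOWED:
            out.append(ch)
        elif ord(ch) < 0x80 or escape_non_ascii:
            # Percent-encode the UTF-8 octets of the character.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)  # default: keep non-ASCII characters as they are
    return "".join(out)
```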
[88] 7.4.12 "Otherwise, returns the value of $srcval after translating
every lower-case letter to its upper-case correspondent. Every lower-case
letter that does not have an upper-case correspondent, and every character
that is not a lower-case letter, is included in the returned value in its
original form. A "lower-case letter" is a character whose Unicode General
Category class includes "Ll". The corresponding upper-case letter is
determined using [Unicode Case Mappings]."
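There is a problem here. The set of characters that have upper-case
mappings and the set of characters that are Ll are not the same: e.g.
U+01C5 (Lt) and U+24D0 CIRCLED LATIN SMALL LETTER A (So) have upper-case
mappings, while U+0138 LATIN SMALL LETTER KRA (Ll) has none. This should
read instead: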
"Otherwise, returns the value of $srcval after translating every
character to its upper-case correspondent. Every character that does
not have an upper-case correspondent is included in the returned value
in its original form. The precise mapping is determined using [Unicode
Case Mappings]."
and mutatis mutandis for 7.4.13
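The mismatch can be checked with Python's `unicodedata` module and
`str.upper()`, which applies the full Unicode case mappings to every
character, i.e. essentially the behavior of the suggested wording:

```python
import unicodedata


def has_upper_mapping(ch: str) -> bool:
    return ch.upper() != ch


for ch in ["\u01C5", "\u24D0", "\u0138"]:
    print("U+%04X" % ord(ch), unicodedata.category(ch), has_upper_mapping(ch))
# Not Ll, but with an upper-case mapping: U+01C5 (Lt), U+24D0 (So).
# Ll, but without an upper-case mapping: U+0138.
```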
[89] 7.4 There should also be an fn:title-case function, because titlecasing
(also called initial caps) is *not* the same as uppercasing the first
letter of a word: for example, U+01C6 (dž) titlecases to U+01C5 (ǅ) but
uppercases to U+01C4 (Ǆ).
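A short Python illustration, using the standard `str.title()` and
`str.upper()` methods; `title_case_word` is a hypothetical helper that
deliberately ignores word segmentation:

```python
def title_case_word(word: str) -> str:
    # Titlecase the first character and lowercase the rest
    # (one common convention; word boundaries are out of scope here).
    return word[:1].title() + word[1:].lower()


word = "\u01C6ungla"                       # "džungla", starting with U+01C6
print(title_case_word(word))               # starts with U+01C5 (ǅ)
print(word[:1].upper() + word[1:])         # starts with U+01C4 (Ǆ): not the same
```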
[90] 9.1 *** There should be a note about the inadvisability of using all the
types with a 'g' prefix.
[91] 9.2: *** "Note: The W3C XML Query Working Group has requested the W3C
XML Schema Working Group that these two subtypes of xs:duration be included
in the built-in datatypes described in [XML Schema Part 2: Datatypes]."
We support this request. This is very much needed!
[92] 9.2.1: <xs:pattern value="[\-]?P[0-9]+(Y([0-9]+M)?|M)"/>
Why not simply <xs:pattern value="-?P[0-9]+(Y([0-9]+M)?|M)"/>?
This is legal in Perl; is it not legal in XQuery/XPath?
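A quick check in Python suggests that the two patterns accept exactly the
same strings. Python's `re` syntax is not identical to XML Schema regular
expressions, but it treats these particular constructs the same way;
`fullmatch` mimics the implicit anchoring of schema patterns.

```python
import re

ORIGINAL = r"[\-]?P[0-9]+(Y([0-9]+M)?|M)"
SIMPLIFIED = r"-?P[0-9]+(Y([0-9]+M)?|M)"
SAMPLES = ["P1Y", "-P1Y2M", "P3M", "P1Y2", "PY", "--P1Y", "P1M2Y"]


def matches(pattern: str, s: str) -> bool:
    # XML Schema patterns match the whole string, hence fullmatch.
    return re.fullmatch(pattern, s) is not None


for s in SAMPLES:
    print(s, matches(ORIGINAL, s), matches(SIMPLIFIED, s))
```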
[93] 9.2.2: Is white space allowed in regular expressions?
[94] 9.2.2: "The designator 'T' *must* be absent if and only if all of the
time items are absent."
This seems to conflict with the examples -P35.89S and P4D251M,
which are said to be allowed.
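Under the quoted rule, both examples should be rejected. A Python sketch
of the rule, assuming a simplified day/time duration form (days, hours,
minutes and seconds only; the function name is hypothetical):

```python
import re

# Simplified lexical form (assumption): no year or month components.
DAY_TIME = re.compile(
    r"-?P(?:([0-9]+)D)?(?:T(?:([0-9]+)H)?(?:([0-9]+)M)?"
    r"(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)


def is_day_time_duration(s: str) -> bool:
    m = DAY_TIME.fullmatch(s)
    if m is None:
        return False
    days, hours, minutes, seconds = m.groups()
    has_time = any(g is not None for g in (hours, minutes, seconds))
    # At least one component, and 'T' present if and only if a time item is.
    return (days is not None or has_time) and ("T" in s) == has_time
```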
[95] 9.2.2.3: *** Durations cannot allow for leap seconds in their
canonical representation.
[96] 9.3: For many types, there is op:foo-equal, op:foo-less-than, and
op:foo-greater-than. As these are only defined for backing up
operators, it would be much better to define only a single
comparison function (similar to string), or at least only
op:foo-equal and op:foo-less-than. Backup then works easily as follows:
a eq b <==> foo-equal(a,b)
a ne b <==> !foo-equal(a,b)
a lt b <==> foo-less-than(a,b)
a ge b <==> !foo-less-than(a,b)
a gt b <==> foo-less-than(b,a)
a le b <==> !foo-less-than(b,a)
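The same derivation as a Python sketch (hypothetical names). Note that it
assumes a total order: for values such as NaN, "not less than" is not the
same as "greater than or equal".

```python
from typing import Callable


def make_comparisons(equal: Callable, less_than: Callable) -> dict:
    """Derive all six value comparisons from two primitives."""
    return {
        "eq": lambda a, b: equal(a, b),
        "ne": lambda a, b: not equal(a, b),
        "lt": lambda a, b: less_than(a, b),
        "ge": lambda a, b: not less_than(a, b),
        "gt": lambda a, b: less_than(b, a),
        "le": lambda a, b: not less_than(b, a),
    }


ops = make_comparisons(lambda a, b: a == b, lambda a, b: a < b)
print(ops["ge"](2, 1), ops["gt"](1, 1))  # True False
```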
[97] 9.3: *** "If either operand to a comparison function on date or time
values does not have an explicit timezone then, for the purpose of the
operation, an implicit timezone, provided by the evaluation context, is
assumed to be present as part of the value."
This is used here and in many other places, but we think that it
is totally inadequate and dangerous. Reasonable use of these types
should be either using timezoned data only, or data without a timezone
(or actually with a user-managed, separate indication of the timezone)
only. The best way to achieve this would be to separate the relevant
types into with-timezone and without-timezone. Anything else will
cause more confusion than it will help.
At a minimum, there should be very clear warnings everywhere
a 'default timezone' is mentioned.
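The following Python sketch shows the danger: the same pair of values
compares as less than, equal, or greater than depending only on which
implicit timezone the context happens to supply. The helper names are
hypothetical.

```python
from datetime import datetime, timedelta, timezone


def with_implicit_tz(dt: datetime, implicit: timezone) -> datetime:
    # Attach the implicit timezone only to values that lack one.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=implicit)


def compare(a: datetime, b: datetime, implicit: timezone) -> int:
    a, b = with_implicit_tz(a, implicit), with_implicit_tz(b, implicit)
    return (a > b) - (a < b)


local = datetime(2003, 7, 8, 12, 0)                                    # no timezone
fixed = datetime(2003, 7, 8, 12, 0, tzinfo=timezone(timedelta(hours=-4)))

for hours in (0, -4, -5):
    print(hours, compare(local, fixed, timezone(timedelta(hours=hours))))
# 0 -> -1, -4 -> 0, -5 -> 1
```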
[98] 9.4: Having separate functions for extraction from dateTime, date, and
time seems to be unnecessarily tedious. Also, the description of the
actual functions should be shortened.
[99] 9.4.10: ***
"fn:get-hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) returns 17"
This result is just plain weird and won't help anybody.
12 is what the user will expect, and what she should get.
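The result 17 is presumably obtained by assuming an implicit timezone of
-05:00 and normalizing to UTC before extracting the hour (this is our
inference from the example, not something the draft states here). A
Python sketch contrasting the two behaviors, with hypothetical names:

```python
from datetime import datetime, timedelta, timezone

IMPLICIT = timezone(timedelta(hours=-5))  # assumed implicit timezone


def hours_normalized(dt: datetime) -> int:
    # Attach the implicit timezone, normalize to UTC, then extract the hour.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=IMPLICIT)
    return dt.astimezone(timezone.utc).hour


def hours_as_written(dt: datetime) -> int:
    # What users expect: the hour component of the value as written.
    return dt.hour


value = datetime(1999, 12, 31, 12, 0, 0)
print(hours_normalized(value), hours_as_written(value))  # 17 12
```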
[100] 9.4.8 and other time divisions: What is the expected precision?
Some systems may be able to provide very high precision, but should they?
[101] 9.6: *** "For purposes of timezone adjustment, an xs:date is treated
as an xs:dateTime with time 00:00:00."
It is unclear what this means, and it doesn't seem to have been
'implemented' in the actual function definitions.
[102] 9.6: ***There should be a clear explanation of what 'adjustment' means,
namely (in general, at least) to change the time notation so that
it still denotes the same physical time, but uses a different
timezone to do so.
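In Python terms, adjustment in this sense corresponds to
`datetime.astimezone()`; `adjust_to_timezone` is just an illustrative
wrapper:

```python
from datetime import datetime, timedelta, timezone


def adjust_to_timezone(dt: datetime, tz: timezone) -> datetime:
    # Same physical time, different timezone notation.
    return dt.astimezone(tz)


sent = datetime(2003, 7, 8, 15, 30, 38, tzinfo=timezone(timedelta(hours=-4)))
tokyo = adjust_to_timezone(sent, timezone(timedelta(hours=9)))
print(tokyo.isoformat())  # 2003-07-09T04:30:38+09:00
```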
[103] 9.6: ***The treatment of implicit timezones,... leads to very
strange discontinuities for these timezone adjustments. In particular,
while physical time is kept constant for adjustments with time zones,
it is not when a timezone is missing. This is very dangerous.
Also there are differences in behavior between adjustments of dateTime
and date.
[104] 9.6: ***In connection with daylight saving time adjustment, it is often
necessary to shift times by keeping the same nominal value but
changing the time zone, in effect shifting the physical time
with the timezone shift. But there is no operation to do this
easily.
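A Python sketch of such an operation (the name `relabel_timezone` is
hypothetical); unlike an adjustment, it keeps the nominal wall-clock value
and therefore moves the physical time by the difference between the
offsets:

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))


def relabel_timezone(dt: datetime, tz: timezone) -> datetime:
    # Keep the nominal value, replace the timezone.
    return dt.replace(tzinfo=tz)


meeting = datetime(2003, 11, 3, 9, 0, tzinfo=EDT)
shifted = relabel_timezone(meeting, EST)
print(shifted.isoformat(), shifted - meeting)  # still 09:00, one hour later
```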
[105] 9.6: *** "op:subtract-yearMonthDuration-from-dateTime"
The order of the operands is the wrong way round in the function
name. This will cause problems for non-native English speakers.
There should at least be a warning, but ideally a fix.
[106] 9.7.1: This should say that the duration is always rounded down
to full months. There should be an example with more rounding
(almost a full month).
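One possible reading of "rounded down", sketched in Python (the function
name is hypothetical, and it assumes end >= start); the third example is
the "almost a full month" case:

```python
from datetime import datetime


def months_between(start: datetime, end: datetime) -> int:
    # Whole months from start to end, rounded down.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If end falls earlier in its month than start did, the last month
    # is not complete yet.
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months


print(months_between(datetime(2003, 1, 15), datetime(2003, 3, 15)))        # 2
print(months_between(datetime(2003, 1, 15), datetime(2003, 3, 14)))        # 1
print(months_between(datetime(2003, 1, 1), datetime(2003, 1, 31, 23, 59)))  # 0
```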
[107] 9.7.2: *** Examples in which some of the operands have implicit
timezones may be important to document the current design, but they are
very bad usage examples.
[108] 9.7.13 and similar: "This value is added to the normalized value of
$srcval1 and the result returned."
It seems that it is important to normalize after the calculation.
[109] 9.7.13: The slack available due to time zones is not used.
For example, it might be possible to say that 23:00:00+09:00 + PT5H
is 23:00:00+04:00 or some such.
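The arithmetic can be checked with Python's `datetime` (the date is
arbitrary, since only the time of day matters here): the instant reached
by adding PT5H can be written as 23:00:00+04:00, without crossing a day
boundary.

```python
from datetime import datetime, timedelta, timezone


def tz(hours: int) -> timezone:
    return timezone(timedelta(hours=hours))


start = datetime(2003, 7, 8, 23, 0, tzinfo=tz(9))  # 23:00:00+09:00
later = start + timedelta(hours=5)                 # 04:00:00+09:00, next day
alt = later.astimezone(tz(4))                      # same instant
print(alt.strftime("%H:%M:%S%z"))                  # 23:00:00+0400
```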
[110] 10: *** The general comment about anyURIs and URIs applies here again.
[111] 10.1.1: What about allowing other nodes (e.g. attributes) as the
second argument of fn:resolve-QName?
[112] 11.1: fn:resolve-uri: This terminology should be cross-checked with
the new terminology in RFC2396bis.
[113] 11.1: It may be helpful to have fn:resolve-uri(string, node),
i.e. to get the base URI implicitly from the node.
[114] 11.1: "The second form of this function expects $base to be an
absolute URI and $relative to be a relative URI."
The second part of the sentence is misleading, because it can also
be absolute.
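Python's `urllib.parse.urljoin`, which follows RFC 3986 reference
resolution, illustrates that the "relative" argument may itself be
absolute:

```python
from urllib.parse import urljoin

base = "http://www.example.org/a/b/c"
print(urljoin(base, "d"))                        # http://www.example.org/a/b/d
print(urljoin(base, "../d"))                     # http://www.example.org/a/d
print(urljoin(base, "http://other.example/x"))   # absolute: returned as-is
```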
[115] 11.1: *** The 'how to compare URIs' reference is outdated. In final
form, it should point to the relevant section of the IRI spec.
[116] 12.1.1/2: This is virtually useless. At least a function to compare
hex with base64 should be available. This would cover the current
two functions and provide more functionality.
[117] 14. It would be good to have the example doc in actual XML, rather
than just described.
[118] 14.1: This subsection seems totally pointless. There may be others
like this.
[119] 14.1.4: ***casting to numeric types: This would cast
<a>1<b>2</b>3</a> to 123, yes? There should be functionality
that allows one to, e.g., ignore or remove the <b> element with its
content, or to convert each text node separately.
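In Python's ElementTree, the three behaviors can be sketched as follows;
the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a>1<b>2</b>3</a>")

# String value: all descendant text nodes concatenated ("123").
string_value = "".join(a.itertext())
# Only the element's own text nodes, ignoring <b> and its content ("13").
own_text = (a.text or "") + "".join(child.tail or "" for child in a)
# Each text node converted separately ([1, 2, 3]).
each = [int(t) for t in a.itertext()]
```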
[120] 14.1.5: ***fn:lang: There should be a function providing the result of
(ancestor-or-self::*/@xml:lang)[last()]
This is a step towards better support of language tagging, but
we think that other steps will be needed.
Ideally, this function should be called lang, and the current
function should be called lang-match, but that may conflict with
backwards compatibility.
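A minimal Python sketch of the proposed function, computing
(ancestor-or-self::*/@xml:lang)[last()] for every element; the function
name is hypothetical:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map each element to its nearest xml:lang value, or None."""
    result = {}

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


doc = ET.fromstring('<doc xml:lang="de"><p>Hallo</p>'
                    '<p xml:lang="en-US">Hello</p></doc>')
langs = effective_languages(doc)
first, second = doc.findall("p")
```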
[121] 14.1.5: *** fn:lang should return true also if $testlang is ""
(i.e. matching for any language)
[122] 14.1.5: *** there are only four examples, not five. There should
be some examples with false results.
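A Python sketch of the matching rule with the proposed extension,
including some false results (the function name is hypothetical;
case-insensitive matching with a "-" suffix boundary follows XPath 1.0's
lang()):

```python
def lang_matches(actual, testlang: str) -> bool:
    if actual is None:
        return False          # no language information at all
    if testlang == "":
        return True           # proposed: "" matches any language
    actual, testlang = actual.lower(), testlang.lower()
    return actual == testlang or actual.startswith(testlang + "-")


print(lang_matches("en-US", "en"), lang_matches("en", "en-US"))  # True False
```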
[123] 14.1.5: please say explicitly that xml:lang can be taken from an ancestor
[124] 14.1.7/8: again, only one of node-before and node-after is needed
for backup.
[125] 15.1.1/2/3: The names of these functions should express the testing
(rather than constructive) nature of these functions.
[126] 15.1.4: There seems to be some hiccup in: "The singleton xs:string
value "". (the zero-length string). The expression cast as xs:boolean
($srcval) returns false if $srcval is "0" and true if $srcval is "1"."
[126] ***15.1.7 and others: It would be a good idea to list all the functions
that potentially take a collation or are affected by collations
in the collation section.
[127] 15.1.12: Changing this from insert-before to insert-after will at least
bring this function in line with usual indexing practice (i.e. the
position before the first item is 0, after the first is 1, and so on).
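A Python sketch of the proposed semantics (the name `insert_after` is
hypothetical; it assumes 0 <= position <= length of the sequence):

```python
def insert_after(seq, position, inserts):
    """Position 0 = before the first item, 1 = after the first item, ..."""
    return list(seq[:position]) + list(inserts) + list(seq[position:])


print(insert_after(["a", "b", "c"], 1, ["x"]))  # ['a', 'x', 'b', 'c']
```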
[128] 15.1.15: "This function takes a sequence or more typically, an
expression, that evaluates to a sequence, and indicates that the result
sequence may be returned in any order."
This should explicitly say that the same sequence (except for order)
as the argument is returned.
[129] 15.2: 'union', 'intersect', and 'except' are badly aligned grammatically
(a noun, a verb, and a preposition)
[130] 15.2.1 *** Is there a default collation for deep-equal?
[131] 15.2.1: *** "If the type of the items in $parameter1 and $parameter2
is not xs:string and $collationLiteral is specified, the collation is
ignored."
What about text nodes?
[132] 15.2.1.1: *** Why are namespace nodes compared with a collation?
Namespace URIs should be compared code point by code point.
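A Python sketch of the risk, with case-insensitive comparison standing in
for a collation that ignores case (the names are illustrative):

```python
NS1 = "http://example.org/NS"
NS2 = "http://example.org/ns"


def codepoint_equal(a: str, b: str) -> bool:
    return a == b                        # code point by code point


def case_insensitive_equal(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()  # stand-in for a collation


# Two distinct namespaces would wrongly compare equal under the collation.
print(codepoint_equal(NS1, NS2), case_insensitive_equal(NS1, NS2))
```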
[133] 15.2.1.1: "Note: The result of fn:deep-equal(1, current-dateTime())
is false; it does not raise an error."
What does this want to say? That even the weirdest type combination
is not an error?
[134] 15.2.1.1, code segments: These code segments use stuff that is not
defined in this spec, such as 'eq'. There should be a pointer
to a definition of 'eq'. This is one instance of a general problem
already pointed out.
[135] 15.3.3 and others: *** the examples seem to imply that collation is
an optional argument, but the signature shows it as mandatory.
[136] 15.3: rather than the very few aggregation functions provided here,
it seems to be crucial to have adequate and easy-to-use second-
order functions.
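A Python sketch of the idea, with `functools.reduce` as the single
second-order building block; `aggregate` is a hypothetical name:

```python
from functools import reduce


def aggregate(step, initial, items):
    # One second-order function instead of many special-purpose ones.
    return reduce(step, items, initial)


values = [3, 1, 4, 1, 5]
total = aggregate(lambda acc, x: acc + x, 0, values)
largest = aggregate(max, values[0], values[1:])
count_ones = aggregate(lambda acc, x: acc + (x == 1), 0, values)
```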
[137] 15.3.4/5: *** for strings, collations are used. What about subtypes
of strings? What about anySimple and anyAtomic?
[138] 15.4.2: What 'substitution'?
[139] 15.4.2/3: *** that collations are not used for ID/IDREF is good
[140] 15.4.4: *** the text speaks about URIs, but this should be anyURIs.
[141] 15.4.4: guaranteeing 'doc("foo.xml") is doc("foo.xml")' may lead
to problems for queries or transformations that run for a very
long time (e.g. days) [not that they necessarily take that
much time to compute, but that they are e.g. tuned to return
a series of elements or documents at a certain pace].
[142] 15.4.4: "If two calls on this function supply different absolute
URIs, the same document node may be returned if the implementation can
determine that the two URIs refer to the same resource."
A short explanation of how an implementation would do that may help.
[143] 15.4.5: fn:collection: How can a single URI return a collection of
documents?
Is this e.g. the result of a multiple-choice reply? or what?
Again, a short explanation listing a few possibilities may help.
[144] 15.4.6: What is the 'input sequence'?
[145] dates/times in general: The examples should vary the default time
zone, not always use the same one, so that people become more aware of
the arbitrariness of the calculations.
[146] 16.4.1/5.1: The example should be more realistic, with seconds and
fractional parts.
[147] 16.7: *** Would it not be better to return the codepoint collation
if no default is set?
[148] 17: In the first three lines of this section, five different terms
are used:
"casting function", "cast function", "cast operator", "constructor
function", "cast expression". Please clean up terminology and clearly
explain the terms that you use.
[149] 17: Why are there two syntaxes for casts, one being a substring of
the other? One syntax should be enough (probably the shorter one).
[150] 17.1: "and "M" indicates that a conversion from values of the type to
which the row applies to the type to which the column applies *may* be
supported, subject to restrictions discussed in this section."
Does the 'may' mean 'implementations *may* support this kind of casting'?
Or 'this cast *will* work for a subset of the values of the source type,
in all implementations'?
Please clarify.
[151] 17.1: abbreviations: We suggest using them only for the columns,
and labeling the rows with both the full name and the abbreviation.
That way, everything is contained in a single printout.
[152] 17.1: *** Any type that starts with a 'g' for 'Gregorian' should keep
this 'g' in the shortcut.
[153] 17.1: *** Why is there an 'M' for anySimple to untypedAtomic? If this is
because untypedAtomic cannot contain spaces, then the 'M' would also
apply to str->untypedAtomic, because strings obviously can contain spaces.
[154] 17.1: "In the following table, the notation "S\T" indicates that the
source ("S") of the conversion is indicated in the column below the
notation and that the target ("T") is indicated in the row to the right of
the notation."
This is very hard to follow. Better change it to Source\Target and be
done with it, or use a special long-range row and column outside the
current table to indicate source and target. (By separating words into
letters, vertically elongated table cells can easily be filled with text;
newer browsers may even support adequate styling properties for vertical
text.)
[155] 17.4: Again a monetary type restricted to two digits after the
decimal point. Here, please add a warning that this won't cover all
currencies.
[156] 17.7: *** Casting from string to anyURI: Why is space replaced by %20?
Please note that the newest IRI draft does not allow space, nor the
other characters that are in ASCII but not allowed in URIs.
[157] 17.7: *** "To cast to xs:anySimpleType or xdt:untypedAtomic the value
is cast to xs:string, as described above, and the type annotation changed
to xs:anySimpleType or xdt:untypedAtomic, respectively."
These types are so extremely close that we think actual casts
should not actually be needed (i.e. wherever a string goes, so goes
an anySimple or an untypedAtomic, and vice versa).
[158] 17.7: *** casting from strings should include casting from text nodes.
[159] 17.8: "the xs:float value TV" -> "the xs:float TV"
[160] 17.8: "if SV is 1 or true": The value is only one of these, there just
(unfortunately) happen to be two notations. Same for "0 or false".
[161] 17.9: The semi-formal description in terms of castings to strings and
back is difficult to follow. An informal description in terms of
components, followed maybe by a fully formulaic description of
each conversion in a single formula, would be clearer.
[162] 17.9.5-9: The instructions for both dateTime and date are exactly
the same. Please just say "If ST is dateTime or date, then..."
[163] 17.13: *** Again, please make sure you use anyURI, not URI.
[164] 17.15: Is xs:Notation($notation) allowed, or not? The table seems
to suggest yes, the text seems to suggest no.
[165] References: "The Unicode Standard" should be:
"The Unicode Standard
The Unicode Consortium. The Unicode Standard, Version 4.0
(Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1)"
[166] References: [Unicode Case Mappings] should be:
"Defined in The Unicode Standard, Section 3.13."
[167] References: You may want to add a reference to the ISO equivalent
of the Unicode Collation Algorithm, ISO 14651. Like in the case
of Unicode and 10646, the UCA is an extension of ISO 14651 --
but a very substantial extension. This should be said in an
explanatory note.
[168] C: It should be possible for an XSLT implementation to use functions
defined with XQuery and vice versa, or the WGs should provide at least
proof-of-concept test software that can do the conversion. Also, the
fact that there need to be two different ways to define these suggests
that follow-up work on an extensive function library may be highly
appropriate.
[169] F.2: This should be called Functions and Operators Index.
Regards, Martin.
Received on Tuesday, 8 July 2003 15:31:07 UTC