- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 08 Jul 2003 15:30:38 -0400
- To: public-qt-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Query WG and XSL WG,
Below please find the second (and final) part of the I18N WG's comments on your last call document "XQuery 1.0 and XPath 2.0 Functions and Operators" (http://www.w3.org/TR/2003/WD-xpath-functions-20030502/).
Please note the following:
- Please address all replies to these comments to the I18N IG mailing list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0 or XSLT 2.0, and so we might be unaware of i18n issues that appear in these specs but may have to be traced back to functions and operators. There are also cases where we have identified an i18n issue here, but we are not sure exactly what the best solution will be, and which document it will have to be addressed in. Also, there are issues that have been raised in comments to you about a different document but that apply to this document, too. Sometimes, this is mentioned below, but not always.
- Our comments are numbered in square brackets [nn]; the numbering continues from the first part.
- Please note that this mail contains a few additional comments on sections already commented on in our first part.
- We again apologize for our delay.
We look forward to further discussion with you to find the best solution on these issues.
[78] 1.7 namespace prefix: op:xxx backs up operators and is not directly user accessible. Shouldn't it be the choice of the language using functions and operators whether to expose these operators or not? (XQuery and XSLT have chosen not to, but there might be other languages.)
[79] 2.3 "cast as xs:string": there should be a forward reference to this notation.
[80] 7.1 (this may partly supersede our issue [33]): "This document uses the term "code point" as a synonym for "Unicode scalar value". [The Unicode Standard] sometimes spells this term "codepoint". Code points range from #x0000 to #x10FFFF inclusive, except for the range #xD800 to #xDFFF inclusive, which is the range reserved for surrogate pairs. The use of the word 'character' in this document is in the sense of production [2] of [XML 1.0 Recommendation (Second Edition)]." The relationship between code point and scalar value was fuzzy in the past. Unicode 4.0 makes it clear that code points range from #x0000 to #x10FFFF inclusive, and that scalar values are the subset #x0000 to #xD7FF plus #xE000 to #x10FFFF inclusive. XML can't represent all code points anyway (example: #x0000), so it is probably best to just use code point. A minor wording issue is that #xD800 to #xDFFF is the range reserved for surrogate code points, used for surrogate pairs. So, suggested wording is: "This document uses the term "code point" as defined in [The Unicode Standard], ranging from #x0000 to #x10FFFF inclusive. The use of the word 'character' in this document is in the sense of production [2] of [XML 1.0 Recommendation (Second Edition)], so it may include code points which have not yet been assigned to characters." The spec should be checked so that it does not use the word codepoint anymore when surrogate codepoints are excluded.
[81] 7.4.11 normalize-unicode: As of http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-FullyNormalized, what is called 'W3C normalized' here has been renamed to 'fully normalized' in the character model.
[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of the relevant constructs. For strings, the string itself is most conveniently the relevant construct, but this should be said explicitly.
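
The ranges discussed in [80] can be written down as a minimal sketch. The function names below are illustrative only and do not come from the spec or from any library; the character test follows production [2] (Char) of XML 1.0.

```python
# Illustrative helpers for the terminology in comment [80]; all names are hypothetical.
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_code_point(n: int) -> bool:
    """Unicode 4.0 code point: #x0000 to #x10FFFF inclusive."""
    return 0 <= n <= MAX_CODE_POINT


def is_surrogate(n: int) -> bool:
    """Surrogate code point: #xD800 to #xDFFF inclusive."""
    return SURROGATE_FIRST <= n <= SURROGATE_LAST


def is_scalar_value(n: int) -> bool:
    """Scalar value: any code point that is not a surrogate code point."""
    return is_code_point(n) and not is_surrogate(n)


def is_xml10_char(n: int) -> bool:
    """Production [2] of XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        n in (0x9, 0xA, 0xD)
        or 0x20 <= n <= 0xD7FF
        or 0xE000 <= n <= 0xFFFD
        or 0x10000 <= n <= 0x10FFFF
    )
```

Under these definitions #xD800 is a code point but not a scalar value, and #x0000 is a scalar value that XML 1.0 cannot represent as a character.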
[83] 7.4.11 normalize-unicode: Maybe not as a function, but in any case somehow, normalization checking on input and normalization on output should be available in both XQuery and XSLT, on full XML constructs (with the relevant definitions from XML 1.1).
[84] 7.4.13/14: maybe there should be an attribute for Turkish.
[85] 7.4.13/14: an example should use German sharp-s.
[86] 7.4.14: two paragraphs are the same (1st and 5th).
[87] 7.4.16: flag to escape non-ascii or not (default should be not to unescape them).
[88] 7.4.12 "Otherwise, returns the value of $srcval after translating every lower-case letter to its upper-case correspondent. Every lower-case letter that does not have an upper-case correspondent, and every character that is not a lower-case letter, is included in the returned value in its original form. A "lower-case letter" is a character whose Unicode General Category class includes "Ll". The corresponding upper-case letter is determined using [Unicode Case Mappings]." There is a problem here. The set of characters that have upper-case correspondents and the set of characters that are Ll do not coincide. This should read instead: "Otherwise, returns the value of $srcval after translating every character to its upper-case correspondent. Every character that does not have an upper-case correspondent is included in the returned value in its original form. The precise mapping is determined using [Unicode Case Mappings]." and mutatis mutandis for 7.4.13.
[89] 7.4 There should also be an fn:title-case function, because titlecasing (also called initial caps) is *not* the same as uppercasing the first letter in a word.
[90] 9.1 *** There should be a note about the inadvisability of all the types with a 'g' prefix.
[91] 9.2: *** "Note: The W3C XML Query Working Group has requested the W3C XML Schema Working Group that these two subtypes of xs:duration be included in the built-in datatypes described in [XML Schema Part 2: Datatypes]." We support this request. This is very much needed!
[92] 9.2.1: <xs:pattern value="[\-]?P[0-9]+(Y([0-9]+M)?|M)"/> Why not simply <xs:pattern value="-?P[0-9]+(Y([0-9]+M)?|M)"/>? This is legal in Perl; is it not legal in XQuery/XPath?
[93] 9.2.2: Is white space allowed in regular expressions?
[94] 9.2.2: "The designator 'T' ?must? be absent if and only if all of the time items are absent." This seems to conflict with examples -P35.89S and P4D251M, which are said to be allowed.
[95] 9.2.2.3: *** Durations cannot not allow leap seconds in their canonical representation.
[96] 9.3: For many types, there is op:foo-equal, op:foo-less-than, and op:foo-greater-than. As these are only defined for backing up operators, it would be much better to define only a single comparison function (similar to string), or at least only op:foo-equal and op:foo-less-than. Backup then works easily as follows:
  a eq b <==> foo-equal(a,b)
  a ne b <==> !foo-equal(a,b)
  a lt b <==> foo-less-than(a,b)
  a ge b <==> !foo-less-than(a,b)
  a gt b <==> foo-less-than(b,a)
  a le b <==> !foo-less-than(b,a)
[97] 9.3: *** "If either operand to a comparison function on date or time values does not have an explicit timezone then, for the purpose of the operation, an implicit timezone, provided by the evaluation context, is assumed to be present as part of the value." This is used here and in many other places, but we think that it is totally inadequate and dangerous. Reasonable use of these types should be either using timezoned data only, or data without a timezone (or actually with a user-managed, separate indication of the timezone) only. The best way to achieve this would be to separate the relevant types into with-timezone and without-timezone. Anything else will cause more confusion than it will help. At a minimum, there should be very clear warnings everywhere a 'default timezone' is mentioned.
[98] 9.4: Having separate functions for extraction from dateTime, date, and time seems to be unnecessarily tedious. Also, the description of the actual functions should be shortened.
[99] 9.4.10: *** "fn:get-hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) returns 17" This result is just plain weird, and won't help anybody. 12 is what the user will expect, and what she should get.
[100] 9.4.8 and other time divisions: What is the expected precision? Some systems may be able to provide very high precision, but should they?
[101] 9.6: *** "For purposes of timezone adjustment, an xs:date is treated as an xs:dateTime with time 00:00:00." It is unclear what this means, and it doesn't seem to have been 'implemented' in the actual function definitions.
[102] 9.6: *** There should be a clear explanation of what 'adjustment' means, namely (in general, at least) to change the time notation so that it still denotes the same physical time, but uses a different timezone to do so.
[103] 9.6: *** The treatment of implicit timezones,... leads to very strange discontinuities for these timezone adjustments. In particular, while physical time is kept constant for adjustments with time zones, it is not when a timezone is missing. This is very dangerous. Also there are differences in behavior between adjustments of dateTime and date.
[104] 9.6: *** In connection with daylight saving time adjustment, it is often necessary to shift times by keeping the same nominal value but changing the time zone, in effect shifting the physical time with the timezone shift. But there is no operation to do this easily.
[105] 9.6: *** "op:subtract-yearMonthDuration-from-dateTime" The order of the operands is the wrong way round in the function name. This will cause problems for non-native English speakers. There should at least be a warning, but ideally a fix.
[106] 9.7.1: This should say that the duration is always rounded down to full months. There should be an example with more rounding (almost a full month).
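
The difference between the 'adjustment' described in [102] and the nominal shift asked for in [104] can be illustrated with a minimal sketch using Python's standard datetime module. The function names are hypothetical, the -05:00 and +09:00 offsets are arbitrary, and refusing values without a timezone is an assumption made here in the spirit of [97], not something the spec prescribes.

```python
from datetime import datetime, timedelta, timezone


def adjust_timezone(value: datetime, target: timezone) -> datetime:
    """Same physical instant, written in a different timezone ([102])."""
    if value.tzinfo is None:
        # Assumption for this sketch: never guess an implicit timezone.
        raise ValueError("value has no timezone; refusing to assume one")
    return value.astimezone(target)


def shift_timezone(value: datetime, target: timezone) -> datetime:
    """Same nominal clock reading, different physical instant ([104])."""
    return value.replace(tzinfo=target)


minus_five = timezone(timedelta(hours=-5))
plus_nine = timezone(timedelta(hours=9))
noon = datetime(1999, 12, 31, 12, 0, 0, tzinfo=minus_five)

adjusted = adjust_timezone(noon, plus_nine)  # 2000-01-01T02:00:00+09:00
shifted = shift_timezone(noon, plus_nine)    # 1999-12-31T12:00:00+09:00
```

Here adjusted compares equal to noon because both denote the same instant, while shifted keeps the hour 12 but lies 14 hours earlier in physical time.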
[107] 9.7.2: *** Examples with part of the operands having implicit timezones may be important to document the current design, but are very bad usage examples.
[108] 9.7.13 and similar: "This value is added to the normalized value of $srcval1 and the result returned." It seems that it is important to normalize after the calculation.
[109] 9.7.13: The slack available due to time zones is not used. E.g., it might be possible to say that 23:00:00+09:00 + PT5H is 23:00:00+04:00 or some such.
[110] 10: *** The general comment about anyURIs and URIs applies here again.
[111] 10.1.1: What about allowing other nodes (e.g. attribute) in second position for fn:resolve-QName?
[112] 11.1: fn:resolve-uri: This terminology should be cross-checked with the new terminology in RFC2396bis.
[113] 11.1: It may be helpful to have fn:resolve-uri(string, node), i.e. get the base implicitly.
[114] 11.1: "The second form of this function expects $base to be an absolute URI and $relative to be a relative URI." The second part of the sentence is misleading, because $relative can also be absolute.
[115] 11.1: *** The 'how to compare URIs' reference is outdated. In final form, it should point to the relevant section of the IRI spec.
[116] 12.1.1/2: This is virtually useless. At least a function to compare hex with base64 should be available. This would cover the current two functions and provide more functionality.
[117] 14. It would be good to have the example doc in actual XML, rather than just described.
[118] 14.1: This subsection seems totally pointless. There may be others like this.
[119] 14.1.4: *** casting to numeric types: This would cast <a>1<b>2</b>3</a> to 123, yes? There should be functionality that makes it possible, e.g., to ignore/remove the <b> element with its content, or to convert each text node,...
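
The alternatives raised in [119] can be made concrete with a small sketch using Python's xml.etree.ElementTree in place of XQuery. It only illustrates the three readings of the mixed content (concatenated string value, text with the <b> element removed, one number per text node) and is not a proposal for how the spec's functions should behave.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a>1<b>2</b>3</a>")

# Reading 1: concatenate all descendant text, as a string-value cast would.
string_value = int("".join(a.itertext()))  # 123

# Reading 2: drop child elements together with their content.
own_text = (a.text or "") + "".join(child.tail or "" for child in a)
without_children = int(own_text)  # 13

# Reading 3: convert each text node separately.
per_text_node = [int(text) for text in a.itertext()]  # [1, 2, 3]
```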
[120] 14.1.5: *** fn:lang: There should be a function providing the result of (ancestor-or-self::*/@xml:lang)[last()]. This is a step towards better support of language tagging, but we think that other steps will be needed. Ideally, this function should be called lang, and the current function should be called lang-match, but that may be against backwards compatibility.
[121] 14.1.5: *** fn:lang should return true also if $testlang is "" (i.e. matching for any language).
[122] 14.1.5: *** there are only four examples, not five. There should be some examples with false results.
[123] 14.1.5: please say explicitly that xml:lang can be taken from an ancestor.
[124] 14.1.7/8: again, only one of node-before and node-after is needed for backup.
[125] 15.1.1/2/3: The names of these functions should express the testing (rather than constructive) nature of these functions.
[126] 15.1.4: There seems to be some hiccup in: "The singleton xs:string value "". (the zero-length string). The expression cast as xs:boolean ($srcval) returns false if $srcval is "0" and true if $srcval is "1"."
[126] *** 15.1.7 and others: It would be a good idea to list all the functions that potentially take a collation or are affected by collations in the collation section.
[127] 15.1.12: Changing this from insert-before to insert-after will at least bring this function in line with usual indexing practice (i.e. the position before the first item is 0, after the first is 1, and so on).
[128] 15.1.15: "This function takes a sequence or more typically, an expression, that evaluates to a sequence, and indicates that the result sequence may be returned in any order." This should explicitly say that the same sequence (except for order) as the argument is returned.
[129] 15.2: 'union', 'intersect', and 'except' are badly aligned grammatically (a noun, a verb, and a preposition).
[130] 15.2.1: *** is there any collation default for deep-equal?
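
The lookup proposed in [120], (ancestor-or-self::*/@xml:lang)[last()], amounts to taking the nearest xml:lang on the element itself or on an ancestor. A minimal sketch with Python's xml.etree.ElementTree follows; the function name effective_langs and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Expanded name under which ElementTree stores the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(root: ET.Element) -> Dict[ET.Element, Optional[str]]:
    """Map every element to the nearest xml:lang on itself or an ancestor."""
    result: Dict[ET.Element, Optional[str]] = {}

    def walk(elem: ET.Element, inherited: Optional[str]) -> None:
        lang = elem.get(XML_LANG, inherited)  # own value wins over inherited
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


doc = ET.fromstring('<doc xml:lang="en"><p xml:lang="de"><em/></p><p/></doc>')
langs = effective_langs(doc)
```

In this sample the em element inherits "de" from its parent, and the second p inherits "en" from the root, which also illustrates the inheritance that [123] asks the spec to state explicitly.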
[131] 15.2.1: *** "If the type of the items in $parameter1 and $parameter2 is not xs:string and $collationLiteral is specified, the collation is ignored." What about text nodes?
[132] 15.2.1.1: *** Why are namespace nodes compared with a collation? Namespaces should be compared codepoint-by-codepoint.
[133] 15.2.1.1: "Note: The result of fn:deep-equal(1, current-dateTime()) is false; it does not raise an error." What is this meant to say? That even the weirdest type combination is not an error?
[134] 15.2.1.1, code segments: These code segments use stuff that is not defined in this spec, such as 'eq'. There should be a pointer to a definition of 'eq'. This is one instance of a general problem already pointed out.
[135] 15.3.3 and others: *** the examples seem to imply that collation is an optional argument, but the signature shows it as mandatory.
[136] 15.3: rather than the very few aggregation functions provided here, it seems to be crucial to have adequate and easy-to-use second-order functions.
[137] 15.3.4/5: *** for strings, collations are used. What about subtypes of strings? What about anySimple and anyAtomic?
[138] 15.4.2: what 'substitution'?
[139] 15.4.2/3: *** that collations are not used for ID/IDREF is good.
[140] 15.4.4: *** the text speaks about URIs, but this should be anyURIs.
[141] 15.4.4: guaranteeing 'doc("foo.xml") is doc("foo.xml")' may lead to problems for queries or transformations that run for a very long time (e.g. days,...) [not that they necessarily take that much time to compute, but that they are e.g. tuned to return a series of elements or documents at a certain pace].
[142] 15.4.4: "If two calls on this function supply different absolute URIs, the same document node may be returned if the implementation can determine that the two URIs refer to the same resource." A short explanation of how an implementation would do that may help.
[143] 15.4.5: fn:collection: How can a single URI return a collection of documents? Is this, e.g., the result of a multiple-choice reply, or something else? Again, a short explanation listing a few possibilities may help.
[144] 15.4.6: What is the 'input sequence'?
[145] dates/times in general: The examples should vary the default time zone, not always use the same one, so that people become more aware of the arbitrariness of the calculations.
[146] 16.4.1/5.1: The example should be more realistic, with seconds and fractional parts.
[147] 16.7: *** Would it not be better to return the codepoint collation if no default is set?
[148] 17: In the first three lines of this section, five different terms are used: "casting function", "cast function", "cast operator", "constructor function", "cast expression". Please clean up terminology and clearly explain the terms that you use.
[149] 17: Why are there two syntaxes for casts, one being a substring of the other? One syntax should be enough (probably the shorter one).
[150] 17.1: "and "M" indicates that a conversion from values of the type to which the row applies to the type to which the column applies *may* be supported, subject to restrictions discussed in this section." Does the 'may' mean 'implementations *may* support this kind of casting'? Or 'this cast *will* work for a subset of the values of the source type, in all implementations'? Please clarify.
[151] 17.1: abbreviations: We suggest using them only for the columns, but labelling the rows with the full name and the abbreviation at the same time. That way, everything is contained in a single printout.
[152] 17.1: *** Any type that starts with a 'g' for 'Gregorian' should keep this 'g' in the shortcut.
[153] 17.1: *** Why is there an 'M' for anySimple to untypedAtomic? If this is because untypedAtomic cannot contain spaces, then the 'M' would also apply to str->untypedAtomic, because strings obviously can contain spaces.
[154] 17.1: "In the following table, the notation "S\T" indicates that the source ("S") of the conversion is indicated in the column below the notation and that the target ("T") is indicated in the row to the right of the notation." This sounds utterly helpless. Better change to Source\Target and be done with it, or use a special long-range row and column outside the current table to indicate source and target (by separating words into letters, vertically elongated table cells can easily be filled with text; newer browsers may even support adequate styling properties for vertical text).
[155] 17.4: Again a monetary type restricted to two digits after the decimal point. Here, please add a warning that this won't cover all currencies.
[156] 17.7: *** Casting from string to anyURI: Why is space replaced by %20? Please note that the newest IRI draft does not allow space, nor the other characters that are in ASCII but not allowed in URIs.
[157] 17.7: *** "To cast to xs:anySimpleType or xdt:untypedAtomic the value is cast to xs:string, as described above, and the type annotation changed to xs:anySimpleType or xdt:untypedAtomic, respectively." These types are so extremely close that we think actual casts should not be needed (i.e. wherever a string goes, so goes an anySimple or an untypedAtomic, and vice versa).
[158] 17.7: *** casting from strings should include casting from text nodes.
[159] 17.8: "the xs:float value TV" -> "the xs:float TV"
[160] 17.8: "if SV is 1 or true": The value is only one of these; there just (unfortunately) happen to be two notations. Same for "0 or false".
[161] 17.9: The semi-formal description in terms of castings to strings and back is difficult to follow. An informal description in terms of components, followed maybe by a fully formulaic description of each conversion in a single formula, would be clearer.
[162] 17.9.5-9: The instructions for both dateTime and date are exactly the same. Please just say "If ST is dateTime or date, then..."
[163] 17.13: *** Again, please make sure you use anyURI, not URI.
[164] 17.15: Is xs:Notation($notation) allowed, or not? The table seems to suggest yes, the text seems to suggest no.
[165] References: "The Unicode Standard" should be: "The Unicode Standard The Unicode Consortium. The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1)"
[166] References: [Unicode Case Mappings] should be: "Defined in The Unicode Standard, Section 3.13."
[167] References: You may want to add a reference to the ISO equivalent of the Unicode Collation Algorithm, ISO 14651. Like in the case of Unicode and 10646, the UCA is an extension of ISO 14651 -- but a very substantial extension. This should be said in an explanatory note.
[168] C: It should be possible to have an XSLT implementation use functions defined with XQuery and vice versa, or the WGs should provide test software, of at least proof-of-concept quality, that can do the conversion. Also, the fact that there need to be two different ways to define these suggests that some follow-up work on an extensive function library may be highly appropriate.
[169] F.2: This should be called Functions and Operators Index.
Regards, Martin.
Received on Tuesday, 8 July 2003 15:31:07 UTC