- From: Ashok Malhotra <ashokma@microsoft.com>
- Date: Tue, 2 Sep 2003 04:59:16 -0700
- To: <public-qt-comments@w3.org>, <w3c-i18n-ig@w3.org>, "Martin Duerst" <duerst@w3.org>
- Message-ID: <E5B814702B65CB4DA51644580E4853FB0A700A68@red-msg-12.redmond.corp.microsoft.com>
Great comments! I have responded to some by clarifying wording and others by creating issues for the XML Query WG or the F&O taskforce to discuss. On two issues I have started discussion threads. See comments inline.

This is not a formal response from the Query WG. Feel free to comment if you think something has not been addressed adequately.

All the best, Ashok

_____________________________________________

Message-Id: <4.2.0.58.J.20030708143146.04c3a258@localhost>
Date: Tue, 08 Jul 2003 15:30:38 -0400
To: public-qt-comments@w3.org
From: Martin Duerst <duerst@w3.org>
Cc: w3c-i18n-ig@w3.org
Subject: I18N last call comments on XQuery/XPath Fun/Op (2nd part)

Dear XML Query WG and XSL WG,

Below please find the second (and final) part of the I18N WG's comments on your last call document "XQuery 1.0 and XPath 2.0 Functions and Operators" (http://www.w3.org/TR/2003/WD-xpath-functions-20030502/).

Please note the following:

- Please address all replies to these comments to the I18N IG mailing list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0 or XSLT 2.0, and so we might be unaware of i18n issues that appear in these specs but may have to be traced back to functions and operators. There are also cases where we have identified an i18n issue here, but we are not sure exactly what the best solution will be, and which document it will have to be addressed in. Also, there are issues that have been raised in comments to you about a different document but that apply to this document, too. Sometimes, this is mentioned below, but not always.
- Our comments are numbered in square brackets [nn]; the numbering continues from the first part.
- Please note that this mail contains a few additional comments on sections already commented on in our first part.
- We again apologize for our delay.

We look forward to further discussion with you to find the best solution on these issues.

[78] 1.7 namespace prefix: op:xxx backs up operators, not directly user accessible: shouldn't it be the choice of the language using functions and operators of whether to expose these operators or not (XQuery and XSLT have made their choice to the negative, but there might be other languages)
[AM] Changed the language.

[79] 2.3 "cast as xs:string": there should be a forward reference to this notation.
[AM] Done!

[80] 7.1 (this partly may supersede our issue [33]):
"This document uses the term "code point" as a synonym for "Unicode scalar value". [The Unicode Standard] sometimes spells this term "codepoint". Code points range from #x0000 to #x10FFFF inclusive, except for the range #xD800 to #xDFFF inclusive, which is the range reserved for surrogate pairs. The use of the word 'character' in this document is in the sense of production [2] of [XML 1.0 Recommendation (Second Edition)]."
The relationship between code point and scalar value was fuzzy in the past. Unicode 4.0 makes it clear that code point is #x0000 to #x10FFFF inclusive, and scalar values are the subset #x0000 to #xD7FF plus #xE000 to #x10FFFF inclusive. XML can't represent all code points anyway (example: #x0000), so probably best to just use code point. A minor wording issue is that #xD800 to #xDFFF is the range reserved for surrogate code points, used for surrogate pairs. So, suggested wording is:
"This document uses the term "code point" as defined in [The Unicode Standard], ranging from #x0000 to #x10FFFF inclusive. The use of the word 'character' in this document is in the sense of production [2] of [XML 1.0 Recommendation (Second Edition)], so it may include code points which have not yet been assigned to characters."
The spec should be checked so that it does not use the word codepoint anymore when surrogate codepoints are excluded.
[AM] Done!

[81] 7.4.11 normalize-unicode: As of http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-FullyNormalized, what is called 'W3C normalized' here has been renamed to 'fully normalized' in the character model.
[AM] Fixed! Thanks!

[82] 7.4.11 normalize-unicode: 'full normalization' needs a definition of the relevant constructs. For strings, the string itself is most conveniently the relevant construct, but this should be said explicitly.
[AM] Requested clarification.

[83] 7.4.11 normalize-unicode: Maybe not as a function, but in any case somehow, normalization checking on input and normalization on output should be available in both XQuery and XSLT, on full XML constructs (with the relevant definitions from XML 1.1)
[AM] Added issue.

[84] 7.4.13/14: maybe there should be an attribute for Turkish
[AM] Too language specific!

[85] 7.4.13/14: an example should use German sharp-s
[AM] Please suggest example. Would appreciate as many examples as you can suggest for any part of the spec.
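
As one possible illustration of [80] and of the sharp-s example requested in [85], here is a minimal Python sketch. It relies on Python's built-in Unicode case mapping as a stand-in; nothing here claims that fn:upper-case behaves identically, and the helper names is_code_point and is_scalar_value are made up for this example.

```python
# Surrogate code points U+D800..U+DFFF: code points, but not scalar values.
SURROGATES = range(0xD800, 0xE000)


def is_code_point(n: int) -> bool:
    """True for any Unicode code point, surrogates included."""
    return 0x0000 <= n <= 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """True for code points outside the surrogate range."""
    return is_code_point(n) and n not in SURROGATES


# German sharp-s: upper-casing lengthens the string, and lower-casing
# the result does not give back the original spelling.
word = "Stra\u00dfe"          # "Straße"
upper = word.upper()          # full case mapping turns U+00DF into "SS"
round_trip = upper.lower()

print(upper, round_trip)
```

The sharp-s case shows why a case-mapping function cannot be described as a one-character-to-one-character translation.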
[86] 7.4.14: two paragraphs are the same (1st and 5th)
[AM] Fixed! Thanks!

[87] 7.4.16: flag to escape non-ascii or not (default should be not to unescape them)
[AM] Added to issue.

[88] 7.4.12
"Otherwise, returns the value of $srcval after translating every lower-case letter to its upper-case correspondent. Every lower-case letter that does not have an upper-case correspondent, and every character that is not a lower-case letter, is included in the returned value in its original form. A "lower-case letter" is a character whose Unicode General Category class includes "Ll". The corresponding upper-case letter is determined using [Unicode Case Mappings]."
There is a problem here. The set of characters that have uppercases and the set of characters that are Ll are not the same. This should read instead:
"Otherwise, returns the value of $srcval after translating every character to its upper-case correspondent. Every character that does not have an upper-case correspondent is included in the returned value in its original form. The precise mapping is determined using [Unicode Case Mappings]."
and mutatis mutandis for 7.4.13
[AM] Done! Thanks! Jim Melton confirms.

[89] 7.4 There should also be an fn:title-case function, because titlecasing (also called initial caps) is *not* the same as uppercasing the first letter in a word and titlecasing.
[AM] This has been suggested and the feeling was that this is not an important enough use case.

[90] 9.1 *** There should be a note about the inadvisability of all the types with a 'g' prefix.
[AM] How about saying that only equality is supported for these types?

[91] 9.2: *** "Note: The W3C XML Query Working Group has requested the W3C XML Schema Working Group that these two subtypes of xs:duration be included in the built-in datatypes described in [XML Schema Part 2: Datatypes]." We support this request. This is very much needed!
[AM] You are welcome! The XML Schema WG seems favorably disposed towards this request.
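
The points in [88] and [89] can be checked against the Unicode character database. The sketch below uses Python's unicodedata module and built-in case methods as an approximation of [Unicode Case Mappings]; the function name case_profile and the three sample characters are choices made for this illustration only.

```python
import unicodedata


def case_profile(ch: str) -> dict:
    """General Category plus upper-case and title-case forms of one character."""
    return {
        "category": unicodedata.category(ch),
        "upper": ch.upper(),
        "title": ch.title(),
    }


# Not an Ll character, yet it has an upper-case correspondent.
roman_one = case_profile("\u2170")   # SMALL ROMAN NUMERAL ONE

# An Ll character with no upper-case correspondent.
kra = case_profile("\u0138")         # LATIN SMALL LETTER KRA

# Title-casing differs from upper-casing the first letter.
dz = case_profile("\u01C6")          # LATIN SMALL LETTER DZ WITH CARON

for name, profile in (("roman_one", roman_one), ("kra", kra), ("dz", dz)):
    print(name, profile)
```

The first two characters show that "is Ll" and "has an upper-case mapping" pick out different sets, and the third shows a character whose title-case form is neither its lower-case nor its upper-case form.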
[92] 9.2.1: <xs:pattern value="[\-]?P[0-9]+(Y([0-9]+M)?|M)"/> why not simply <xs:pattern value="-?P[0-9]+(Y([0-9]+M)?|M)"/> ? this is legal in Perl, is it not legal in XQuery/XPath?

[93] 9.2.2: Is white space allowed in regular expressions?
[AM] I believe not!

[94] 9.2.2: "The designator 'T' ?must? be absent if and only if all of the time items are absent." This seems to conflict with examples -P35.89S and P4D251M, which are said to be allowed.
[AM] The correct forms are -PT35.89S and P4DT251M. Thanks!

[95] 9.2.2.3: *** Durations cannot allow leap seconds in their canonical representation.
[AM] The current canonical form for dayTimeDuration allows leap seconds. This appears incorrect as duration is an untethered duration of time. Created issue.

[96] 9.3: For many types, there is op:foo-equal, op:foo-less-than, and op:foo-greater-than. As these are only defined for backing up operators, it would be much better to define only a single comparison function (similar to string), or at least only op:foo-equal and op:foo-less-than. Backup then works easily as follows:
a eq b <==> foo-equal(a,b)
a ne b <==> !foo-equal(a,b)
a lt b <==> foo-less-than(a,b)
a ge b <==> !foo-less-than(a,b)
a gt b <==> foo-less-than(b,a)
a le b <==> !foo-less-than(b,a)
[AM] Noted!

[97] 9.3: *** "If either operand to a comparison function on date or time values does not have an explicit timezone then, for the purpose of the operation, an implicit timezone, provided by the evaluation context, is assumed to be present as part of the value." This is used here and in many other places, but we think that it is totally inadequate and dangerous. Reasonable use of these types should be either using timezoned data only, or data without a timezone (or actually with a user-managed, separate indication of the timezone) only. The best way to achieve this would be to separate the relevant types into with-timezone and without-timezone. Anything else will cause more confusion than it will help.
At a minimum, there should be very clear warnings everywhere a 'default timezone' is mentioned.
[AM] There is an existing issue covering this. Also discussing with Schema.

[98] 9.4: Having separate functions for extraction from dateTime, date, and time seems to be unnecessarily tedious. Also, the description of the actual functions should be shortened.
[AM] There is an existing issue covering this.

[99] 9.4.10: *** "fn:get-hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) returns 17" This strange result is just plain weird, and won't help anybody. 12 is what the user will expect, and what she should get.
[AM] Added explanation.

[100] 9.4.8 and other time divisions: What is the expected precision? Some systems may be able to provide very high precision, but should they?
[AM] See conformance note in 9.1.1.

[101] 9.6: *** "For purposes of timezone adjustment, an xs:date is treated as an xs:dateTime with time 00:00:00." It is unclear what this means, and it doesn't seem to have been 'implemented' in the actual function definitions.
[AM] Moved note. Added text. Fixed example.

[102] 9.6: *** There should be a clear explanation of what 'adjustment' means, namely (in general, at least) to change the time notation so that it still denotes the same physical time, but uses a different timezone to do so.
[AM] Added explanation.

[103] 9.6: *** The treatment of implicit timezones,... leads to very strange discontinuities for these timezone adjustments. In particular, while physical time is kept constant for adjustments with time zones, it is not when a timezone is missing. This is very dangerous. Also there are differences in behavior between adjustments of dateTime and date.
[AM] I'm not sure how to respond to this. Would a warning suffice?

[104] 9.6: *** In connection with daylight saving time adjustment, it is often necessary to shift times by keeping the same nominal value but changing the time zone, in effect shifting the physical time with the timezone shift.
But there is no operation to do this easily.
[AM] Daylight savings time varies from state to state and even county to county. There is no rational algorithm. These functions provide a mechanism for adjusting dates/times using an arbitrarily specified duration.

[105] 9.6: *** "op:subtract-yearMonthDuration-from-dateTime" The order of the operands is the wrong way round in the function name. This will cause problems for non-native English speakers. There should at least be a warning, but ideally a fix.
[AM] Created issue.

[106] 9.7.1: This should say that the duration is always rounded down to full months. There should be an example with more rounding (almost a full month).
[AM] Added example.

[107] 9.7.2: *** Examples with part of the operands having implicit timezones may be important to document the current design, but are very bad usage examples.
[AM] Noted.

[108] 9.7.13 and similar: "This value is added to the normalized value of $srcval1 and the result returned." It seems that it is important to normalize after the calculation.
[AM] The normalized value refers to the first component of the two-part value. The result is normalized as well.

[109] 9.7.13: The slack available due to time zones is not used. e.g. it might be possible to say that 23:00:00+09:00 + PT5H is 23:00:00+04:00 or some such.
[AM] The result needs to be in the standard form i.e. {normalized-value, timezone}

[110] 10: *** The general comment about anyURIs and URIs applies here again.
[AM] Noted.

[111] 10.1.1: What about allowing other nodes (e.g. attribute) in second position for fn:resolve-QName?
[AM] Added issue.

[112] 11.1: fn:resolve-uri: This terminology should be cross-checked with the new terminology in RFC2396bis.
[AM] I took a look. The function uses only base and relative URI. This seems fine. Besides it's only a draft.

[113] 11.1: It may be helpful to have fn:resolve-uri(string, node), i.e. get the base implicitly.
[AM] Added issue.
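
The base/relative resolution discussed in [112] and [113] can be tried out with Python's urllib.parse.urljoin. This is only a sketch of generic URI reference resolution, not of fn:resolve-uri itself; the wrapper name resolve_uri, its argument order, and the example.org and example.com URIs are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve_uri(relative: str, base: str) -> str:
    """Resolve a URI reference against an absolute base URI.

    Hypothetical wrapper: urljoin takes the base first, so the
    arguments are swapped here.
    """
    return urljoin(base, relative)


base = "http://example.org/docs/spec/index.html"

sibling = resolve_uri("figures/a.png", base)
parent = resolve_uri("../other.html", base)
# A reference that is already absolute is returned unchanged.
absolute = resolve_uri("http://example.com/x", base)

print(sibling, parent, absolute, sep="\n")
```

The last call shows that the reference being resolved does not have to be relative, which is worth keeping in mind when wording the function description.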
[114] 11.1: "The second form of this function expects $base to be an absolute URI and $relative to be a relative URI." The second part of the sentence is misleading, because it can also be absolute.
[AM] Fixed, thanks!

[115] 11.1: *** The 'how to compare URIs' reference is outdated. In final form, it should point to the relevant section of the IRI spec.
[AM] Fixed as per Paul Cotton.

[116] 12.1.1/2: This is virtually useless. At least a function to compare hex with base64 should be available. This would cover the current two functions and provide more functionality.
[AM] We now allow casting from hexBinary to base64Binary and vice-versa. This allows values of these two different binary types to be compared.

[117] 14. It would be good to have the example doc in actual XML, rather than just described.
[AM] Added.

[118] 14.1: This subsection seems totally pointless. There may be others like this.
[AM] You mean the table summarizing the functions? I rather like it but it does make the document longer. I'll ask the WG and remove if people want it removed.

[119] 14.1.4: *** casting to numeric types: This would cast <a>1<b>2</b>3</a> to 123, yes? There should be functionality that allows to e.g. ignore/remove the <b> element with content, or convert each text node,...
[AM] This is legacy function from XPath 1.0. The functionality you request is, in my opinion, difficult to define and too much for version 1.

[120] 14.1.5: *** fn:lang: There should be a function providing the result of (ancestor-or-self::*/@xml:lang)[last()]. This is a step towards better support of language tagging, but we think that other steps will be needed. Ideally, this function should be called lang, and the current function should be called lang-match, but that may be against backwards compatibility.
[AM] Noted.

[121] 14.1.5: *** fn:lang should return true also if $testlang is "" (i.e. matching for any language)
[AM] Added issue.

[122] 14.1.5: *** there are only four examples, not five.
There should be some examples with false results.
[AM] Added.

[123] 14.1.5: please say explicitly that xml:lang can be taken from an ancestor
[AM] Done.

[124] 14.1.7/8: again, only one of node-before and node-after is needed for backup.
[AM] Noted.

[125] 15.1.1/2/3: The names of these functions should express the testing (rather than constructive) nature of these functions.
[AM] Yes.

[126] 15.1.4: There seems to be some hiccup in: "The singleton xs:string value "". (the zero-length string). The expression cast as xs:boolean ($srcval) returns false if $srcval is "0" and true if $srcval is "1"."
[AM] Thanks! The wording has been changed. There is an open issue on semantics.

[126] *** 15.1.7 and others: It would be a good idea to list all the functions that potentially take a collation or are affected by collations in the collation section.
[AM] It's a good idea but needs some thought. Michael Kay has made a suggestion.

[127] 15.1.12: Changing this from insert-before to insert-after will at least bring this function in line with usual indexing practice (i.e. the position before the first item is 0, after the first is 1, and so on).
[AM] I think this is minor. If you wish we will open an issue.

[128] 15.1.15: "This function takes a sequence or more typically, an expression, that evaluates to a sequence, and indicates that the result sequence may be returned in any order." This should explicitly say that the same sequence (except for order) as the argument is returned.
[AM] Clarified.

[129] 15.2: 'union', 'intersect', and 'except' are badly aligned grammatically (a noun, a verb, and a preposition)
[AM] Not linguistically pure, I agree, but common usage.

[130] 15.2.1 *** is there any collation default for deep-equal?
[AM] Yes, same as for all the other functions that take an optional collation. There is ongoing discussion that some functions should take the Unicode codepoint collation as default. Ongoing.
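
Comment [124] repeats the argument made in [96]: one equality primitive and one ordering primitive are enough to back all six comparison operators. A minimal Python sketch of that derivation follows. The name derive_operators is hypothetical, the two callables stand in for op:foo-equal and op:foo-less-than, and the derivation assumes a total order; for types that are only partially ordered, "not less than" is not the same as "greater than or equal".

```python
from typing import Any, Callable, Dict

Compare = Callable[[Any, Any], bool]


def derive_operators(equal: Compare, less_than: Compare) -> Dict[str, Compare]:
    """Build eq/ne/lt/ge/gt/le from one equality and one ordering function.

    Assumes the ordering is total over the values being compared.
    """
    return {
        "eq": lambda a, b: equal(a, b),
        "ne": lambda a, b: not equal(a, b),
        "lt": lambda a, b: less_than(a, b),
        "ge": lambda a, b: not less_than(a, b),
        "gt": lambda a, b: less_than(b, a),
        "le": lambda a, b: not less_than(b, a),
    }


# Usage with plain integers standing in for some type "foo".
ops = derive_operators(lambda a, b: a == b, lambda a, b: a < b)
print({name: op(2, 3) for name, op in ops.items()})
```

The same table applied to node order would need only a single document-order test, which is the point of [124].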
[131] 15.2.1: *** "If the type of the items in $parameter1 and $parameter2 is not xs:string and $collationLiteral is specified, the collation is ignored." what about text nodes?
[AM] The collation is used. This is specified in 15.2.1.1

[132] 15.2.1.1: *** Why are namespace nodes compared with a collation? namespaces should be compared codepoint-by-codepoint.
[AM] Created issue.

[133] 15.2.1.1: "Note: The result of deep-equal(1, current-dateTime()) is false; it does not raise an error." What does this want to say? That even the weirdest type combination is not an error?
[AM] The signature is item()* so it accepts a sequence of any kind of node or value.

[134] 15.2.1.1, code segments: These code segments use stuff that is not defined in this spec, such as 'eq'. There should be a pointer to a definition of 'eq'. This is one instance of a general problem already pointed out.
[AM] Noted.

[135] 15.3.3 and others: *** the examples seem to imply that collation is an optional argument, but the signature shows it as mandatory.
[AM] There are two signatures. One with and one without a collation.

[136] 15.3: rather than the very few aggregation functions provided here, it seems to be crucial to have adequate and easy-to-use second-order functions.
[AM] We decided not to venture into second-order functions for v1. Several other areas, such as sorting could also use second-order functions.

[137] 15.3.4/5: *** for strings, collations are used. What about subtypes of strings? what about anySimple and anyAtomic?
[AM] Clarified "string and types derived by restriction from string". anySimpleType is converted to untypedAtomic at validation time. untypedAtomic values are cast to the type of the other items. If all items are untyped atomic they are cast to string. This is explained in the second para of fn:max and fn:min. The casting rules for untypedAtomic are still under discussion.

[138] 15.4.2: what 'substitution'?
[AM] Made wording clearer.
[139] 15.4.2/3: *** that collations are not used for ID/IDREF is good

[140] 15.4.4: *** the text speaks about URIs, but this should be anyURIs.
[AM] Done.

[141] 15.4.4: guaranteeing 'doc("foo.xml") is doc("foo.xml")' may lead to problems for queries or transformations that run for a very long time (e.g. days,...) [not that they necessarily take that much time to compute, but that they are e.g. tuned to return a series of elements or documents at a certain pace].

[142] 15.4.4: "If two calls on this function supply different absolute URIs, the same document node may be returned if the implementation can determine that the two URIs refer to the same resource." A short explanation of how an implementation would do that may help.

[143] 15.4.5: fn:collection: How can a single URI return a collection of documents? Is this e.g. the result of a multiple-choice reply? or what? Again, a short explanation listing a few possibilities may help.
[AM] It is not clear we need such a function in the standard. Created issue.

[144] 15.4.6: What is the 'input sequence'?
[AM] This function has been removed.

[145] dates/times in general: The examples should vary the default time zone, not always use the same one, so that people get more aware of the arbitrariness of the calculations.
[AM] OK. I'll work on this.

[146] 16.4.1/5.1: The example should be more realistic, with seconds and fractional parts.
[AM] Added fractional seconds to time examples.

[147] 16.7: *** Would it not be better to return the codepoint collation if no default is set?
[AM] Created issue.

[148] 17: In the first three lines of this section, five different terms are used: "casting function", "cast function", "cast operator", "constructor function", "cast expression". Please clean up terminology and clearly explain the terms that you use.
[AM] Wording cleaned up.

[149] 17: Why are there two syntaxes for casts, one being a substring of the other? One syntax should be enough (probably the shorter one)
[AM] Good question!
I'll add an issue.

[150] 17.1: "and "M" indicates that a conversion from values of the type to which the row applies to the type to which the column applies *may* be supported, subject to restrictions discussed in this section." Does the 'may' mean 'implementations *may* support this kind of casting'? Or 'this cast *will* work for a subset of the values of the source type, in all implementations'? Please clarify.
[AM] Clarified to mean that the cast may or may not succeed.

[151] 17.1: abbreviations: We suggest using them only for the columns, but to use the rows with the full name and the abbreviation at the same time. That way, everything is contained in a single printout.
[AM] I think using different abbreviations for the rows and columns is confusing. Also, there have been complaints about the width of the table.

[152] 17.1: *** Any type that starts with a 'g' for 'Gregorian' should keep this 'g' in the shortcut.
[AM] Good suggestion! Done!

[153] 17.1: *** Why is there an 'M' for anySimple to untypedAtomic? If this is because untypedAtomic cannot contain spaces, then the 'M' would also apply to str->untypedAtomic, because strings obviously can contain spaces.
[AM] anySimpleType has been removed from the table.

[154] 17.1: "In the following table, the notation "S\T" indicates that the source ("S") of the conversion is indicated in the column below the notation and that the target ("T") is indicated in the row to the right of the notation." This sounds utterly helpless. Better change to Source\Target and be done with it, or use special long-range row and column outside the current table to indicate source and target. (by separating words into letters, vertically elongated table cells can easily be filled with text; newer browsers may even support adequate styling properties for vertical text.)
[AM] OK. We'll work on this.

[155] 17.4: Again a monetary type restricted to two digits after the decimal point. Here, please add a warning that this won't cover all currencies.
[AM] Changed example so as not to cause confusion.

[156] 17.7: *** Casting from string to anyURI: Why is space replaced by %20? Please note that the newest IRI draft does not allow space, nor the other characters in ascii but not allowed in URIs.
[AM] This rule has been removed.

[157] 17.7: *** "To cast to xs:anySimpleType or xdt:untypedAtomic the value is cast to xs:string, as described above, and the type annotation changed to xs:anySimpleType or xdt:untypedAtomic, respectively." These types are so extremely close that we think actual casts should not actually be needed (i.e. wherever a string goes, so goes an anySimple or an untypedAtomic, and vice versa).
[AM] True, but something needs to be said. Otherwise people will think it's not possible.

[158] 17.7: *** casting from strings should include casting from text nodes.
[AM] Not necessary. If you pass a text node where a string is expected, it is atomized and its string-value is extracted.

[159] 17.8: "the xs:float value TV" -> "the xs:float TV"
[AM] Fixed.

[160] 17.8: "if SV is 1 or true": The value is only one of these, there just (unfortunately) happen to be two notations. Same for "0 or false".
[AM] Right!

[161] 17.9: The semi-formal description in terms of castings to strings and back is difficult to follow. An informal description in terms of components, followed maybe by a fully formulaic description of each conversion in a single formula, would be clearer.
[AM] Reorganized text.

[162] 17.9.5-9: The instructions for both dateTime and date are exactly the same. Please just say "If ST is dateTime or date, then..."
[AM] Not so, unless I misunderstand your intent.

[163] 17.13: *** Again, please make sure you use anyURI, not URI.
[AM] OK.

[164] 17.15: Is xs:Notation($notation) allowed, or not? The table seems to suggest yes, the text seems to suggest no.
[AM] Clarified text.

[165] References: "The Unicode Standard" should be: "The Unicode Standard The Unicode Consortium.
The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1)"
[AM] Issue created.

[166] References: [Unicode Case Mappings] should be: "Defined in The Unicode Standard, Section 3.13."
[AM] This is related to the above open issue.

[167] References: You may want to add a reference to the ISO equivalent of the Unicode Collation Algorithm, ISO 14651. Like in the case of Unicode and 10646, the UCA is an extension of ISO 14651 -- but a very substantial extension. This should be said in an explanatory note.
[AM] We can consider this but it does not seem necessary.

[168] C: It should be possible to have an XSLT implementation use functions defined with XQuery and vice versa, or that the WGs provide some at least proof-of-concept quality test software that can do the conversion. Also, the fact that there need to be two different ways to define these suggests that some follow-up work on an extensive function library may be highly adequate.
[AM] This is a general comment of the family of specs not just the F&O.

[169] F.2: This should be called Functions and Operators Index.
[AM] Sorry, but I like the name "Quick Reference"!

Regards, Martin.

All the best, Ashok
Received on Tuesday, 2 September 2003 08:00:01 UTC