- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Thu, 9 Oct 2003 01:21:35 +0200
- To: Ashok Malhotra <ashokma@microsoft.com>, "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, public-qt-comments@w3.org, "Kay, Michael" <Michael.Kay@softwareag.com>
- Cc: W3C XML Schema IG <w3c-xml-schema-ig@w3.org>
- Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD19D@daemsg02.software-ag.de>
Just a clarification about the "#" character. The list of "reserved"
characters in RFC 2396 describes characters that have a special role
in a URI. "#" does not have a special role in a URI, but it does have
a special role in a URI-reference. Since we are dealing with
URI-references rather than URIs, it is appropriate to add "#" to the
list.

Michael Kay

> -----Original Message-----
> From: Ashok Malhotra [mailto:ashokma@microsoft.com]
> Sent: 07 October 2003 23:14
> To: C. M. Sperberg-McQueen; public-qt-comments@w3.org; Kay, Michael
> Cc: W3C XML Schema IG
> Subject: RE: XML Schema WG comments on Functions and Operators
>
> This is a response to your comment [2.8] below on fn:escape-uri.
> I'm copying the I18N WG because this response also addresses some
> of the material in their comments [67], [68] and [69] in
> http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0105.html.
>
> Essentially, your comment said "use the algorithm in the Linking
> Spec ...". But, as I argue below, the algorithm in the F&O is
> closer to RFC 2396 than the algorithm in the Linking Spec. There is
> one exception to this which is the situation with the # character,
> of which more later.
>
> First, let us discuss the behaviour where escape-reserved = 'true'.
> I believe this is the algorithm discussed in the Linking Spec. The
> Linking spec says "the disallowed characters include all non-ASCII
> characters, plus the excluded characters listed in Section 2.4 of
> [IETF RFC 2396], except for the number sign (#) and percent sign (%)
> and the square bracket characters re-allowed in [IETF RFC 2732]."
>
> However RFC 2396 says
>
>   "Data characters that are allowed in a URI but do not have a
>    reserved purpose are called unreserved. These include upper and
>    lower case letters, decimal digits, and a limited set of
>    punctuation marks and symbols.
>
>       unreserved  = alphanum | mark
>
>       mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" |
>                     "(" | ")"
>
>    Unreserved characters can be escaped without changing the
>    semantics of the URI, but this should not be done unless the URI
>    is being used in a context that does not allow the unescaped
>    character to appear."
>
> Thus, our reading of the above is that all characters except those
> listed above should be escaped and, in particular, the marks should
> not be escaped.
>
> A little later RFC 2396 says
>
>   "Because the percent "%" character always has the reserved purpose
>    of being the escape indicator, it must be escaped as "%25" in
>    order to be used as data within a URI."
>
> Our reading of this rule is that the % must be escaped unless it is
> the start of an escape sequence %HH.
>
> This reading of 2396 was the basis of the rule in the F&O which says
>
>   "If $escape-reserved is true, all characters are escaped other
>    than lower case letters a-z, upper case letters A-Z, digits 0-9
>    and the characters referred to in [RFC 2396] as "marks":
>    specifically, HYPHEN-MINUS ("-"), LOW LINE ("_"), FULL STOP ".",
>    EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", APOSTROPHE "'",
>    LEFT PARENTHESIS "(", and RIGHT PARENTHESIS ")". The PERCENT SIGN
>    "%" character itself is escaped only if it is not followed by two
>    hexadecimal digits (that is, 0-9, a-f and A-F)."
>
> RFC 2396 says the set of characters included as "reserved" can occur
> in specific contexts only and must be escaped if they are not used
> in these contexts. We interpret this to mean that if they are used
> correctly, reserved characters may not be escaped. This is the
> motivation behind the algorithm for escape-reserved='false'. The
> rule in the F&O says
>
>   "If $escape-reserved is false, the behavior differs in that
>    characters referred to in [RFC 2396] and [RFC 2732] as reserved
>    characters, together with the NUMBER SIGN '#' character, are not
>    escaped. These characters are SEMICOLON ";", SOLIDUS "/",
>    QUESTION MARK "?", COLON ":", COMMERCIAL AT "@", AMPERSAND "&",
>    EQUALS SIGN "=", PLUS SIGN "+", DOLLAR SIGN "$", COMMA ",",
>    NUMBER SIGN "#", LEFT SQUARE BRACKET "[" and RIGHT SQUARE BRACKET
>    "]"."
>
> The set of reserved characters in the above is correct according to
> RFC 2396 amended by RFC 2732 except for the inclusion of the #
> character.
>
> Thus, if we have read the background material correctly there are
> two possible actions. Please advise.
>
> POSSIBLE ACTIONS:
>
> 1. Do nothing except remove the # sign from the set of reserved
>    characters in the rule for escape-reserved='false'.
> 2. Change the wording to conform to the Linking Spec even though it
>    is at variance with RFC 2396.
>
> All the best, Ashok
>
> > -----Original Message-----
> > From: public-qt-comments-request@w3.org
> > [mailto:public-qt-comments-request@w3.org] On Behalf Of
> > C. M. Sperberg-McQueen
> > Sent: Friday, August 01, 2003 7:55 PM
> > To: public-qt-comments@w3.org
> > Cc: W3C XML Schema IG
> > Subject: XML Schema WG comments on Functions and Operators
> >
> > Dear colleagues:
> >
> > The XML Schema Working Group congratulates the XML Query and XSL
> > Working Groups on their progress, and in particular on the Last
> > Call draft of "XQuery 1.0 and XPath 2.0 Functions and Operators".
> >
> > We have not been able to review the last call draft in as much
> > detail as we would have liked, but for what they are worth our
> > comments are at
> > http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html (an
> > ASCII version is reproduced below for the convenience of those with
> > access to their email but not to the Web).
> >
> > We apologize for the tardy arrival of these notes.
> >
> > -C. M. Sperberg-McQueen, for the W3C XML Schema WG
> >
> > ................................................................
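The F&O rule for fn:escape-uri quoted in Ashok Malhotra's message above can be sketched roughly as follows. This is only an illustration of the quoted wording, not text from the F&O draft: the name escape_uri and its signature are hypothetical, and the step that turns a character into %HH octets via UTF-8 with upper-case hex digits is an assumption, since the quoted rule only says that the character "is escaped".

```python
import re

# Characters RFC 2396 calls "unreserved": alphanumerics plus the marks.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

# Reserved characters of RFC 2396 as amended by RFC 2732, plus "#",
# exactly as enumerated in the quoted F&O rule.
RESERVED_PLUS_HASH = set(";/?:@&=+$,#[]")

_HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def escape_uri(uri_part: str, escape_reserved: bool) -> str:
    """Hypothetical sketch of the quoted F&O escaping rule."""
    keep = UNRESERVED if escape_reserved else UNRESERVED | RESERVED_PLUS_HASH
    out = []
    for i, ch in enumerate(uri_part):
        if ch in keep:
            out.append(ch)
        elif ch == "%" and _HEX_PAIR.fullmatch(uri_part, i + 1, i + 3):
            # "%" followed by two hex digits is taken to be an existing
            # escape sequence %HH and is left alone.
            out.append(ch)
        else:
            # Assumption: escape every UTF-8 octet of the character as %HH.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(escape_uri("a b#c", True))    # "#" is escaped
print(escape_uri("a b#c", False))   # "#" is kept
print(escape_uri("%20%zz", True))   # only the second "%" is escaped
```

The only difference between the two modes is the set of characters that are kept, which is why removing "#" from that set (possible action 1) is a one-character change in such a sketch.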
> >
> >    [1]W3C [2]Architecture Domain [3]XML | [4]XML Schema | [5]Member
> >    Events | [6]Member-Confidential!
> >
> >    W3C XML Schema WG
> >
> >    Notes on XQuery 1.0 and XPath 2.0 Functions and Operators
> >
> >    1 August 2003
> >
> >    _________________________________________________________________
> >
> >      * 1. [7]Schema-related issues
> >           + 1.1. [8]Alignment of date/time values
> >           + 1.2. [9]The type anyAtomicType
> >           + 1.3. [10]The type untypedAtomic
> >           + 1.4. [11]Alignment on strings and URIs
> >           + 1.5. [12]Whitespace handling and lexical forms
> >           + 1.6. [13]Negative zero
> >           + 1.7. [14]Totally ordered Booleans
> >      * 2. [15]Other technical issues
> >           + 2.1. [16]The fn:base-uri property
> >           + 2.2. [17]Alignment of references
> >           + 2.3. [18]Characters and collation units
> >           + 2.4. [19]Surrogate pairs and Unicode scalar values
> >           + 2.5. [20]Definition of whitespace
> >           + 2.6. [21]Required normalization functionality
> >           + 2.7. [22]Case folding
> >           + 2.8. [23]Escaping URIs
> >           + 2.9. [24]The binary types
> >           + 2.10. [25]Minor items
> >                o 2.10.1. [26]User control of collations
> >                o 2.10.2. [27]Section 7.3.1.1 Examples
> >      * 3. [28]Editorial notes
> >
> >    _________________________________________________________________
> >
> >    This document contains comments on the Last Call draft of 2 May
> >    2003 of XQuery 1.0 and XPath 2.0 Functions and Operators
> >    transmitted to the XML Query and XSL Working Groups on behalf of
> >    the XML Schema Working Group. These draft comments have not been
> >    reviewed by the XML Schema Working Group and do not necessarily
> >    command consensus within the group; because we will not meet
> >    again until 28 August, the Working Group directed at its meeting
> >    today that these notes should be transmitted to the XML Query and
> >    XSL Working Groups without awaiting review.
> > In addition to the comments below, please note that > several of the > > [29]general comments sent on 14 July relate to the functions and > > operators and data model specifications. Some of those > comments sent > > earlier overlap with some comments below. > > > > 1. Schema-related issues > > > > The comments in this section relate to the use of XML > Schema in the > > F/O specification and thus to the particular area of > responsibility > > borne by the XML Schema WG. > > > > 1.1. Alignment of date/time values > > > > The provision for preserving timezone information in > the values of > > xs:dateTime, xs:date, and xs:time continues to concern > us. We believe > > that a discrepancy of this kind between F/O and XML > Schema will hurt > > users and impede uptake of both specifications. > > > > We believe F/O and XML Schema need to align on this, > either by F/O > > changing to the XML Schema value space, or by changing > the value space > > as part of XML Schema 1.1, or by some other mutually agreed upon > > solution. > > > > 1.2. The type anyAtomicType > > > > We reiterate our concern over the introduction of > anyAtomicType into > > the type hierarchy. We believe that a discrepancy of > this kind between > > F/O and XML Schema will hurt users and impede uptake of both > > specifications. > > > > We believe F/O and XML Schema need to align on this, > either by F/O > > aligning with XML Schema 1.0 or by XML Schema 1.1 aligning with > > F/O. > > > > 1.3. The type untypedAtomic > > > > We reiterate our concern over the introduction of > untypedAtomic into > > the type hierarchy. As with the other discrepancies, we believe > > alignment of the QT specs and XML Schema is critically > important. 
> >    Section 1.3.2 says xdt:untypedAtomic is used wherever the PSVI
> >    has xs:anySimpleType; please note that in the PSVI, this will be
> >    the case
> >
> >      * when the element or attribute in question was declared as
> >        having type anySimpleType
> >      * when the attribute in question had no declaration and the
> >        schema processor assumed the simple urtype for it in the
> >        course of lax validation or error recovery
> >
> >    Note that elements will not be assigned the anySimpleType as
> >    their type property in the course of lax validation or error
> >    recovery; they will have xs:anyType instead. Your use of
> >    xdt:untypedAtomic for xs:anySimpleType but not for elements which
> >    (a) lack child elements and (b) are assigned to xs:anyType may
> >    lead to results which puzzle some of your users; we believe you
> >    may wish to consider changing your mapping rules to assign
> >    xdt:untypedAtomic to such elements.
> >
> >   1.4. Alignment on strings and URIs
> >
> >    The table at the beginning of section 2, Accessors, shows
> >    functions which are intended (judging by their names) to return
> >    URIs and which return values of type xs:string instead of
> >    xs:anyURI. Similarly, various functions which accept URIs as
> >    arguments are given signatures using xs:string as the type, which
> >    in turn necessitates ad hoc rules of the form "If
> >    $collationLiteral is not in the lexical space of xs:anyURI, an
> >    error is raised".
> >
> >    As you know from our inquiry to you in mid-July, it has been
> >    suggested that in XML Schema 1.1 the xs:anyURI type be made a
> >    restriction of xs:string. But for now, there appears to be a
> >    discrepancy between the use of strings to represent URIs here and
> >    the provision of a distinct (and, for typing purposes, disjoint)
> >    type in XML Schema 1.0.
> >    We need to align on this.
> >
> >   1.5.
Whitespace handling and lexical forms > > > > In section 5.1, paragraph 4 reads in part: "If the argument to a > > constructor function is a string literal, the literal > must be a valid > > lexical form for its type ... Whitespace normalization > is applied > > before validation ..." > > > > In all the cases which immediately come to mind, if the > argument is a > > valid lexical form for a type, there is no need to perform any > > whitespace normalization on it. In XML Schema, it is > the result of > > whitespace normalization, not the input to it, which > must be a legal > > lexical form; we believe readers will be less confused > if your usage > > of the terms and ours is consistent. > > > > A possible rewording: "If the argument to a constructor > function is a > > string literal, then whitespace normalization is > applied as indicated > > by the whitespace facet for the datatype. The > whitespace-normalized > > string must be a valid lexical form for the type, as specified > > ..." > > > > 1.6. Negative zero > > > > In section 6, a note explains that the value space of > xs:float and > > xs:double has been extended vis-à-vis that given by XML > Schema, to > > include a negative zero. The note also explains that > the negative zero > > will "never be obtained from the typed value of a node." > > > > We believe this discrepancy is untenable, and we are > not clear why it > > has proven necessary to introduce it. > > > > As far as we can tell by examining the specification, the spec > > mentions different treatment for positive and negative > zero only for > > the functions described in section 6.4 (fn:floor, fn:ceiling, > > fn:round, and fn:round-half-to-even): in the > description of each of > > these functions it is noted that if a zero is given to > the function as > > an argument, the sign of the zero returned as the value of the > > function is the same as the sign of the zero passed in > as an argument. 
> > (The discussion of fn:ceiling mentions other cases when > negative zero > > is returned; the discussion of fn:floor passes over the > analogous > > cases in silence.) Other mentions of the signed zeroes in this > > specification invariably specify either that something > is true both > > for positive and for negative zero or else that a > constructor may > > return either a positive or a negative zero. > > > > Could you explain the motive for introducing this > discrepancy with the > > value space defined in XML Schema? Would it not suffice > to observe > > that IEEE 754 has both positive and negative zeroes, > which are treated > > as different machine representations of the same values in the > > xs:float and xs:double value spaces, and (optionally) > that the prose > > occasionally mentions these distinct representations of > zero in the > > interests of alignment with IEEE 754, even though > formally they are > > the same value? > > > > Is it essential to introduce an incompatibility with > XML Schema here > > instead of treating positive and negative zeroes as one > value with two > > machine representations? > > > > 1.7. Totally ordered Booleans > > > > We do not believe that it makes sense to impose a user-visible > > ordering on the Boolean data type. Can you explain the > rationale? > > This is a discrepancy between F/O and XML Schema which must, we > > believe, be aligned. > > > > 2. Other technical issues > > > > The comments in this section relate to technical issues > other than the > > use of XML Schema in the F/O specification; the XML > Schema WG claims > > no particular responsibility or expertise on these questions but > > raises them because they seem to need attention. > > > > 2.1. The fn:base-uri property > > > > In section 2.5, the first paragraph defines a base-uri > property for > > all node types: "Document, element and > processing-instruction nodes > > have a base-uri property.... 
The base-uri of all other > node types is > > the empty sequence." > > > > The next paragraph begins by explaining what happens > "If the accessor > > is called on a node that does not have a base-uri > property ..." If all > > nodes have the property, how can such a node exist? > > > > 2.2. Alignment of references > > > > XML Schema and the Functions and Operators spec should > refer to the > > same version of Unicode. At the moment, this appears not to be > > true. > > > > 2.3. Characters and collation units > > > > The discussion of collation units in the second note of > section 7.3 > > says that collation decomposes a string "into a > sequence of units, > > each unit consisting of one or more characters", and > that various > > comparison operations are performed on these units. The > functions > > fn:starts-with, fn:ends-with, fn:substring-before, and > > fn:substring-after are all mentioned as operating on > such a segmented > > string. > > > > The list of functions at the beginning of section 7.4, however, > > describes them as operating on characters, not on the nameless > > collation units consisting of one or more characters > each. This looks > > like a contradiction. > > > > We believe that the general level of confusion is best > minimized, and > > the world becomes a better place, if in XML-related > specifications the > > word character is used always and only for the units of > the Universal > > Character Set defined by Unicode and by ISO 10646. The > word should not > > be used (however great the temptation becomes at times) > to denote the > > culturally specific units of writing systems (e.g. > letters, symbols, > > signs, graphemes, or what have you). > > > > We suggest recasting the descriptions in 7.4 to > describe the effect of > > the functions in terms of the collation units, rather > than in terms of > > characters. 
> >    In order to avoid repeating the phrase "the nameless units of one
> >    or more characters into which a collation segments a string for
> >    purposes of comparison", you may wish to define the term letter,
> >    grapheme, collation unit, or thingy with that meaning.
> >
> >   2.4. Surrogate pairs and Unicode scalar values
> >
> >    Section 7.4.6 (like some others) has a note calling attention to
> >    the fact that some implementations will represent characters with
> >    code points higher than xFFFF by using surrogate pairs. You quite
> >    correctly avoid using the term code point for the things which
> >    make up the surrogate pair, since in section 7.1 you have defined
> >    code point as excluding surrogates. But the term 16-bit values is
> >    not defined, as far as we can tell.
> >
> >    Also, in Unicode 2 and 3 there are (as far as we have been able
> >    to tell) no rules that forbid a double encoding of characters
> >    outside the Basic Multilingual Plane (i.e. first representing
> >    them within the BMP as surrogate pairs, and then encoding the
> >    sequence of BMP items in UTF-8). Even if it is discouraged (and
> >    it is indeed outlawed in Unicode 4.0), surrogate pairs might well
> >    show up not only in UTF-16 but also in UTF-8, where they will
> >    presumably be presented by Unicode-oblivious character libraries
> >    not as pairs of 16-bit values but as four-octet sequences whose
> >    interpretation in terms of Unicode scalar values requires
> >    slightly special rules.
> >
> >    Note that the definition of code points given in section 7.1
> >    agrees with the definition of Unicode scalar values in Unicode
> >    4.0 in excluding the surrogate range, but not with Unicode 2.0
> >    (the version cited in your normative references), or Unicode 3,
> >    which define a Unicode scalar value as "a number N from 0 to
> >    10FFFF[16]", without leaving any gap for the surrogates.
> >
> >   2.5. Definition of whitespace
> >
> >    Section 7.4.10 defines the function fn:normalize-space as doing
> >    various things to whitespace, but it does not define the term
> >    whitespace. It should, since various definitions are possible.
> >    The Unicode character database, for example, lists the following
> >    Unicode characters as whitespace in the file PropList-3_1_0.txt:
> >
> >      * 0009..000D ; White_space # Cc [5] <control>..<control>
> >      * 0020 ; White_space # Zs SPACE
> >      * 0085 ; White_space # Cc <control>
> >      * 00A0 ; White_space # Zs NO-BREAK SPACE
> >      * 1680 ; White_space # Zs OGHAM SPACE MARK
> >      * 2000..200A ; White_space # Zs [11] EN QUAD..HAIR SPACE
> >      * 2028 ; White_space # Zl LINE SEPARATOR
> >      * 2029 ; White_space # Zp PARAGRAPH SEPARATOR
> >      * 202F ; White_space # Zs NARROW NO-BREAK SPACE
> >      * 3000 ; White_space # Zs IDEOGRAPHIC SPACE
> >
> >    The XML specification defines a smaller set of characters as
> >    whitespace, for purposes of whitespace normalization.
> >
> >    So some definition is definitely needed.
> >
> >   2.6. Required normalization functionality
> >
> >    Section 7.4.11 requires conforming implementations to support
> >    Unicode normalization form NFC.
> >
> >    Why is normalization form W3C not also required?
> >
> >   2.7. Case folding
> >
> >    Sections 7.4.12 and 7.4.13 define functions for case folding.
> >    Since case folding is not consistent across languages and
> >    locales, we have grave doubts about the wisdom of this inclusion,
> >    and some members of the WG would advise you to drop these
> >    functions, which are not and cannot be language- and
> >    culture-neutral.
> >
> >    There is precedent: the decision to drop case-folding of names
> >    from the design of XML resulted from the realization that every
> >    case-folding algorithm available, including the use of the
> >    Unicode case mapping tables, has an inherent cultural bias.
The > inclusion of > > culturally and linguistically biased functions does not > contribute to > > achieving the goal of universal accessibility for the Web. Some > > members of the XML Schema WG believe your spec should > not go forward > > with these functions in it. > > > > If you retain these functions, you should at the very > least warn users > > that > > > > * Results may violate user expectations (in Québec, > for example, the > > standard uppercase equivalent of "é" is "É", while > in metropolitan > > France it is more commonly "E"; only one of these > is supported by > > the function as defined). > > * Many characters of class Ll lack uppercase > equivalents in the > > Unicode case mapping tables (we stopped counting at > 150 or so); > > many characters of class Lu lack lowercase equivalents. > > * The two functions are not inverses of each other, > so that for a > > string S of upper-case characters, > fn:upper-case(fn:lower-case(S)) > > is not guaranteed to return S, nor is > > fn:lower-case(fn:upper-case(S)) for a string S of lower-case > > characters. Latin small letter dotless i (as used > in Turkish) is > > perhaps the most prominent lower-case letter which will not > > round-trip, as Latin capital letter i with dot > above is the most > > prominent upper-case letter which will not round > trip; there are > > others. > > > > You may also wish to make the case mapping depend on > the default or a > > user-specified collation. > > > > 2.8. Escaping URIs > > > > The rules for escaping URIs should be aligned across all W3C > > specifications; otherwise, we will drive our users crazy. > > > > We think that means that you should reference and implement the > > algorithm specified in the XML Linking specification > > > ([30]http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators) and > > referenced by XML Schema, or the algorithm given in the > W3C Character > > Model specification (which was the same algorithm the > last time we > > looked). 
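For comparison, the escaping rule of the XML Linking specification, as quoted in Ashok Malhotra's reply at the top of this thread, might be sketched as below. This is a hedged illustration only: the function name xlink_escape is hypothetical, the list of excluded ASCII characters is taken from RFC 2396 section 2.4.3 (controls, space, the "delims" and the "unwise" characters) with "#", "%", "[" and "]" removed as the Linking spec directs, and the conversion of each disallowed character to UTF-8 octets written as %HH follows the Linking spec's description as the editor understands it.

```python
# Excluded printable ASCII characters of RFC 2396 section 2.4.3 (space,
# delims, unwise), minus "#", "%", "[" and "]", which the Linking spec
# re-allows.
DISALLOWED_ASCII = set(' <>"{}|\\^`')


def xlink_escape(href: str) -> str:
    """Hypothetical sketch of the Linking-spec escaping rule."""
    out = []
    for ch in href:
        code = ord(ch)
        is_control = code <= 0x1F or code == 0x7F
        is_non_ascii = code > 0x7F
        if is_control or is_non_ascii or ch in DISALLOWED_ASCII:
            # Each disallowed character becomes its UTF-8 octets as %HH.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            # Everything else, including "%" and "#", passes through.
            out.append(ch)
    return "".join(out)


print(xlink_escape("a b#c"))  # space escaped, "#" kept
print(xlink_escape("100%"))   # "%" is never escaped under this rule
```

In this reading the percent sign is never touched, which is the visible difference from the F&O rule, where "%" is escaped unless two hexadecimal digits follow it.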
> > > > In particular, some members of the XML Schema WG were > surprised to see > > that your algorithm escapes the percent sign in some > cases but not > > others; this does not seem to be a feature of the > algorithm given by > > XML Linking and by the Character Model. > > > > That said, we believe that you do your readers a good service by > > listing explicitly the affected characters. By > suggesting that you > > refer to the Linking/CharMod algorithm, we do not mean > to suggest that > > you should make your spec less useful by omitting these lists. > > (Editorial note: it would perhaps be useful to some > readers to have a > > brief discussion of why the advice given in the last > paragraph should > > be followed; our readers did not understand the > rationale for this > > advice.) > > > > 2.9. The binary types > > > > Section 12.1.1 says that op:hexBinary-equal returns true if its > > arguments "are of the same length and contain the same > code-points"; > > similarly in 12.1.2 for op:base64Binary-equal. > > > > The term code-point was defined in section 7.1 as > denoting integers > > between 0 and 1114111 (x10FFFF), with a gap in the > range where Unicode > > surrogates occur. It seems to be used here to denote what other > > specifications refer to as octets (bit strings of length 8). > > > > Taking the term code point in the sense of `octet', the > definition > > still does not match our intuitions of what an equality > test on binary > > data must do: it is not enough that each argument > contain the same > > octets; they must contain them in the same order. > > > > Suggested rewording: "are identical strings of octets". > If you wish to > > avoid the word octet, "are identical bit strings" might > do, although > > it omits the relatively important fact that the values > in question > > must have 8×n bits for some integer n. > > > > 2.10. Minor items > > > > 2.10.1. 
User control of collations > > > > Section 7.3 says in part "This specification does not > use xml:lang to > > identify the default collation, in part because > collations should be > > determined by the user of the data, not (normally) the > data itself, > > and because ..." > > > > The second reason given is sound. The first (collations > should not > > normally be determined by the data) is often advanced > as a principle, > > but does not seem to all members of the XML Schema WG to be > > universally true. We are thus grateful for the > "(normally)" in the > > sentence. But in any case, the first reason given here > leads to a > > non-sequitur: it would be a reason not to make xml:lang > determine the > > collation sequence without possibility of user > override. But it does > > not, even on its face, provide a reason not to use xml:lang to > > identify the default collation. We suggest dropping the > first reason; > > the second suffices. > > > > 2.10.2. Section 7.3.1.1 Examples > > > > The fourth example in section 7.3.1.1 says that > > > > fn:compare('Strassen', 'Straße') > > > > "returns 1 if and only if the default collation > includes provisions > > that equate `ss' and the (German) character `ß' > (`sharp-s')." Unless > > we have misunderstood the definition of the function, > the return value > > should also be 1 if the default collation sorts "ß" > (sharp s) before > > "s". Deleting the phrase "and only if" would remove the error. > > > > 3. Editorial notes > > > > In the course of our work, some editorial points were > noted; we list > > them here for the use of the editors. We do not > particularly expect > > formal responses on these comments. > > > > 1. Definition of must. Section 1.1 defines must thus: > > > > Conforming documents and processors are required > to behave as > > described; otherwise, they are non-conformant or in error. 
> >
> >       Is the "or" inclusive or exclusive, or is "in error" intended
> >       as a synonym or approximate synonym for "non-conformant"?
> >       Possible alternatives: "otherwise, they are non-conformant and
> >       in error", "otherwise, they are either non-conformant or else
> >       in error", "otherwise, they are non-conformant, i.e. in
> >       error".
> >
> >    2. Definition of stable. In section 1.1, the definition of stable
> >       says, inter alia: "Some other functions ... have an explicit
> >       dependency on the dynamic context". Unless this means that
> >       they accept an argument representing the dynamic context, it
> >       seems at first glance as if explicit is here used with the
> >       meaning `implicit'. Perhaps what is intended is that the
> >       documentation will explicitly mention this dependency. Perhaps
> >       the best thing to do would be just to drop the explicit; if
> >       you really wish to stress the promise of documentation,
> >       perhaps read "Some other functions ... have a dependency on
> >       the dynamic context ... These functions are said to be
> >       contextual. [INS: Contextual functions are always identified
> >       as such in their descriptions. :INS] "
> >
> >    3. The term back up. The phrase back up appears to be used
> >       several times as a technical term (e.g. last paragraph of
> >       1.7). What does it mean?
> >
> >    4. The term QName. Some readers (including some members of the
> >       XML Schema WG) are likely to find it disorienting for the term
> >       QName to be used here as a synonym for expanded name or
> >       universal name, and not with the same meaning QName has in the
> >       XML Namespaces Recommendation. We recognize, however, that
> >       what is returned is precisely a member of what XML Schema 1.0
> >       defines as the value space of the xs:QName type, so that the
> >       use of the term xs:QName to denote (for example) the return
> >       type of the accessor fn:node-name is not only unexceptionable
> >       but necessary for consistency.
We don't have a good solution for you > here; we only > > note the difficulty. Perhaps a note calling the > reader's attention > > to the issue would be in order (similar to the note > on this topic > > in the Data Model spec). > > > > Some members of the WG suggest that this spec, like the Data > > Model, should prefer the term expanded QName where > possible, to > > stress that what is referred to is the pair in the > value space, > > not the colonized Name in the lexical space. > > > > 5. No parameters and the empty list of parameters: > > > > On first reading, the signatures of fn:string and > fn:error suggest > > an ambiguity to some readers: the call fn:error() > appears to match > > both the first and the second signatures. > > > > Members of the WG who have studied XQuery more > thoroughly assure > > the rest of us that there is no ambiguity, so our purpose in > > making this comment is merely to call your attention to an > > editorial problem: it might be useful to explain to > the reader why > > the dual signatures showing no arguments and > optional arguments > > are not in fact ambiguous. > > > > 6. Section 2.3, first note: the word this seems to need an > > antecedent; it is not clear to this reader, at > least, what that > > antecedent is. (It's also not clear what problem > with blanks in > > fragment identifiers is being adverted to.) > > > > 7. Raising errors: Section 3 para 1 reads in part: > "The occurrence of > > that phrase [sc. `an error is raised'] implicitly causes the > > invocation of the fn:error function ..." 
This > formulation seems to > > involve a horrible clash of contexts: the phrase > "an error is > > raised" occurs in this document, and it occurs > continuously from > > the time of publication until the document ceases > to exist (if > > documents can ever cease to exist), while the > error, one expects, > > ought to be raised in a software system which > implements the spec, > > and should probably not be raised continuously from > now until the > > spec ceases to exist, if only because it would make > it hard for > > users to get work done. For the occurrence of a > phrase in the spec > > to cause the raising of an error in conforming > software seems to > > involve a rather unusual kind of action at a > distance. To speak a > > bit more seriously: perhaps the relevant part of > the paragraph > > could be recast, perhaps along these lines: "the > phrase `an error > > is raised' is used to describe the behavior of conforming > > processors in certain situations. When such > situations arise in a > > running system, a conforming implementation of this > specification > > must invoke the fn:error function defined in this > section." This > > is not perfect, but we hope you get the idea. > > > > 8. Type promotion in multiple or single steps: Section > 6.2 says "As > > far as possible, the promotions should be done in a > single step. > > Specifically, when a decimal is promoted to a > double, it must not > > be converted to a float and then to a double, as > this risks loss > > of precision." [Emphasis added.] These two > sentences appear to > > contradict each other: is the rule about > single-step conversions > > required of conforming implementations ("must"), or > recommended > > without being required ("should")? > > > > 9. Code points: The note in section 7.1 identifies > code points as > > Unicode scalar values (which are in turn integers), > but uses the > > notation #x0000 and #x10FFFF to refer to the > minimum and maximum > > values. 
> >     It's not terribly confusing in context, but strictly speaking,
> >     this notation is defined in the XML specification as denoting
> >     characters, not integers. I believe conventional
> >     representations for hexadecimal numbers would write these
> >     values as 0, 0H, x0, or 0x, and correspondingly 10FFFFH,
> >     x10FFFF, or 10FFFFx; there may be other hexadecimal
> >     representations you will prefer. The Unicode specification
> >     writes 10FFFF[16].
> >
> > 10. v and w: Section 7.3 says "`uve' and `uwe' are considered
> >     equivalent in some European languages"; this is unexpected. Are
> >     you sure? Which languages?
> >
> > 11. Section 7.4 para 1: for "function" read "functions". Here and
> >     elsewhere, we believe that sentences like "Several of these
> >     functions use a collation" would do better if "a collation"
> >     were replaced with a plural: "Several of these functions use
> >     collations." Unless, of course, all of these functions always
> >     use the same collation.
> >
> > 12. Section 7.4.6.1, final example: forgive this observation if
> >     it's clueless, but since there does not seem to be any addition
> >     operator in the example (did we miss it?), it's not immediately
> >     obvious what -INF + INF has to do with the interpretation of
> >     the example.
> >
> > 13. Section 7.4.15, fn:string-pad: this seems an unfortunate choice
> >     of names for a function which does not (despite its name) pad a
> >     string with blanks or some other padding character(s), but
> >     which simply replicates or copies the string multiple times.
> >     Could it be renamed without excessive heartburn?
> >
> > 14. Section 7.4.16, fn:escape-uri: It would help minimize confusion
> >     if the lists of characters which are or are not escaped gave
> >     the character names as well as the characters themselves in
> >     quotation marks.
> >     (In the paper copy used by one member of our review task force,
> >     this bit of the spec was almost impossible to make out without
> >     a magnifying glass.)
> >
> > 15. Section 7.5.3, fn:replace: The description of the function
> >     seemed unclear:
> >
> >         The function returns the xs:string that is obtained by
> >         replacing all non-overlapping substrings of $input that
> >         match the given $pattern with an occurrence of the
> >         $replacement string.
> >
> >     Replacing all occurrences of the pattern with an occurrence of
> >     the replacement string seems to suggest an n for 1 exchange.
> >     For "all" read "each". In the following paragraph, one
> >     occurrence of $input is not marked as an identifier, one is.
> >
> >
> > References
> >
> >  1. http://www.w3.org/
> >  2. http://www.w3.org/Architecture/
> >  3. http://www.w3.org/XML/Group
> >  4. http://www.w3.org/XML/Group/Schemas
> >  5. http://www.w3.org/Member/Eventscal.html
> >  6. http://www.w3.org/Member/#confidential
> >  7. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e65
> >  8. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e70
> >  9. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e86
> > 10. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e98
> > 11. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e145
> > 12. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e178
> > 13. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e191
> > 14. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e238
> > 15. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e246
> > 16. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e251
> > 17. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e278
> > 18. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e283
> > 19. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e327
> > 20. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e350
> > 21. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e392
> > 22. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e399
> > 23. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e444
> > 24. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e463
> > 25. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e513
> > 26. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e516
> > 27. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e544
> > 28. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e577
> > 29. http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html
> > 30. http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
> >
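Comment 8 in the quoted message turns on why a two-step promotion "risks loss of precision". A minimal Python sketch of that effect follows. It assumes IEEE 754 single and double precision stand in for xs:float and xs:double, and it approximates the decimal-to-float step by rounding a double to single precision. The helper names to_float32, promote_direct and promote_via_float are hypothetical and do not come from the F&O spec.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value,
    then widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def promote_direct(d: Decimal) -> float:
    """One-step promotion: decimal -> double."""
    return float(d)


def promote_via_float(d: Decimal) -> float:
    """Two-step promotion: decimal -> float -> double.
    Assumption: the float step is modelled by rounding the double to single."""
    return to_float32(float(d))


d = Decimal("0.1")
print(promote_direct(d))     # nearest double to 0.1
print(promote_via_float(d))  # nearest single to 0.1, widened to a double
print(promote_direct(d) == promote_via_float(d))
```

The two results differ because the intermediate float keeps only 24 significant bits, and widening to a double cannot restore the bits that were discarded.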
Received on Wednesday, 8 October 2003 19:24:53 UTC