RE: XML Schema WG comments on Functions and Operators

Just a clarification about the "#" character. The list of "reserved"
characters in RFC 2396 describes characters that have a special role in a
URI. "#" does not have a special role in a URI, but it does have a special
role in a URI-reference. Since we are dealing with URI-references rather
than URIs, it is appropriate to add "#" to the list.
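
To make the distinction concrete, here is a minimal Python sketch using the
standard library's urllib.parse.urldefrag. The example reference is made up
for illustration; only the splitting behaviour matters.

```python
from urllib.parse import urldefrag

# A URI-reference is a URI optionally followed by "#" and a fragment
# identifier. The reference below is hypothetical.
ref = "http://example.org/spec.html#escape-uri"

# urldefrag separates the URI proper from the fragment identifier.
uri, fragment = urldefrag(ref)

print(uri)       # the part that is a URI in the RFC 2396 sense
print(fragment)  # the part introduced by "#"
```

The "#" never ends up inside the URI part, which is why it only acquires a
special role at the URI-reference level.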

Michael Kay

> -----Original Message-----
> From: Ashok Malhotra [mailto:ashokma@microsoft.com] 
> Sent: 07 October 2003 23:14
> To: C. M. Sperberg-McQueen; public-qt-comments@w3.org; Kay, Michael
> Cc: W3C XML Schema IG
> Subject: RE: XML Schema WG comments on Functions and Operators
> 
> 
> 
> This is a response to your comment [2.8] below on 
> fn:escape-uri.  I'm copying the I18N WG because this response 
> also addresses some of the material in their comments [67], [68] 
> and [69] in 
> http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0105.html.
> 
> Essentially, your comment said "use the algorithm in the 
> Linking Spec ...". But, as I argue below, the algorithm in 
> the F&O is closer to RFC 2396 than the algorithm in the 
> Linking Spec.  There is one exception to this which is the 
> situation with the # character, of which more later.
> 
> First, let us discuss the behaviour where escape-reserved = 
> 'true'.  I believe this is the algorithm discussed in the 
> Linking Spec. The Linking spec says "the disallowed 
> characters include all non-ASCII characters, plus the 
> excluded characters listed in Section 2.4 of [IETF RFC 2396], 
> except for the number sign (#) and percent sign (%) and the 
> square bracket characters re-allowed in [IETF RFC 2732]. "
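
For comparison, the Linking Spec rule quoted above can be sketched in Python
roughly as follows. The character set is my own reading of RFC 2396 section
2.4.3 (control, space, delims, unwise) minus "#", "%", "[" and "]"; the
function name is hypothetical, and the UTF-8 based %HH conversion is assumed.

```python
# Excluded ASCII characters per RFC 2396 section 2.4.3, with "#", "%",
# "[" and "]" re-allowed (assumption: my reading, not quoted spec text).
XLINK_DISALLOWED_ASCII = (
    set('<>"{}|\\^`')
    | {" ", "\x7f"}
    | {chr(c) for c in range(0x20)}  # control characters 00-1F
)

def xlink_escape(s):
    """Escape disallowed characters as %HH over their UTF-8 bytes."""
    out = []
    for ch in s:
        if ord(ch) > 0x7F or ch in XLINK_DISALLOWED_ASCII:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

# "#", "%" and the square brackets pass through untouched.
print(xlink_escape("a b#c%d[\u00e9]"))
```

Note that under this reading a bare "%" is never escaped, whether or not it
starts a %HH sequence.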
> 
> However RFC 2396 says 
> "  Data characters that are allowed in a URI but do not have a reserved
>    purpose are called unreserved.  These include upper and lower case
>    letters, decimal digits, and a limited set of punctuation marks and
>    symbols.
> 
>       unreserved  = alphanum | mark
> 
>       mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
> 
>    Unreserved characters can be escaped without changing the semantics
>    of the URI, but this should not be done unless the URI is being used
>    in a context that does not allow the unescaped character to appear."
> 
> Thus, our reading of the above is that all characters except 
> the above should be escaped and, in particular, the marks 
> should not be escaped.
> 
> A little later RFC 2396 says 
> 
> " Because the percent "%" character always has the reserved purpose of
>    being the escape indicator, it must be escaped as "%25" in order to
>    be used as data within a URI."
> 
> Our reading of this rule is that the % must be escaped unless 
> it is the start of an escape sequence %HH.
> 
> This reading of 2396 was the basis of the rule in the F&O which says
> 
> "If $escape-reserved is true, all characters are escaped 
> other than lower case letters a-z, upper case letters A-Z, 
> digits 0-9 and the characters referred to in [RFC 2396] as 
> "marks": specifically, HYPHEN-MINUS ("-"), LOW LINE ("_"), 
> FULL STOP ".", EXCLAMATION MARK "!", TILDE "~", ASTERISK "*", 
> APOSTROPHE "'", LEFT PARENTHESIS "(", and RIGHT PARENTHESIS 
> ")". The PERCENT SIGN "%" character itself is escaped only if 
> it is not followed by two hexadecimal digits (that is, 0-9, 
> a-f and A-F)."
> 
> RFC 2396 says the set of characters included as "reserved" 
> can occur in specific contexts only and must be escaped if 
> they are not used in these contexts.  We interpret this to 
> mean that if they are used correctly, reserved characters may 
> not be escaped.  This is the motivation behind the algorithm 
> for escape-reserved='false'.  The rule in the F&O says
> 
> "If $escape-reserved is false, the behavior differs in that 
> characters referred to in [RFC 2396] and [RFC 2732] as 
> reserved characters, together with the NUMBER SIGN '#' 
> character, are not escaped. These characters are SEMICOLON 
> ";", SOLIDUS "/", QUESTION MARK "?", COLON ":", COMMERCIAL AT 
> "@", AMPERSAND "&", EQUALS SIGN "=", PLUS SIGN "+", DOLLAR 
> SIGN "$", COMMA ",", NUMBER SIGN "#", LEFT SQUARE BRACKET "[" 
> and RIGHT SQUARE BRACKET "]"."
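
The two F&O rules quoted above can be sketched in Python along these lines.
This is only an illustration of the quoted wording, not the normative
definition: the function name escape_uri and its signature are hypothetical,
and escaping over UTF-8 bytes is an assumption on my part.

```python
import re

# Letters, digits and the RFC 2396 "marks" are never escaped.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.!~*'()"
)
# Reserved characters of RFC 2396 and RFC 2732, plus the NUMBER SIGN.
RESERVED_PLUS_HASH = set(";/?:@&=+$,#[]")
HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")

def escape_uri(s, escape_reserved):
    keep = UNRESERVED if escape_reserved else UNRESERVED | RESERVED_PLUS_HASH
    out = []
    for i, ch in enumerate(s):
        if ch in keep:
            out.append(ch)
        elif ch == "%" and HEX_PAIR.fullmatch(s, i + 1, i + 3):
            # "%" already starts an escape sequence %HH: leave it alone.
            out.append(ch)
        else:
            # Assumption: escape each UTF-8 byte of the character as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)

print(escape_uri("http://example.org/a%20b#frag", True))
print(escape_uri("http://example.org/a%20b#frag", False))
print(escape_uri("100% sure", True))
```

The only difference between the two modes is the set of characters that are
kept; the treatment of "%" is the same in both.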
> 
> The set of reserved characters in the above is correct 
> according to RFC 2396 amended by RFC 2732 except for the 
> inclusion of the # character.
> 
> Thus, if we have read the background material correctly there 
> are two possible actions.  Please advise.
>  
> POSSIBLE ACTIONS:
> 
> 1.  Do nothing except remove the # sign from the set of 
> reserved characters in the rule for escape-reserved='false'. 
> 2.  Change the wording to conform to the Linking Spec even 
> though it is at variance with RFC 2396. 
> 
> All the best, Ashok
> 
> > -----Original Message-----
> > From: public-qt-comments-request@w3.org
> > [mailto:public-qt-comments-request@w3.org] On Behalf Of C. M. Sperberg-McQueen
> > Sent: Friday, August 01, 2003 7:55 PM
> > To: public-qt-comments@w3.org
> > Cc: W3C XML Schema IG
> > Subject: XML Schema WG comments on Functions and Operators
> > 
> > 
> > Dear colleagues:
> > 
> > The XML Schema Working Group congratulates the XML Query and XSL 
> > Working Groups on their progress, and in particular on the 
> Last Call 
> > draft of "XQuery 1.0 and XPath 2.0 Functions and Operators".
> > 
> > We have not been able to review the last call draft in as much
> > detail as we would have liked, but for what they are worth our
> > comments are at
> > http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html (an
> > ASCII version is reproduced below for the convenience of those with
> > access to their email but not to the Web).
> > 
> > We apologize for the tardy arrival of these notes.
> > 
> > -C. M. Sperberg-McQueen, for the W3C XML Schema WG
> > 
> > ................................................................
> > 
> > 
> > 
> > W3C XML Schema WG
> > 
> > Notes on XQuery 1.0 and XPath 2.0 Functions and Operators
> > 
> > 1 August 2003
> >       
> > _________________________________________________________________
> > 
> >       * 1. [7]Schema-related issues
> >            + 1.1. [8]Alignment of date/time values
> >            + 1.2. [9]The type anyAtomicType
> >            + 1.3. [10]The type untypedAtomic
> >            + 1.4. [11]Alignment on strings and URIs
> >            + 1.5. [12]Whitespace handling and lexical forms
> >            + 1.6. [13]Negative zero
> >            + 1.7. [14]Totally ordered Booleans
> >       * 2. [15]Other technical issues
> >            + 2.1. [16]The fn:base-uri property
> >            + 2.2. [17]Alignment of references
> >            + 2.3. [18]Characters and collation units
> >            + 2.4. [19]Surrogate pairs and Unicode scalar values
> >            + 2.5. [20]Definition of whitespace
> >            + 2.6. [21]Required normalization functionality
> >            + 2.7. [22]Case folding
> >            + 2.8. [23]Escaping URIs
> >            + 2.9. [24]The binary types
> >            + 2.10. [25]Minor items
> >                 o 2.10.1. [26]User control of collations
> >                 o 2.10.2. [27]Section 7.3.1.1 Examples
> >       * 3. [28]Editorial notes
> >       
> > _________________________________________________________________
> > 
> >     This document contains comments on the Last Call draft 
> of 2 May 2003
> >     of XQuery 1.0 and XPath 2.0 Functions and Operators 
> transmitted to the
> >     XML Query and XSL Working Groups on behalf of the XML 
> Schema Working
> >     Group. These draft comments have not been reviewed by 
> the XML Schema
> >     Working Group and do not necessarily command consensus 
> within the
> >     group; because we will not meet again until 28 August, 
> the Working
> >     Group directed at its meeting today that these notes should be
> >     transmitted to the XML Query and XSL Working Groups 
> without awaiting
> >     review.
> >     In addition to the comments below, please note that 
> several of the
> >     [29]general comments sent on 14 July relate to the functions and
> >     operators and data model specifications. Some of those 
> comments sent
> >     earlier overlap with some comments below.
> > 
> > 1. Schema-related issues
> > 
> >     The comments in this section relate to the use of XML 
> Schema in the
> >     F/O specification and thus to the particular area of 
> responsibility
> >     borne by the XML Schema WG.
> > 
> > 1.1. Alignment of date/time values
> > 
> >     The provision for preserving timezone information in 
> the values of
> >     xs:dateTime, xs:date, and xs:time continues to concern 
> us. We believe
> >     that a discrepancy of this kind between F/O and XML 
> Schema will hurt
> >     users and impede uptake of both specifications.
> > 
> >     We believe F/O and XML Schema need to align on this, 
> either by F/O
> >     changing to the XML Schema value space, or by changing 
> the value space
> >     as part of XML Schema 1.1, or by some other mutually agreed upon
> >     solution.
> > 
> > 1.2. The type anyAtomicType
> > 
> >     We reiterate our concern over the introduction of 
> anyAtomicType into
> >     the type hierarchy. We believe that a discrepancy of 
> this kind between
> >     F/O and XML Schema will hurt users and impede uptake of both
> >     specifications.
> > 
> >     We believe F/O and XML Schema need to align on this, 
> either by F/O
> >     aligning with XML Schema 1.0 or by XML Schema 1.1 aligning with 
> > F/O.
> > 
> > 1.3. The type untypedAtomic
> > 
> >     We reiterate our concern over the introduction of 
> untypedAtomic into
> >     the type hierarchy. As with the other discrepancies, we believe
> >     alignment of the QT specs and XML Schema is critically 
> important.
> >     Section 1.3.2 says xdt:untypedAtomic is used wherever 
> the PSVI has
> >     xs:anySimpleType; please note that in the PSVI, this 
> will be the 
> > case
> > 
> >       * when the element or attribute in question was 
> declared as having
> >         type anySimpleType
> >       * when the attribute in question had no declaration 
> and the schema
> >         processor assumed the simple urtype for it in the 
> course of lax
> >         validation or error recovery
> > 
> >     Note that elements will not be assigned the anySimpleType as their
> >     type property in the course of lax validation or error recovery; they
> >     will have xs:anyType instead. Your use of xdt:untypedAtomic for
> >     xs:anySimpleType but not for elements which (a) lack child elements
> >     and (b) are assigned to xs:anyType may lead to results which puzzle
> >     some of your users; we believe you may wish to consider changing your
> >     mapping rules to assign xdt:untypedAtomic to such elements.
> > 
> > 1.4. Alignment on strings and URIs
> > 
> >     The table at the beginning of section 2, Accessors, 
> shows functions
> >     which are intended (judging by their names) to return 
> URIs and which
> >     return values of type xs:string instead of xs:anyURI. Similarly,
> >     various functions which accept URIs as arguments are 
> given signatures
> >     using xs:string as the type, which in turn necessitates 
> ad hoc rules
> >     of the form "If $collationLiteral is not in the lexical space of
> >     xs:anyURI, an error is raised".
> > 
> >     As you know from our inquiry to you in mid-July, it has 
> been suggested
> >     that in XML Schema 1.1 the xs:anyURI type be made a 
> restriction of
> >     xs:string. But for now, there appears to be a 
> discrepancy between the
> >     use of strings to represent URIs here and the provision 
> of a distinct
> >     (and, for typing purposes, disjoint) type in XML Schema 1.0.
> >     We need to align on this.
> > 
> > 1.5. Whitespace handling and lexical forms
> > 
> >     In section 5.1, paragraph 4 reads in part: "If the argument to a
> >     constructor function is a string literal, the literal 
> must be a valid
> >     lexical form for its type ... Whitespace normalization 
> is applied
> >     before validation ..."
> > 
> >     In all the cases which immediately come to mind, if the 
> argument is a
> >     valid lexical form for a type, there is no need to perform any
> >     whitespace normalization on it. In XML Schema, it is 
> the result of
> >     whitespace normalization, not the input to it, which 
> must be a legal
> >     lexical form; we believe readers will be less confused 
> if your usage
> >     of the terms and ours is consistent.
> > 
> >     A possible rewording: "If the argument to a constructor 
> function is a
> >     string literal, then whitespace normalization is 
> applied as indicated
> >     by the whitespace facet for the datatype. The 
> whitespace-normalized
> >     string must be a valid lexical form for the type, as specified 
> > ..."
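
The order of operations in the proposed rewording (normalize first, then
check the lexical form) can be sketched in Python as below. The helper names
are hypothetical, and the regular expression is a simplified stand-in for
the xs:integer lexical space, not the normative definition.

```python
import re

# Simplified lexical pattern for xs:integer (assumption: optional sign,
# followed by one or more decimal digits).
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def collapse(s):
    """whiteSpace=collapse: turn tab/LF/CR into spaces, merge runs, trim."""
    return re.sub(r"[ \t\n\r]+", " ", s).strip(" ")

def construct_integer(literal):
    # Step 1: whitespace normalization as indicated by the facet.
    normalized = collapse(literal)
    # Step 2: the *normalized* string must be a valid lexical form.
    if not INTEGER_LEXICAL.fullmatch(normalized):
        raise ValueError("not a valid lexical form: %r" % normalized)
    return int(normalized)

print(construct_integer("  42\n"))
```

The raw input "  42\n" is not itself in the lexical space; only its
normalized form is, which is the distinction the rewording tries to capture.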
> > 
> > 1.6. Negative zero
> > 
> >     In section 6, a note explains that the value space of 
> xs:float and
> >     xs:double has been extended vis-à-vis that given by XML 
> Schema, to
> >     include a negative zero. The note also explains that 
> the negative zero
> >     will "never be obtained from the typed value of a node."
> > 
> >     We believe this discrepancy is untenable, and we are 
> not clear why it
> >     has proven necessary to introduce it.
> > 
> >     As far as we can tell by examining the specification, the spec
> >     mentions different treatment for positive and negative 
> zero only for
> >     the functions described in section 6.4 (fn:floor, fn:ceiling,
> >     fn:round, and fn:round-half-to-even): in the 
> description of each of
> >     these functions it is noted that if a zero is given to 
> the function as
> >     an argument, the sign of the zero returned as the value of the
> >     function is the same as the sign of the zero passed in 
> as an argument.
> >     (The discussion of fn:ceiling mentions other cases when 
> negative zero
> >     is returned; the discussion of fn:floor passes over the 
> analogous
> >     cases in silence.) Other mentions of the signed zeroes in this
> >     specification invariably specify either that something 
> is true both
> >     for positive and for negative zero or else that a 
> constructor may
> >     return either a positive or a negative zero.
> > 
> >     Could you explain the motive for introducing this 
> discrepancy with the
> >     value space defined in XML Schema? Would it not suffice 
> to observe
> >     that IEEE 754 has both positive and negative zeroes, 
> which are treated
> >     as different machine representations of the same values in the
> >     xs:float and xs:double value spaces, and (optionally) 
> that the prose
> >     occasionally mentions these distinct representations of 
> zero in the
> >     interests of alignment with IEEE 754, even though 
> formally they are
> >     the same value?
> > 
> >     Is it essential to introduce an incompatibility with 
> XML Schema here
> >     instead of treating positive and negative zeroes as one 
> value with two
> >     machine representations?
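
The "one value, two machine representations" view can be illustrated with
IEEE 754 doubles in Python. This is only a sketch of the IEEE behaviour as
exposed by Python floats, not of anything defined in F/O or XML Schema.

```python
import math
import struct

pos_zero = 0.0
neg_zero = -0.0

# As values, the two zeroes compare equal.
same_value = pos_zero == neg_zero

# As machine representations, they differ in the sign bit.
different_bits = struct.pack(">d", pos_zero) != struct.pack(">d", neg_zero)

# The sign is still observable through sign-sensitive operations.
sign_of_neg_zero = math.copysign(1.0, neg_zero)

print(same_value, different_bits, sign_of_neg_zero)
```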
> > 
> > 1.7. Totally ordered Booleans
> > 
> >     We do not believe that it makes sense to impose a user-visible
> >     ordering on the Boolean data type. Can you explain the 
> rationale?
> >     This is a discrepancy between F/O and XML Schema which must, we
> >     believe, be aligned.
> > 
> > 2. Other technical issues
> > 
> >     The comments in this section relate to technical issues 
> other than the
> >     use of XML Schema in the F/O specification; the XML 
> Schema WG claims
> >     no particular responsibility or expertise on these questions but
> >     raises them because they seem to need attention.
> > 
> > 2.1. The fn:base-uri property
> > 
> >     In section 2.5, the first paragraph defines a base-uri 
> property for
> >     all node types: "Document, element and 
> processing-instruction nodes
> >     have a base-uri property.... The base-uri of all other 
> node types is
> >     the empty sequence."
> > 
> >     The next paragraph begins by explaining what happens 
> "If the accessor
> >     is called on a node that does not have a base-uri 
> property ..." If all
> >     nodes have the property, how can such a node exist?
> > 
> > 2.2. Alignment of references
> > 
> >     XML Schema and the Functions and Operators spec should 
> refer to the
> >     same version of Unicode. At the moment, this appears not to be 
> > true.
> > 
> > 2.3. Characters and collation units
> > 
> >     The discussion of collation units in the second note of 
> section 7.3
> >     says that collation decomposes a string "into a 
> sequence of units,
> >     each unit consisting of one or more characters", and 
> that various
> >     comparison operations are performed on these units. The 
> functions
> >     fn:starts-with, fn:ends-with, fn:substring-before, and
> >     fn:substring-after are all mentioned as operating on 
> such a segmented
> >     string.
> > 
> >     The list of functions at the beginning of section 7.4, however,
> >     describes them as operating on characters, not on the nameless
> >     collation units consisting of one or more characters 
> each. This looks
> >     like a contradiction.
> > 
> >     We believe that the general level of confusion is best 
> minimized, and
> >     the world becomes a better place, if in XML-related 
> specifications the
> >     word character is used always and only for the units of 
> the Universal
> >     Character Set defined by Unicode and by ISO 10646. The 
> word should not
> >     be used (however great the temptation becomes at times) 
> to denote the
> >     culturally specific units of writing systems (e.g. 
> letters, symbols,
> >     signs, graphemes, or what have you).
> > 
> >     We suggest recasting the descriptions in 7.4 to 
> describe the effect of
> >     the functions in terms of the collation units, rather 
> than in terms of
> >     characters. In order to avoid repeating the phrase "the 
> nameless units
> >     of one or more characters into which a collation 
> segments a string for
> >     purposes of comparison", you may wish to define the term letter,
> >     grapheme, collation unit, or thingy with that meaning.
> > 
> > 2.4. Surrogate pairs and Unicode scalar values
> > 
> >     Section 7.4.6 (like some others) has a note calling 
> attention to the
> >     fact that some implementations will represent 
> characters with code
> >     points higher than xFFFF by using surrogate pairs. You 
> quite correctly
> >     avoid using the term code point for the things which make up the
> >     surrogate pair, since in section 7.1 you have defined 
> code point as
> >     excluding surrogates. But the term 16-bit values is not 
> defined, as
> >     far as we can tell.
> > 
> >     Also, in Unicode 2 and 3 there are (as far as we have been able to
> >     tell) no rules that forbid a double encoding of characters outside the
> >     Basic Multilingual Plane (i.e. first representing them within the BMP
> >     as surrogate pairs, and then encoding the sequence of BMP items in
> >     UTF-8). Even if it is discouraged (and it is indeed outlawed in
> >     Unicode 4.0), surrogate pairs might well show up not only in UTF-16
> >     but also in UTF-8, where they will presumably be presented by
> >     Unicode-oblivious character libraries not as pairs of 16-bit values
> >     but as four-octet sequences whose interpretation in terms of Unicode
> >     scalar values requires slightly special rules.
> > 
> >     Note that the definition of code points given in 
> section 7.1 agrees
> >     with the definition of Unicode scalar values in Unicode 4.0 in
> >     excluding the surrogate range, but not with Unicode 2.0 
> (the version
> >     cited in your normative references), or Unicode 3, 
> which define a
> >     Unicode scalar value as "a number N from 0 to 10FFFF[16]", without
> >     leaving any gap for the surrogates.
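
The relation between a code point above xFFFF and its surrogate pair can be
sketched in Python as follows. The function names are hypothetical; the
arithmetic is the standard UTF-16 mapping, and is_code_point follows the
definition attributed above to section 7.1 (surrogate range excluded).

```python
def to_surrogates(cp):
    """Split a code point above xFFFF into its two 16-bit surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def is_code_point(n):
    """True for 0..x10FFFF, excluding the surrogate range xD800..xDFFF."""
    return 0 <= n <= 0x10FFFF and not 0xD800 <= n <= 0xDFFF

high, low = to_surrogates(0x10400)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
print("\U00010400".encode("utf-16-be").hex())
```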
> > 
> > 2.5. Definition of whitespace
> > 
> >     Section 7.4.10 defines the function fn:normalize-space as doing
> >     various things to whitespace, but it does not define the term
> >     whitespace. It should, since various definitions are possible.
> >     The Unicode character database, for example, lists the following
> >     Unicode characters as whitespace in the file PropList-3_1_0.txt:
> > 
> >       * 0009..000D ; White_space # Cc [5] <control>..<control>
> >       * 0020 ; White_space # Zs SPACE
> >       * 0085 ; White_space # Cc <control>
> >       * 00A0 ; White_space # Zs NO-BREAK SPACE
> >       * 1680 ; White_space # Zs OGHAM SPACE MARK
> >       * 2000..200A ; White_space # Zs [11] EN QUAD..HAIR SPACE
> >       * 2028 ; White_space # Zl LINE SEPARATOR
> >       * 2029 ; White_space # Zp PARAGRAPH SEPARATOR
> >       * 202F ; White_space # Zs NARROW NO-BREAK SPACE
> >       * 3000 ; White_space # Zs IDEOGRAPHIC SPACE
> > 
> >     The XML specification defines a smaller set of characters as
> >     whitespace, for purposes of whitespace normalization.
> > 
> >     So some definition is definitely needed.
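
To show how much the choice of definition matters, here is a small Python
sketch of a normalize-space style function parameterized by the whitespace
set. The function and constant names are hypothetical; the Unicode set is
copied from the list above, and the XML set is the four characters x20, x9,
xA and xD.

```python
# Whitespace as defined by XML: space, tab, line feed, carriage return.
XML_WS = set("\u0020\u0009\u000a\u000d")

# White_space characters as listed above from PropList-3_1_0.txt.
UNICODE_WS = (
    {chr(c) for c in range(0x0009, 0x000D + 1)}
    | {chr(c) for c in range(0x2000, 0x200A + 1)}
    | set("\u0020\u0085\u00a0\u1680\u2028\u2029\u202f\u3000")
)

def normalize_space(s, ws):
    """Trim leading/trailing whitespace and collapse runs to one space."""
    words, current = [], []
    for ch in s:
        if ch in ws:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return " ".join(words)

sample = "a\u00a0\u00a0b  c "
print(repr(normalize_space(sample, XML_WS)))
print(repr(normalize_space(sample, UNICODE_WS)))
```

With the XML set the two NO-BREAK SPACE characters survive; with the Unicode
set they are collapsed like any other whitespace.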
> > 
> > 2.6. Required normalization functionality
> > 
> >     Section 7.4.11 requires conforming implementations to 
> support Unicode
> >     normalization form NFC.
> > 
> >     Why is normalization form W3C not also required?
> > 
> > 2.7. Case folding
> > 
> >     Sections 7.4.12 and 7.4.13 define functions for case folding.
> >     Since case folding is not consistent across languages 
> and locales, we
> >     have grave doubts about the wisdom of this inclusion, 
> and some members
> >     of the WG would advise you to drop these functions, 
> which are not and
> >     cannot be language- and culture-neutral.
> > 
> >     There is precedent: the decision to drop case-folding 
> of names from
> >     the design of XML resulted from the realization that every
> >     case-folding algorithm available, including the use of 
> the Unicode
> >     case mapping tables, has an inherent cultural bias. The 
> inclusion of
> >     culturally and linguistically biased functions does not 
> contribute to
> >     achieving the goal of universal accessibility for the Web. Some
> >     members of the XML Schema WG believe your spec should 
> not go forward
> >     with these functions in it.
> > 
> >     If you retain these functions, you should at the very 
> least warn users
> >     that
> > 
> >       * Results may violate user expectations (in Québec, 
> for example, the
> >         standard uppercase equivalent of "é" is "É", while 
> in metropolitan
> >         France it is more commonly "E"; only one of these 
> is supported by
> >         the function as defined).
> >       * Many characters of class Ll lack uppercase 
> equivalents in the
> >         Unicode case mapping tables (we stopped counting at 
> 150 or so);
> >         many characters of class Lu lack lowercase equivalents.
> >       * The two functions are not inverses of each other, so that for a
> >         string S of upper-case characters, fn:upper-case(fn:lower-case(S))
> >         is not guaranteed to return S, nor is
> >         fn:lower-case(fn:upper-case(S)) for a string S of lower-case
> >         characters. Latin small letter dotless i (as used in Turkish) is
> >         perhaps the most prominent lower-case letter which will not
> >         round-trip, as Latin capital letter i with dot above is the most
> >         prominent upper-case letter which will not round-trip; there are
> >         others.
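
The round-trip problem in the last bullet can be observed with any
locale-independent Unicode case mapping. The sketch below uses Python's
built-in str.upper and str.lower as a stand-in; it does not claim to
reproduce the F/O functions exactly, and the helper names are hypothetical.

```python
def survives_upper_then_lower(s):
    """Does lower(upper(s)) give back the original lower-case string?"""
    return s.upper().lower() == s

def survives_lower_then_upper(s):
    """Does upper(lower(s)) give back the original upper-case string?"""
    return s.lower().upper() == s

DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

print(survives_upper_then_lower("abc"))         # ordinary letters
print(survives_upper_then_lower(DOTLESS_I))     # dotless i
print(survives_lower_then_upper(DOTTED_CAP_I))  # capital I with dot above
```

Dotless i upper-cases to a plain "I", which then lower-cases to a dotted
"i", so the original character is lost.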
> > 
> >     You may also wish to make the case mapping depend on 
> the default or a
> >     user-specified collation.
> > 
> > 2.8. Escaping URIs
> > 
> >     The rules for escaping URIs should be aligned across all W3C
> >     specifications; otherwise, we will drive our users crazy.
> > 
> >     We think that means that you should reference and implement the
> >     algorithm specified in the XML Linking specification
> >     ([30]http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators) and
> >     referenced by XML Schema, or the algorithm given in the W3C Character
> >     Model specification (which was the same algorithm the last time we
> >     looked).
> > 
> >     In particular, some members of the XML Schema WG were 
> surprised to see
> >     that your algorithm escapes the percent sign in some 
> cases but not
> >     others; this does not seem to be a feature of the 
> algorithm given by
> >     XML Linking and by the Character Model.
> > 
> >     That said, we believe that you do your readers a good service by
> >     listing explicitly the affected characters. By 
> suggesting that you
> >     refer to the Linking/CharMod algorithm, we do not mean 
> to suggest that
> >     you should make your spec less useful by omitting these lists.
> >     (Editorial note: it would perhaps be useful to some 
> readers to have a
> >     brief discussion of why the advice given in the last 
> paragraph should
> >     be followed; our readers did not understand the 
> rationale for this
> >     advice.)
> > 
> > 2.9. The binary types
> > 
> >     Section 12.1.1 says that op:hexBinary-equal returns true if its
> >     arguments "are of the same length and contain the same 
> code-points";
> >     similarly in 12.1.2 for op:base64Binary-equal.
> > 
> >     The term code-point was defined in section 7.1 as 
> denoting integers
> >     between 0 and 1114111 (x10FFFF), with a gap in the 
> range where Unicode
> >     surrogates occur. It seems to be used here to denote what other
> >     specifications refer to as octets (bit strings of length 8).
> > 
> >     Taking the term code point in the sense of `octet', the 
> definition
> >     still does not match our intuitions of what an equality 
> test on binary
> >     data must do: it is not enough that each argument 
> contain the same
> >     octets; they must contain them in the same order.
> > 
> >     Suggested rewording: "are identical strings of octets". 
> If you wish to
> >     avoid the word octet, "are identical bit strings" might 
> do, although
> >     it omits the relatively important fact that the values 
> in question
> >     must have 8×n bits for some integer n.
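
The gap between "same length and same octets" and "identical strings of
octets" is easy to demonstrate. A minimal Python sketch with made-up values:

```python
# Two hypothetical hexBinary values with the same octets in another order.
a = bytes.fromhex("0102ff")
b = bytes.fromhex("ff0201")

same_length = len(a) == len(b)
contain_same_octets = sorted(a) == sorted(b)  # order ignored
identical_octet_strings = a == b              # order respected

print(same_length, contain_same_octets, identical_octet_strings)
```

Read literally, the current wording would accept this pair as equal; the
suggested rewording would not.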
> > 
> > 2.10. Minor items
> > 
> > 2.10.1. User control of collations
> > 
> >     Section 7.3 says in part "This specification does not 
> use xml:lang to
> >     identify the default collation, in part because 
> collations should be
> >     determined by the user of the data, not (normally) the 
> data itself,
> >     and because ..."
> > 
> >     The second reason given is sound. The first (collations 
> should not
> >     normally be determined by the data) is often advanced 
> as a principle,
> >     but does not seem to all members of the XML Schema WG to be
> >     universally true. We are thus grateful for the 
> "(normally)" in the
> >     sentence. But in any case, the first reason given here 
> leads to a
> >     non-sequitur: it would be a reason not to make xml:lang 
> determine the
> >     collation sequence without possibility of user 
> override. But it does
> >     not, even on its face, provide a reason not to use xml:lang to
> >     identify the default collation. We suggest dropping the 
> first reason;
> >     the second suffices.
> > 
> > 2.10.2. Section 7.3.1.1 Examples
> > 
> >     The fourth example in section 7.3.1.1 says that
> > 
> >        fn:compare('Strassen', 'Straße')
> > 
> >     "returns 1 if and only if the default collation 
> includes provisions
> >     that equate `ss' and the (German) character `ß' 
> (`sharp-s')." Unless
> >     we have misunderstood the definition of the function, 
> the return value
> >     should also be 1 if the default collation sorts "ß" 
> (sharp s) before
> >     "s". Deleting the phrase "and only if" would remove the error.
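
The point can be checked with a toy model of the comparison. In the Python
sketch below a "collation" is just a key function, which is a deliberate
simplification; the three key functions and the compare helper are
hypothetical and only illustrate the three cases discussed.

```python
def compare(a, b, key):
    """Return -1, 0 or 1 according to the ordering induced by key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)

def codepoint_key(s):
    # Plain code point order: "s" (x73) sorts before sharp s (xDF).
    return [ord(c) for c in s]

def equate_sharp_s_key(s):
    # A collation that treats sharp s as equivalent to "ss".
    return [ord(c) for c in s.replace("\u00df", "ss")]

def sharp_s_before_s_key(s):
    # A collation that sorts sharp s immediately before "s".
    return [ord("s") - 0.5 if c == "\u00df" else ord(c) for c in s]

for key in (codepoint_key, equate_sharp_s_key, sharp_s_before_s_key):
    print(key.__name__, compare("Strassen", "Stra\u00dfe", key))
```

Both of the last two key functions give 1, so "if" holds but "only if" does
not.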
> > 
> > 3. Editorial notes
> > 
> >     In the course of our work, some editorial points were 
> noted; we list
> >     them here for the use of the editors. We do not 
> particularly expect
> >     formal responses on these comments.
> > 
> >      1. Definition of must. Section 1.1 defines must thus:
> > 
> >           Conforming documents and processors are required 
> to behave as
> >           described; otherwise, they are non-conformant or in error.
> > 
> >         Is the "or" inclusive or exclusive, or is "in 
> error" intended as a
> >         synonym or approximate synonym for 
> "non-conformant"? Possible
> >         alternatives: "otherwise, they are non-conformant 
> and in error",
> >         "otherwise, they are either non-conformant or else 
> in error",
> >         "otherwise, they are non-conformant, i.e. in error".
> > 
> >      2. Definition of stable. In section 1.1, the definition of stable
> >         says, inter alia: "Some other functions ... have an explicit
> >         dependency on the dynamic context". Unless this means that they
> >         accept an argument representing the dynamic context, it seems at
> >         first glance as if explicit is here used with the meaning
> >         `implicit'. Perhaps what is intended is that the documentation
> >         will explicitly mention this dependency. Perhaps the best thing to
> >         do would be just to drop the explicit; if you really wish to
> >         stress the promise of documentation, perhaps read "Some other
> >         functions ... have a dependency on the dynamic context ... These
> >         functions are said to be contextual. [INS: Contextual functions
> >         are always identified as such in their descriptions. :INS]"
> > 
> >      3. The term back up. The phrase back up appears to be 
> used several
> >         times as a technical term (e.g. last paragraph of 
> 1.7). What does
> >         it mean?
> > 
> >      4. The term QName. Some readers (including some 
> members of the XML
> >         Schema WG) are likely to find it disorienting for 
> the term QName
> >         to be used here as a synonym for expanded name or 
> universal name,
> >         and not with the same meaning QName has in the XML 
> Namespaces
> >         Recommendation. We recognize, however, that what is 
> returned is
> >         precisely a member of what XML Schema 1.0 defines 
> as the value
> >         space of the xs:QName type, so that the use of the 
> term xs:QName
> >         to denote (for example) the return type of the accessor
> >         fn:node-name is not only unexceptionable but necessary for
> >         consistency. We don't have a good solution for you 
> here; we only
> >         note the difficulty. Perhaps a note calling the 
> reader's attention
> >         to the issue would be in order (similar to the note 
> on this topic
> >         in the Data Model spec).
> > 
> >         Some members of the WG suggest that this spec, like the Data
> >         Model, should prefer the term expanded QName where 
> possible, to
> >         stress that what is referred to is the pair in the 
> value space,
> >         not the colonized Name in the lexical space.
> > 
> >      5. No parameters and the empty list of parameters:
> > 
> >         On first reading, the signatures of fn:string and 
> fn:error suggest
> >         an ambiguity to some readers: the call fn:error() 
> appears to match
> >         both the first and the second signatures.
> > 
> >         Members of the WG who have studied XQuery more 
> thoroughly assure
> >         the rest of us that there is no ambiguity, so our purpose in
> >         making this comment is merely to call your attention to an
> >         editorial problem: it might be useful to explain to 
> the reader why
> >         the dual signatures showing no arguments and 
> optional arguments
> >         are not in fact ambiguous.
> > 
> >      6. Section 2.3, first note: the word "this" seems to need an
> >         antecedent; it is not clear to this reader, at least, what that
> >         antecedent is. (It's also not clear what problem with blanks in
> >         fragment identifiers is being adverted to.)
> > 
> >      7. Raising errors: Section 3 para 1 reads in part: "The occurrence of
> >         that phrase [sc. `an error is raised'] implicitly causes the
> >         invocation of the fn:error function ..." This formulation seems to
> >         involve a horrible clash of contexts: the phrase "an error is
> >         raised" occurs in this document, and it occurs continuously from
> >         the time of publication until the document ceases to exist (if
> >         documents can ever cease to exist), while the error, one expects,
> >         ought to be raised in a software system which implements the spec,
> >         and should probably not be raised continuously from now until the
> >         spec ceases to exist, if only because it would make it hard for
> >         users to get work done. For the occurrence of a phrase in the spec
> >         to cause the raising of an error in conforming software seems to
> >         involve a rather unusual kind of action at a distance. To speak a
> >         bit more seriously: perhaps the relevant part of the paragraph
> >         could be recast, perhaps along these lines: "the phrase `an error
> >         is raised' is used to describe the behavior of conforming
> >         processors in certain situations. When such situations arise in a
> >         running system, a conforming implementation of this specification
> >         must invoke the fn:error function defined in this section." This
> >         is not perfect, but we hope you get the idea.
> > 
> >      8. Type promotion in multiple or single steps: Section 6.2 says "As
> >         far as possible, the promotions should be done in a single step.
> >         Specifically, when a decimal is promoted to a double, it must not
> >         be converted to a float and then to a double, as this risks loss
> >         of precision." [Emphasis added.] These two sentences appear to
> >         contradict each other: is the rule about single-step conversions
> >         required of conforming implementations ("must"), or recommended
> >         without being required ("should")?
> > 
> >      9. Code points: The note in section 7.1 identifies code points as
> >         Unicode scalar values (which are in turn integers), but uses the
> >         notation #x0000 and #x10FFFF to refer to the minimum and maximum
> >         values. It's not terribly confusing in context, but strictly
> >         speaking, this notation is defined in the XML specification as
> >         denoting characters, not integers. I believe conventional
> >         representations for hexadecimal numbers would write these values
> >         as 0, 0H, x0, or 0x, and correspondingly 10FFFFH, x10FFFF, or
> >         10FFFFx; there may be other hexadecimal representations you will
> >         prefer. The Unicode specification writes 10FFFF[16].
> > 
> >     10. v and w: Section 7.3 says "`uve' and `uwe' are considered
> >         equivalent in some European languages"; this is unexpected. Are
> >         you sure? Which languages?
> > 
> >     11. Section 7.4 para 1: for "function" read "functions". Here and
> >         elsewhere, we believe that sentences like "Several of these
> >         functions use a collation" would do better if "a collation" were
> >         replaced with a plural: "Several of these functions use
> >         collations." Unless, of course, all of these functions always use
> >         the same collation.
> > 
> >     12. Section 7.4.6.1, final example: forgive this observation if it's
> >         clueless, but since there does not seem to be any addition
> >         operator in the example (did we miss it?), it's not immediately
> >         obvious what -INF + INF has to do with the interpretation of the
> >         example.
> > 
> >     13. Section 7.4.15, fn:string-pad: this seems an unfortunate choice of
> >         names for a function which does not (despite its name) pad a
> >         string with blanks or some other padding character(s), but which
> >         simply replicates or copies the string multiple times. Could it be
> >         renamed without excessive heartburn?
> > 
> >     14. Section 7.4.16, fn:escape-uri: It would help minimize confusion if
> >         the lists of characters which are or are not escaped gave the
> >         character names as well as the characters themselves in quotation
> >         marks. (In the paper copy used by one member of our review task
> >         force, this bit of the spec was almost impossible to make out
> >         without a magnifying glass.)
> > 
> >     15. Section 7.5.3, fn:replace: The description of the function seemed
> >         unclear:
> > 
> >             The function returns the xs:string that is obtained by
> >             replacing all non-overlapping substrings of $input that
> >             match the given $pattern with an occurrence of the
> >             $replacement string.
> > 
> >         Replacing all occurrences of the pattern with an occurrence of the
> >         replacement string seems to suggest an n for 1 exchange. For "all"
> >         read "each". In the following paragraph, one occurrence of $input
> >         is not marked as an identifier, one is.
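
On comment 8 above: a minimal sketch of why the two-step route risks loss of precision. Python has no native single-precision type, so the helper to_float32 (a hypothetical name) rounds a double to the nearest IEEE 754 single via struct; the sample value 0.1 and all names are illustrative assumptions, not taken from the F&O draft.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


d = Decimal("0.1")

# One step: decimal -> double.
direct = float(d)

# Two steps: decimal -> float (single) -> double.
# The double passed to to_float32 is only a vehicle for reaching the single.
two_step = to_float32(float(d))

# Distance of each result from the original decimal value.
error_direct = abs(Decimal(direct) - d)
error_two_step = abs(Decimal(two_step) - d)

print("direct:  ", direct, "error", error_direct)
print("two-step:", two_step, "error", error_two_step)
```

The two-step result keeps only single precision, so widening it to a double afterwards cannot recover the digits that were dropped.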
> > 
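
On comment 9 above: a short sketch of the distinction between a character and the integer that is its code point, and of writing the range bounds as ordinary hexadecimal integers. The names MIN_CODE_POINT, MAX_CODE_POINT and code_points are hypothetical, and Python's 0x prefix is just one of the possible hexadecimal notations.

```python
from typing import List

# The range bounds written as plain integers in hexadecimal notation.
MIN_CODE_POINT = 0x0
MAX_CODE_POINT = 0x10FFFF


def code_points(s: str) -> List[int]:
    """Return the code point of each character in s, as integers."""
    return [ord(ch) for ch in s]


sample = "A\u20ac"  # LATIN CAPITAL LETTER A, EURO SIGN
for ch, cp in zip(sample, code_points(sample)):
    # ch is a character; cp is an integer, shown here in hexadecimal.
    print(repr(ch), cp, format(cp, "X"))
```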
> > 
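
On comment 15 above: a sketch of the "each" reading, in which every non-overlapping match is replaced by its own copy of the replacement string. Python's re module is used only as a stand-in; its regular expression and replacement syntax differ from those F&O specifies for fn:replace, and replace_each is a hypothetical name.

```python
import re


def replace_each(input_text: str, pattern: str, replacement: str) -> str:
    """Replace each non-overlapping match of pattern with replacement."""
    return re.sub(pattern, replacement, input_text)


# Two matches of "bra" yield two occurrences of "*", not one.
print(replace_each("abracadabra", "bra", "*"))

# Matches do not overlap: "aaaa" contains two non-overlapping "aa".
print(replace_each("aaaa", "aa", "*"))
```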
> > References
> > 
> >     1. http://www.w3.org/
> >     2. http://www.w3.org/Architecture/
> >     3. http://www.w3.org/XML/Group
> >     4. http://www.w3.org/XML/Group/Schemas
> >     5. http://www.w3.org/Member/Eventscal.html
> >     6. http://www.w3.org/Member/#confidential
> >     7. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e65
> >     8. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e70
> >     9. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e86
> >    10. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e98
> >    11. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e145
> >    12. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e178
> >    13. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e191
> >    14. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e238
> >    15. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e246
> >    16. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e251
> >    17. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e278
> >    18. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e283
> >    19. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e327
> >    20. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e350
> >    21. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e392
> >    22. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e399
> >    23. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e444
> >    24. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e463
> >    25. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e513
> >    26. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e516
> >    27. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e544
> >    28. http://www.w3.org/XML/Group/2003/07/xmlschema-fo-comments.html#d0e577
> >    29. http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html
> >    30. http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
> 
> 

Received on Wednesday, 8 October 2003 19:24:53 UTC