- From: Paul Cotton <pcotton@microsoft.com>
- Date: Tue, 4 Jul 2000 09:08:34 -0700
- To: "'Martin J. Duerst'" <duerst@w3.org>
- Cc: w3c-i18n-ig@w3.org, "'www-xml-query-comments@w3.org'" <www-xml-query-comments@w3.org>
Thank you for your input on the XML Query Requirements document. In the future please send such input to the www-xml-query-comments@w3.org email list so that it is publicly visible.

Your input arrived after the XML Query WG had decided to republish its current Requirements document. We are currently working on responses to your input and we will include any changes as a result of your input in a subsequent version of our Requirements document.

/paulc

-----Original Message-----
From: Martin J. Duerst [mailto:duerst@w3.org]
Sent: Monday, June 26, 2000 3:53 AM
To: Paul Cotton; w3c-xml-query-wg@w3.org
Cc: w3c-i18n-ig@w3.org
Subject: I18N Requirements on XML Queries

Hello Paul, dear XML Query WG,

At its last f2f in Paris (the first taking 3 days instead of only 2 days), the I18N WG finally got around to taking a careful look at XML Query Requirements. We apologize for our delay.

Below please find a list of requirements that we have drawn up as part of the minutes of the recent meeting. Please note that the list below cannot be complete, in the sense that we cannot guarantee that if all the requirements below are addressed, XML Queries are appropriately internationalized. Rather, the requirements below provide input for further work. Please feel free to contact us at any time if you have any questions or comments.

Regards, Martin.

[31]XML Query Requirements

[31] http://www.w3.org/TR/xmlquery-req

Apologies for delay. Our comments will evolve as we see the evolution of the spec and our understanding of it evolves; we would like to work closely with the Query WG. Our current comments on XML Query Requirements are:

a. The XML Query language should work equally well with any data or query, regardless of the (human) languages or locales involved. (This does not imply automatic translation!)

b. IURI: the requirements must address the existence of URI references containing non-ASCII characters according to the Character Model.

c. UCS evolution: the requirements must address the issue of UCS evolution (in particular the addition of new characters).

d. 3.3.2: please remove "1.0" from "XML 1.0 character data".

e. 3.4.14 to 3.4.16: why "SHOULD" rather than "MUST"?

f. 3.4.17: add "locale" and "time zone" to the list ("such as..."). Clarify what "in which the query is executed" means. In particular, the relevant information generally relates to the user. The locale and time zone of the server are generally irrelevant to the results of the query.

g. The semantics of queries should have a clear interpretation with respect to locale. Make sure that all aspects of the language are either locale-independent or that sufficient locale information is contained in the query to make it unambiguous. A locale-independent approach should be adopted wherever possible (e.g. number format), localization being handled at the user end. In some cases (such as a query of strings that collate between "Smith" and "Thomas"), it is necessary to have certain locale information ("in the Danish sorting order") be part of the query.

h. Ideally, it should be possible to transmit arbitrary collating tailoring information with a query. (Cf. Unicode Technical Report #10 for details on collating and tailoring.)

i. It may not be possible for a processor to use collating information based upon an arbitrary tailoring or a specified locale (e.g. for performance reasons or unavailability of the collating data for the specified locale). In such a case, a query must not simply return false results: it may decline the query or return results according to another collating sequence, together with a warning of that fact.

j. Section 4: strengthen the statement about the relationship between our 2 groups.

k. Section 4: change "W3C goals for international access to the Web" to "W3C goals for i18n".

l. The data model must account for inherited attributes (such as xml:lang).

m. Query processors need to know about the structure of xml:lang. If a query asks for a match of the string "chat" with xml:lang="fr", the query should match data with xml:lang="fr-BE". Note: the language tag spec, RFC 1766, is currently being extended (3-letter language codes being introduced). The XML 1.0 spec has been amended to take this into account.

n. It is a goal of i18n that queries involving string matching ("select x where x='some_constant'") treat canonically equivalent strings (in the Unicode sense) as matching. If the query and the target are both XML, early normalization (as per the Character Model) is assumed and binary comparison ensures that the equivalence requirement is satisfied. However, if the target is originally a legacy database which logically has a layer that exports the data as XML, that XML must be exported in normalized form. The XML Query spec must impose the normalization requirement upon such layers.

o. Similarly, the query may come from a user-interface layer that creates the XML query. The XML Query spec must impose the normalization requirement upon such layers.

p. Provided that the query and the target are in normalized form C, the output of the query must itself be in normalized form C.

q. Queries involving string matching should support various kinds of loose matching (such as case-insensitivity, katakana-hiragana equivalence, accent-accentless equivalence, etc.).

r. If such features as case-insensitivity are present in queries involving string matching, these features must be properly internationalized (e.g. case folding works for accented letters) and language-dependence must be taken into account (e.g. Turkish dotless-i).

s. Queries involving character counting and indexing must take into account the Character Model. Specifically, they should follow Layer 3 (locale-independent graphemes). Additional details can be found in The Unicode Standard 3.0 and UTR#18. Queries involving word counting and indexing should similarly follow the recommendations in these references.
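
To make items l and m concrete, here is a minimal Python sketch (an illustration only, not part of the minutes and not a proposal for XML Query syntax). It assumes a simple prefix rule for language tags, in which "fr" matches "fr" and any tag beginning with "fr-", and it resolves xml:lang by inheritance from the nearest ancestor. The helper names lang_matches and find_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_matches(requested, actual):
    """True if `actual` equals `requested` or refines it with further subtags.

    Assumed rule: case-insensitive prefix match on whole subtags,
    so "fr" matches "fr-BE" but not "fro".
    """
    requested, actual = requested.lower(), actual.lower()
    return actual == requested or actual.startswith(requested + "-")

def find_text(root, text, requested_lang):
    """Yield elements whose text is `text` and whose effective xml:lang matches."""
    def walk(elem, inherited):
        # An element without its own xml:lang inherits the ancestor's value.
        lang = elem.get(XML_LANG, inherited)
        if (elem.text or "").strip() == text and lang and lang_matches(requested_lang, lang):
            yield elem
        for child in elem:
            yield from walk(child, lang)
    yield from walk(root, None)

doc = ET.fromstring(
    '<doc xml:lang="fr-BE"><w>chat</w><w xml:lang="en">chat</w></doc>'
)
hits = list(find_text(doc, "chat", "fr"))
```

Here only the first w element is selected: it carries no xml:lang of its own, inherits "fr-BE" from doc, and "fr-BE" matches the requested "fr". The second w element overrides the inherited value with "en" and is skipped.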
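
The point of items n to p can also be shown in a few lines of Python (again only an illustrative sketch, using the standard unicodedata module; the helper name canonically_equal is hypothetical). Two canonically equivalent strings differ under binary comparison until both are brought to normalization form C, which is why early normalization lets a processor compare code points directly.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after normalizing both to form C (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The same word, once with a precomposed e-acute (U+00E9) and once
# with "e" followed by a combining acute accent (U+0301).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

binary_equal = precomposed == decomposed                      # False
equivalent = canonically_equal(precomposed, decomposed)       # True

# If both sides were normalized early, binary comparison is enough.
early_normalized = unicodedata.normalize("NFC", decomposed) == precomposed  # True
```

A layer that exports legacy data as XML, or a user interface that builds the query, would apply the normalization step once on output, so that the query processor itself only ever needs the binary comparison.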
Received on Tuesday, 4 July 2000 12:10:18 UTC