FW: I18N Requirements on XML Queries from Peter Fankhauser on 2000-08-24 (www-xml-query-comments@w3.org from August 2000)

From: Peter Fankhauser <fankp@darmstadt.gmd.de>
Date: Thu, 24 Aug 2000 09:38:27 +0200
To: <www-xml-query-comments@w3.org>
Cc: "W3C XML Query WG \(E-mail\)" <w3c-xml-query-wg@w3.org>
Message-ID: <KNEAKBHGPLOADKCAOIFNKEPACAAA.fankp@darmstadt.gmd.de>

Thank you for this detailed review. The XML Query WG has reached
a conclusive position on all issues except a, (one part of) e, l, and m.
These will be dealt with in a future response.

Best regards,

Peter Fankhauser

N.B.: all references to the requirements document below
relate to the version of January 31, 2000:
http://www.w3.org/TR/2000/WD-xmlquery-req-20000131
and not to the revision published in the meantime on August 15:
http://www.w3.org/TR/2000/WD-xmlquery-req-20000815
The new version does not yet take into account the changes considered
below.

>     [31]XML Query Requirements
>
>      [31] http://www.w3.org/TR/xmlquery-req
>
>    Apologies for delay. Our comments will evolve as we see the evolution
>    of the spec and our understanding of it evolves; we would like to work
>    closely with the Query WG. Our current comments on XML Query
>    Requirements are:
>     a. The XML Query language should work equally well with any data or
>        query, regardless of the (human) languages or locales involved.
>        (This does not imply automatic translation!)

The XML Query WG has not yet come to a conclusive position on this
request.

>     b. IURI: the requirements must address the existence of URI
>        references containing non-ASCII characters according to the
>        Character Model.

The XML Query WG wonders why the I18N group considers this
a requirement for XML Query. The XML Query Working Group is not
chartered to design URIs. The IETF's generic syntax for URIs,
http://www.ietf.org/rfc/rfc2396.txt, discusses this in Section 2
and leaves some issues open. IURIs may also be regarded as an
issue to be dealt with by the URI interest group proposed in
http://www.w3.org/2000/06/uri4324.

>     c. UCS evolution: the requirements must address the issue of UCS
>        evolution (in particular the addition of new characters).

The XML Query WG is not sure how to address UCS evolution as a requirement,
and would welcome further input from I18N on which changes to the
requirements document I18N is requesting in this context.

>     d. 3.3.2: please remove "1.0" from "XML 1.0 character data".

While evolvability of XML Query is clearly
desirable, XML Query 1.0 cannot anticipate all possible
evolutions of the standards it depends on. Therefore, the requirements
for XML Query 1.0 deliberately relate to existing versions of
those standards.

>     e. 3.4.14 to 3.4.16: why "SHOULD" rather than "MUST"?

ad 3.4.14 Operations on Names:
This request has also been raised in other comments, and the XML Query
Working Group agrees that simple operations on names (3.4.14) are a MUST.
Accordingly, in the new version of the requirements document this
requirement is formulated as follows:

"3.4.x Operations on Names
Queries MUST be able to perform simple operations on names, such
as tests for equality in element names, attribute names, and processing
instruction targets, and to perform simple operations on combinations of
names and data. Queries MAY perform more powerful operations on names."

ad 3.4.15 Operations on Schemas:
The XML Query WG is still discussing a possible reformulation of this
requirement.

ad 3.4.16 Extensibility:
Extensibility is a hard issue. Therefore, for XML Query 1.0,
the XML Query WG has decided to treat extensibility as a
SHOULD requirement.

>     f. 3.4.17: add "locale" and "time zone" to the list ("such as...").
>        Clarify what "in which the query is executed" means. In
>        particular, the relevant information generally relates to the
>        user. The locale and time zone of the server are generally
>        irrelevant to the results of the query.

As the reference to "user" indicates, the "environment in which the query
is executed" refers not only to the server environment but also to the
user environment. The XML Query WG agrees to add locale and time zone
to the examples of accessible environment information.

>     g. The semantics of queries should have a clear interpretation with
>        respect to locale. Make sure that all aspects of the language are
>        either locale-independent or that sufficient locale information is
>        contained in the query to make it unambiguous. A
>        locale-independent approach should be adopted wherever possible
>        (e.g. number format), localization being handled at the user end.
>        In some cases (such as a query of strings that collate between
>        "Smith" and "Thomas"), it is necessary to have certain locale
>        information ("in the Danish sorting order") be part of the query.
>     h. Ideally, it should be possible to transmit arbitrary collating
>        tailoring information with a query. (Cf. Unicode Technical Report
>        # 10 for details on collating and tailoring).
>     i. It may not possible for a processor to use collating information
>        based upon an arbitary tailoring or a specified locale (e.g. for
>        performance reasons or unavailability of the collating data for
>        the specified locale). In such a case, a query must not simply
>        return false results: it may decline the query or return results
>        according to another collating sequence, together with a warning
>        of that fact.

ad g.-i.:
The XML Query WG welcomes these explanations as possible input for its
work on the data model, algebra, and syntax.
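
As an illustration only (not a WG design), the behaviour requested in
items g. through i. could be sketched in Python as follows. The
collation registry, the simplified Danish sort key, and the function
name strings_between are assumptions made for this example; real Danish
tailoring (cf. Unicode Technical Report #10) has further rules.

```python
# Sketch: a range query whose collation is named in the query itself.
# All names here are hypothetical; nothing below is defined by XML Query.

DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # æ, ø, å sort after z


def danish_key(text):
    """Simplified Danish sort key: case-insensitive, alphabet order above."""
    key = []
    for ch in text.lower():
        pos = DANISH_ALPHABET.find(ch)
        # Characters outside the alphabet sort after all Danish letters.
        key.append(pos if pos >= 0 else len(DANISH_ALPHABET) + ord(ch))
    return key


def codepoint_key(text):
    """Locale-independent fallback: plain code point order."""
    return text


# Collations this (hypothetical) processor knows about.
COLLATIONS = {"da": danish_key, "codepoint": codepoint_key}


def strings_between(values, low, high, collation, allow_fallback=False):
    """Return (matches, warning) for low <= value <= high under `collation`.

    If the collation is unavailable, either decline the query (raise
    LookupError) or, when allow_fallback is set, use code point order
    and report that fact in the warning.
    """
    key = COLLATIONS.get(collation)
    warning = None
    if key is None:
        if not allow_fallback:
            raise LookupError(f"collation {collation!r} is not available")
        key = codepoint_key
        warning = f"collation {collation!r} unavailable; used code point order"
    lo, hi = key(low), key(high)
    return [v for v in values if lo <= key(v) <= hi], warning
```

The point of the sketch is item i.: an unsupported collation leads
either to a declined query or to results accompanied by a warning,
never to silently different results.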

>     j. Section 4: strengthen the statement about the relationship between
>        our 2 groups.
>     k. Section 4: change "W3C goals for international access to the Web"
>        to "W3C goals for i18n".

ad j. and k.:
The XML Query WG agrees. In a future version of the requirements
document the relationship to I18N will be described as follows:

"The XML Query Working Group will solicit feedback from
the Internationalization Working Group to ensure that it satisfies
W3C goals for international access to the Web."

>     l. The data model must account for inherited attributes (such as
>        xml:lang).
>     m. Query processors need to know about the structure of xml:lang. If
>        a query asks for a match of the string "chat" with xml:lang="fr",
>        the query should match data with xml:lang="fr-BE". Note: the
>        language tag spec, RFC 1766, is currently being extended (3-letter
>        language codes being introduced). The XML 1.0 spec has been
>        amended to take this into account.

The XML Query WG has not yet come to a conclusive position on this.
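
For reference while this is open, one possible reading of items l. and
m. can be sketched in Python. This is not a WG position; the function
names and the list-based representation of ancestors are assumptions
made for the example.

```python
# Sketch of xml:lang handling as described in items l. and m.
# Hypothetical helper names; not part of any XML Query draft.


def lang_matches(requested, actual):
    """True if `actual` is `requested` or a more specific subtag of it.

    Comparison is case-insensitive and only matches at a "-" boundary,
    so "fr" matches "fr-BE" but not "fra".
    """
    req = requested.lower()
    act = actual.lower()
    return act == req or act.startswith(req + "-")


def effective_lang(declared_langs):
    """Return the inherited xml:lang for an element.

    `declared_langs` holds the xml:lang value (or None if absent) of the
    element itself, then its parent, and so on up to the root. The
    nearest declared value wins.
    """
    for value in declared_langs:
        if value is not None:
            return value
    return None
```

With these helpers, a query for "chat" with xml:lang="fr" would accept
an element whose nearest declared xml:lang is "fr-BE".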

>     n. It is a goal of i18n that queries involving string matching
>        ("select x where x='some_constant'") treat canonically equivalent
>        strings (in the Unicode sense) as matching. If the query and the
>        target are both XML, early normalization (as per the Character
>        Model) is assumed and binary comparison ensures that the
>        equivalence requirement is satisfied. However, if the target is
>        originally a legacy database which logically has a layer that
>        exports the data as XML, that XML must be exported in normalized
>        form. The XML Query spec must impose the normalization requirement
>        upon such layers.
>     o. Similarly, the query may come from a user-interface layer that
>        creates the XML query. The XML Query spec must impose the
>        normalization requirement upon such layers.
>     p. Provided that the query and the target are in normalized form C,
>        the output of the query must itself be in normalized form C.
>     q. Queries involving string matching should support various kinds of
>        loose matching (such as case-insensitivity, katakana-hiragana
>        equivalence, accent-accentless equivalence, etc.)
>     r. If such features as case-insensitivity are present in queries
>        involving string matching, these features must be properly
>        internationalized (e.g. case folding works for accented letters)
>        and language-dependence must be taken into account (e.g. Turkish
>        dotless-i).
>     s. Queries involving character counting and indexing must take into
>        account the Character Model. Specifically, they should follow
>        Layer 3 (locale-independent graphemes). Additional details can be
>        found in The Unicode Standard 3.0 and UTR#18. Queries involving
>        word counting and indexing should similarly follow the
>        recommendations in these references.

ad n. through s.: The XML Query WG thanks I18N for this detailed
input on the treatment of string datatypes and will take it into
consideration in the design of the data model, algebra, and syntax.
We have already added it to the issues list of the XML Algebra.
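
To make the string-matching items concrete, here is a minimal Python
sketch using the standard unicodedata module. It illustrates canonical
equivalence via normalization form C and a locale-independent caseless
match; the helper names are assumptions for this example, and
language-specific folding such as the Turkish dotless i is deliberately
not handled.

```python
import unicodedata


def nfc(text):
    """Return `text` in Unicode normalization form C."""
    return unicodedata.normalize("NFC", text)


def canonically_equal(a, b):
    """Compare two strings after normalization, so that canonically
    equivalent strings match even if one of them is not normalized."""
    return nfc(a) == nfc(b)


def _fold(text):
    # Canonical caseless form: NFD, full case folding, NFD again.
    nfd = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", nfd.casefold())


def caseless_equal(a, b):
    """Locale-independent caseless match.

    This uses default Unicode case folding only. It does not apply
    language-dependent rules, so "I" and dotless "ı" do not match.
    """
    return _fold(a) == _fold(b)
```

If both the query and the data are already in form C, the plain binary
comparison `a == b` gives the same answer as `canonically_equal(a, b)`,
which is the case the early-normalization assumption relies on.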
Received on Thursday, 24 August 2000 03:37:29 UTC