
RE: sec 4.11

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Mon, 10 May 2004 17:33:50 -0700
To: "Tex Texin" <tex@xencraft.com>, "Web Services" <public-i18n-ws@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHEECNIAAA.aphillips@webmethods.com>

Hi Tex,

I reworked your material a little. Are you okay with:

<div2><head>Ordering, Grouping, and Collation</head>

<p>The ordering, or collation, of textual data items is a general concern for
internationalized software, and the problem is exacerbated when the data may
be multilingual. In Web services scenarios where the ordering of textual data
is critical to its correct use, it can be difficult to identify the
appropriate collation rules precisely enough to ensure either that every
service operating on the data follows those rules, or that appropriate action
(for example, re-sorting the data) is taken to compensate for any service
that does not use the desired collation rules.</p>

<p>Several of these collation issues are described here. An important
reference is the Unicode Collation Algorithm (UCA), described in <bibref
ref="UTR10"/>. Although the UCA is a mature standard, implementations of
collation algorithms vary widely, few of them are based on the UCA, and there
is little or no general agreement on identifiers for collation
preferences.</p>

<p>Collation rules cannot be inferred solely from a language identifier or a
locale, because these identifiers do not indicate which of several possible
sort orderings should be used within a locale. A language identifier may
suggest which sort ordering a requester expects (Spanish, for example, has
both a Traditional and a Modern ordering), but it is not necessarily
definitive.</p>
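
<p>As an illustration only, the following Python sketch shows one language
producing two different orderings of the same words. In the Traditional
Spanish ordering, "ch" and "ll" are treated as single letters that sort after
"c" and "l" respectively. The alphabet lists and the <code>make_key</code>
function are hypothetical simplifications (lowercase input only, no accents
other than "ñ") and are neither the UCA nor any library's tailoring.</p>

```python
# Hypothetical, simplified alphabets: lowercase only, no accents except "ñ".
MODERN = list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz")
TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
               + list("mn") + ["ñ"] + list("opqrstuvwxyz"))


def make_key(alphabet):
    """Return a sort key that ranks words letter by letter using `alphabet`."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    digraphs = {letter for letter in alphabet if len(letter) == 2}

    def key(word):
        units, i = [], 0
        while i < len(word):
            if word[i:i + 2] in digraphs:  # "ch"/"ll" count as one letter
                units.append(rank[word[i:i + 2]])
                i += 2
            else:
                units.append(rank[word[i]])
                i += 1
        return units

    return key


words = ["llama", "luz", "chico", "cura", "dedo"]
print(sorted(words, key=make_key(MODERN)))       # ['chico', 'cura', 'dedo', 'llama', 'luz']
print(sorted(words, key=make_key(TRADITIONAL)))  # ['cura', 'chico', 'dedo', 'luz', 'llama']
```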

<p>Examples of sort orderings include telephone, dictionary, phonetic,
binary, stroke-radical, and radical-stroke. In the latter two cases, the
reference (source standard) for stroke counts may also need to be cited.</p>

<p>Different components or subsystems used by a software process may employ
different sort orderings. For example, a User Agent may provide a drop-down
list that sorts its elements at run time differently from the other
components of the agent. Information retrieved from a database may be
ordered by an index that bears no relation to the requester's requirements.
When different components or subsystems of a Web service use different
collation rules, errors can occur. These are not always hard errors (that
is, errors that generate faults), but the resulting data, operations, or
events may be incorrect, or may be perceived as incorrect by a human
observer.</p>
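
<p>The following Python sketch, using two hypothetical components,
illustrates such a soft error. One component returns a list sorted
case-insensitively; another searches it with a binary search (Python's
<code>bisect</code> module) that assumes code point order. No fault is
raised, but an item that is present is reported as missing.</p>

```python
import bisect

# Hypothetical component A: returns names sorted case-insensitively.
names = sorted(["cherry", "Banana", "apple"], key=str.casefold)  # ['apple', 'Banana', 'cherry']


def contains(sorted_names, item):
    """Hypothetical component B: binary search that assumes code point order."""
    i = bisect.bisect_left(sorted_names, item)
    return i < len(sorted_names) and sorted_names[i] == item


print("Banana" in names)                  # True: the item is present
print(contains(names, "Banana"))          # False: wrong answer, yet no error is raised
print(contains(sorted(names), "Banana"))  # True once both sides agree on the ordering
```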

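<p>For services that use a binary collation (ordering text by the numeric
values of its encoded form), components that use UTF-8 and UTF-16 internally
can produce different orderings. Ordering by UTF-8 bytes matches code point
order, but ordering by UTF-16 code units does not: supplementary characters,
which are encoded as surrogate pairs (U+D800 to U+DFFF), sort before
characters in the range U+E000 to U+FFFF.</p>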
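
<p>A minimal Python sketch of this effect follows; the two sample characters
are chosen only for illustration. Comparing UTF-8 bytes gives the same
result as comparing code points, whereas comparing UTF-16 code units
(big-endian bytes are used so that byte order matches code unit order)
reverses the pair. Environments whose strings are UTF-16 internally, such as
Java's <code>String.compareTo</code>, compare code units in this way.</p>

```python
# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A: one UTF-16 code unit (FF21)
# U+1F600 GRINNING FACE: a UTF-16 surrogate pair (D83D DE00)
items = ["\U0001F600", "\uFF21"]

by_code_point = sorted(items)                                  # Python compares str by code point
by_utf8 = sorted(items, key=lambda s: s.encode("utf-8"))       # EF BC A1 < F0 9F 98 80
by_utf16 = sorted(items, key=lambda s: s.encode("utf-16-be"))  # D8 3D DE 00 < FF 21

print(by_code_point == by_utf8)  # True
print(by_utf8 == by_utf16)       # False: the two characters swap places
```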

<p>Knowing the language of the requester does not determine how sensitive
the collation should be. Should text elements that differ only by case or
accent be treated as distinct? Should certain characters be ignored? For
example, hyphens are often ignored so that "e-mail" and "email" sort
together.</p>
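
<p>The following Python sketch makes these choices explicit as parameters.
The function <code>collation_key</code> and its flags are hypothetical. It
treats only U+002D HYPHEN-MINUS as ignorable and removes accents by dropping
combining marks after canonical decomposition, which is far cruder than the
comparison levels defined by the UCA.</p>

```python
import unicodedata


def collation_key(s, *, ignore_case=True, ignore_accents=True, ignore_hyphens=True):
    """Hypothetical sort key whose sensitivity is chosen by the caller."""
    if ignore_hyphens:
        s = s.replace("-", "")  # only U+002D; other dashes are left alone
    if ignore_accents:
        # Decompose (e.g. "é" -> "e" + U+0301) and drop the combining marks.
        s = "".join(c for c in unicodedata.normalize("NFD", s)
                    if not unicodedata.combining(c))
    if ignore_case:
        s = s.casefold()
    return s


strict = dict(ignore_case=False, ignore_accents=False, ignore_hyphens=False)
print(collation_key("e-mail") == collation_key("email"))                      # True
print(collation_key("e-mail", **strict) == collation_key("email", **strict))  # False
print(collation_key("Résumé") == collation_key("resume"))                     # True
print(collation_key("Résumé", **strict) == collation_key("resume", **strict)) # False
```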

<p>Where case is considered distinct, it may be important to specify
whether all lowercase characters precede all uppercase characters, the
reverse, or whether the two are intermixed.</p>
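
<p>For ASCII letters, these possibilities can be sketched in Python as
different sort keys over the same words. The key functions are illustrative
assumptions: they rely on <code>str.swapcase</code> inverting the ASCII rule
that uppercase letters precede lowercase ones, and they do not handle other
scripts or accented letters.</p>

```python
words = ["banana", "Apple", "Banana", "apple"]

# All uppercase before all lowercase (simply binary order for ASCII).
upper_block = sorted(words)
# -> ['Apple', 'Banana', 'apple', 'banana']

# All lowercase before all uppercase: swapping case inverts the ASCII relationship.
lower_block = sorted(words, key=str.swapcase)
# -> ['apple', 'banana', 'Apple', 'Banana']

# Intermixed, lowercase first when words differ only by case.
mixed_lower = sorted(words, key=lambda w: (w.casefold(), w.swapcase()))
# -> ['apple', 'Apple', 'banana', 'Banana']

# Intermixed, uppercase first when words differ only by case.
mixed_upper = sorted(words, key=lambda w: (w.casefold(), w))
# -> ['Apple', 'apple', 'Banana', 'banana']
```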

<p>Collation often affects application performance. For example, if a
service returns results in an unknown ordering, the requester may have to
sort the results using its local collation rules. This consumes resources
and delays any further use of the results until the entire set has been
received and collated. Conversely, if results are returned in the order the
requester needs, the requester can begin processing the first records
returned without waiting for the remaining records to arrive.</p>
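
<p>The difference can be sketched in Python with generators. The record
stream <code>arriving</code> and both consumer functions are hypothetical;
the point is that <code>sorted</code> must read every record before it can
return the first one, whereas results already in the requester's order can
be used as each record arrives.</p>

```python
from typing import Callable, Iterable, Iterator, List


def arriving(records: List[str], received: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for records delivered by a service one at a time."""
    for record in records:
        received.append(record)  # log each record as it is delivered
        yield record


def use_as_delivered(stream: Iterable[str]) -> Iterator[str]:
    # Results already in the requester's order: each record is usable immediately.
    for record in stream:
        yield record


def use_after_resort(stream: Iterable[str], key: Callable[[str], str]) -> Iterator[str]:
    # Results in an unknown order: sorted() reads the whole stream first.
    for record in sorted(stream, key=key):
        yield record


received: List[str] = []
first = next(use_as_delivered(arriving(["a", "b", "c"], received)))
print(first, len(received))  # a 1

received = []
first = next(use_after_resort(arriving(["c", "a", "b"], received), key=str.casefold))
print(first, len(received))  # a 3
```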

<p>Collation can also be performed at different stages of data processing,
and timing can be an important consideration. Database indexes are updated
as data is added to the database, not when a request arrives. Requests that
can use the predetermined collation of an index therefore have a
significant performance advantage over requests that either cannot use
indexes or must re-sort the results.</p>

<p>See <xspecref href="#S-009">I-009</xspecref> and <xspecref
href="#I-013">I-013</xspecref> for some examples.</p>
</div2>


Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: Tex Texin [mailto:tex@xencraft.com]
> Sent: Monday, May 10, 2004 4:44 PM
> To: Addison Phillips [wM]; Web Services
> Subject: sec 4.11
> attached
> --
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
> XenCraft		            http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
Received on Monday, 10 May 2004 21:36:01 UTC
