
Re: sec 4.11

From: A. Vine <andrea.vine@Sun.COM>
Date: Tue, 11 May 2004 16:27:40 -0700
To: I18n WSTF <public-i18n-ws@w3.org>
Message-id: <40A1616C.5030304@sun.com>

editorial - inline

Addison Phillips [wM] wrote:

> Hi Tex,
> 
> I reworked your material a little. Are you okay with:
> 
> ------
> <div2><head>Ordering, Grouping, and Collation</head>
> 
> <p>The ordering or collation of textual data items is a general concern for
> internationalized software. The problem is exacerbated when the data can be
> multilingual in nature. 

The prior sentence is tough to grok for ESL folks. How about -
"It is even more difficult when the data is multilingual."

> For Web services, in scenarios where the ordering of
> textual data is critical to its correct utilization, it can be difficult to
> identify the appropriate collation rules to use with sufficient precision to
> insure those rules are either followed by any services that operate on the
> data or that appropriate action is taken to compensate for any services that
> do not use the desired collation rules. (For example, by re-sorting the data
> downstream).

Er, maybe -
"For Web services, in scenarios where the ordering of textual data is critical
to its correct utilization, it can be difficult to identify the collation rules
to use with enough precision to ensure either a) that those rules are followed
by any services that operate on the data, or b) that appropriate action is taken
to compensate for any services that do not use the desired collation rules (for
example, by re-sorting the data downstream)."

But if anyone can think of a way to further break up that very looonnnnnng 
sentence, so much the better.

> </p>
> 
> <p>A brief list of these collation issues is described here. An important
> reference is the Unicode Collation Algorithm (UCA), described by: <bibref
> ref="UTR10"/>. Although the UCA is a mature standard, it should be noted
> that there is wide variance in the implementation of collation algorithms;
> that few of these implementations are based on UCA; and that there is little
> or no general agreement on identifiers for collation preferences.</p>

Are the semi-colons syntactically necessary, or should they be commas?

> 
> <p>Collation rules cannot be inferred solely from a language identifier or a
> locale, as the identifiers do not indicate which sort ordering should be
> used within a locale. A language identifier may be suggestive as to whether
> a requester expects a particular sort ordering (as with a Traditional or a
> Modern ordering in Spanish, for example) but it may not be definitive.</p>
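
A quick illustration of why the language tag alone is not enough. This is only
a sketch in Python, and the helper names are mine, not anything from the draft.
It assumes lowercase, unaccented words, and it models Traditional Spanish
ordering only to the extent of treating "ch" and "ll" as single letters that
sort after "c" and "l". A real implementation would use proper collation
tailoring. Same language, two different results:

```python
def modern_key(word):
    # Modern ordering: "ch" and "ll" are ordinary two-letter sequences.
    return word

def traditional_key(word):
    # Traditional ordering (simplified assumption): "ch" sorts after every
    # other "c..." and "ll" after every other "l...". The "~" stand-in sorts
    # after all lowercase ASCII letters, which is enough for this sketch.
    return word.replace("ch", "c~").replace("ll", "l~")

words = ["cuna", "chico", "luz", "llama", "cosa"]

modern = sorted(words, key=modern_key)
traditional = sorted(words, key=traditional_key)

print(modern)       # ['chico', 'cosa', 'cuna', 'llama', 'luz']
print(traditional)  # ['cosa', 'cuna', 'chico', 'luz', 'llama']
```
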
> 
> <p>Some examples of sort orderings include: telephone, dictionary, phonetic,
> binary, stroke-radical or radical-stroke. In the latter two cases, the
> reference (source standard) for stroke count may also need to be cited.
> </p>

Remove the parens around "source standard" and use the word "or" instead.

> 
> <p>Different components or subsystems which are used by a software process
> may employ different sort orderings. For example, a User Agent may provide a
> drop-down list which sorts the elements of the list at run-time differently
> from the other components of the agent.

As long as you're giving an example, follow it through -
"For example, a User Agent may provide a drop-down list which sorts the elements
in telephone book order at run-time while the data was retrieved from a
database in dictionary order."

or something along those lines.  Not sure what to retain from the following 
sentence.

> Information retrieved from a
> database may be ordered by an index which has no correlation with the
> requester's requirements. When different components or subsystems of a Web
> Service use different collation rules, then errors can occur. They are not

"When different components or subsystems of a Web service use different 
collation rules, errors can occur."


> always hard errors (i.e. those that generate faults) but the resulting data,
> operations, or events, may be incorrect or be perceived to be incorrect by a
> human observer.</p>
> 
> <p>In the case of services that might use a binary collation (ordering by
> the code points of text data) there can be differences in ordering
> introduced by different components using UTF-8 vs. UTF-16 internally.
> </p>
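
It might help readers to see a concrete case here. A minimal sketch in Python
(the function names are just for illustration): a binary sort on UTF-8 bytes
follows code point order, while a binary sort on UTF-16 code units puts
supplementary characters, which are encoded as surrogates in the range
D800-DFFF, ahead of BMP characters in the range U+E000-U+FFFF.

```python
bmp_char = "\uff5e"        # U+FF5E, a BMP character above the surrogate range
supp_char = "\U00010000"   # U+10000, the first supplementary character

def utf8_key(text):
    # Binary order of the UTF-8 bytes; this matches code point order.
    return text.encode("utf-8")

def utf16_key(text):
    # Binary order of the UTF-16 code units (big-endian bytes compare the
    # same way as the 16-bit units themselves).
    return text.encode("utf-16-be")

utf8_order = sorted([supp_char, bmp_char], key=utf8_key)
utf16_order = sorted([supp_char, bmp_char], key=utf16_key)

print(utf8_order == [bmp_char, supp_char])    # True: U+FF5E sorts first
print(utf16_order == [supp_char, bmp_char])   # True: U+10000 sorts first
```
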
> 
> <p>Knowing the language of the requester does not prescribe how sensitive
> the collation should be. Should text elements that are different by case or
> accent be treated as distinct? Should certain characters be ignored? For
> example, hyphens are often ignored so that "e-mail" and "email" sort
> together.
> </p>
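
To make the sensitivity point concrete, here is a rough Python sketch. The
helper `insensitive_key` is hypothetical, and stripping hyphens, accents and
case this way is a simplification of what a real collation library does with
strength levels and ignorable characters. It only shows that the same list
sorts differently depending on what is treated as distinct.

```python
import unicodedata

def insensitive_key(text):
    # Ignore hyphens entirely.
    text = text.replace("-", "")
    # Decompose, then drop combining marks so accented letters match base letters.
    decomposed = unicodedata.normalize("NFD", text)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Ignore case.
    return unaccented.casefold()

words = ["email", "eagle", "e-mail", "Email", "r\u00e9sum\u00e9", "resume"]

strict = sorted(words)                       # plain code point order
loose = sorted(words, key=insensitive_key)   # hyphen, case, accent ignored

print(strict)  # "eagle" lands between "e-mail" and "email"
print(loose)   # "email", "e-mail" and "Email" sort together
```
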
> 
> <p>Where case is considered distinct, it may be important to describe
> whether all lowercase characters precede all uppercase characters, vice
> versa, or whether they should be intermixed.
> </p>
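
The three case orderings could be shown with something like the following
Python sketch. It assumes ASCII-only words, and the key functions are my own
shorthand for the three behaviours, not an implementation of any standard.

```python
words = ["banana", "Apple", "apple", "Banana"]

# All uppercase before all lowercase: plain code point order for ASCII.
upper_first = sorted(words)

# All lowercase before all uppercase: swap case before comparing.
lower_first = sorted(words, key=str.swapcase)

# Intermixed: compare ignoring case first, then break ties lowercase first.
intermixed = sorted(words, key=lambda w: (w.casefold(), w.swapcase()))

print(upper_first)  # ['Apple', 'Banana', 'apple', 'banana']
print(lower_first)  # ['apple', 'banana', 'Apple', 'Banana']
print(intermixed)   # ['apple', 'Apple', 'banana', 'Banana']
```
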
> 
> <p>Often the performance of an application is impacted by collation. For
> example, if a service returns results in an unknown ordering, the requester
> may have to sort the results using its local collation rules. This can
> consume resources and delay the further use of the results until the entire
> set can be collated.

" ... and delay further use ... " (removed 'the')

> Alternatively, if results are returned in the order
> needed by the requester, then the requester can begin to process the first
> records returned without waiting for the remaining records to arrive.
> </p>
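
If an illustration is wanted for the performance point, a small Python sketch
along these lines might do. The function `first_record` and the sample data are
hypothetical. It counts how many records the requester must read before it can
act on the first one in collation order.

```python
def first_record(stream, presorted):
    """Return (first record in order, number of records read to find it)."""
    if presorted:
        # Results already arrive in the needed order: use the first one at once.
        return next(stream), 1
    # Unknown ordering: the whole set must be received and sorted locally.
    buffered = list(stream)
    buffered.sort()
    return buffered[0], len(buffered)

unordered = ["delta", "alpha", "charlie", "bravo"]
ordered = ["alpha", "bravo", "charlie", "delta"]

slow = first_record(iter(unordered), presorted=False)
fast = first_record(iter(ordered), presorted=True)

print(slow)  # ('alpha', 4)
print(fast)  # ('alpha', 1)
```
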
> 
> <p>Of course, collation can be performed at different stages of data
> processing and timing can be an important consideration. Database indexes
> are updated as the data is added to the database, not at the time a request
> arrives. Requests that can use the preordained collation of the index have a
> significant performance advantage over requests that either cannot use
> indexes or must re-sort the results.
> </p>
> 
> <p>See <xspecref href="#S-009">I-009</xspecref> and <xspecref
> href="#I-013">I-013</xspecref> for some examples.</p>
> </div2>
> 
> ------
> 
> Addison P. Phillips
Andrea Vine

> Director, Globalization Architecture
Peon, Internationalization Architect

> webMethods | Delivering Global Business Visibility
Sun | The Network is the Computer

> http://www.webMethods.com
http://www.sun.com

> Chair, W3C Internationalization (I18N) Working Group
Captain, Quarters team

> Chair, W3C-I18N-WG, Web Services Task Force
Member, "

> http://www.w3.org/International
http://example.org

> Internationalization is an architecture.
> It is not a feature.
--
I have always wished that my computer would be as easy to use as my telephone. 
My wish has come true. I no longer know how to use my telephone.
-Bjarne Stroustrup, designer of C++ programming language (1950- )
Received on Tuesday, 11 May 2004 19:14:15 GMT
