
Your comments on Character Model Fundamentals [LC003, LC004, LC005, LC006]

From: Richard Ishida <ishida@w3.org>
Date: Wed, 6 Oct 2004 11:38:30 +0100
To: <markus.scherer@jtcsv.com>
Cc: <www-i18n-comments@w3.org>
Message-Id: <20041006103829.D6B994EFA5@homer.w3.org>

Dear Markus,

Many thanks for your comments on the 3rd Last Call version of the Character Model for the World Wide Web 1.0: Fundamentals [1].  We appreciate the interest you have taken in this specification.

You can see the comments you submitted, grouped together, at http://www.w3.org/International/Group/2004/charmod1-lc/SortByOriginator.html#LC003
(You can jump to a specific comment in the table by adding its ID to the end of the URI.)

The following comments were noted and deferred. If you wish to say that you are satisfied or raise an issue, please reply to us within the next two weeks at mailto:www-i18n-comments@w3.org and copy w3c-i18n-ig@w3.org.
        LC003, LC004, LC005

PLEASE REVIEW the decision for the following additional comment and reply to us within the next two weeks at mailto:www-i18n-comments@w3.org (copying w3c-i18n-ig@w3.org) to say whether you are satisfied with the decision taken.

Information relating to these comments is included below.

These comments relate to the editor's version at http://www.w3.org/International/Group/charmod-edit/charmod1.html

Best regards,
Richard Ishida, for the I18N WG


Decision: Rejected. The 'character string' provides a good balance between user requirements (ideally, counting in terms of grapheme clusters) and implementation requirements (counting in terms of code units). It also takes into account that specifications (in particular those related to XML) are written in terms of characters, not code units.
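
To illustrate the difference between the counting units mentioned above, here is a minimal Python sketch. The helper name count_units is hypothetical and not taken from the Character Model. It counts characters (code points) and code units only; counting grapheme clusters needs Unicode segmentation rules that are not in the Python standard library, so that is left out here.

```python
def count_units(text):
    """Return the length of text in three different units.

    Hypothetical helper for illustration only.
    """
    return {
        # Python str is a sequence of code points ("characters").
        "characters": len(text),
        # UTF-16 code units are 2 bytes each.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 code units are 1 byte each.
        "utf8_code_units": len(text.encode("utf-8")),
    }


# "e" followed by U+0301 COMBINING ACUTE ACCENT: a user sees one
# accented letter, but the string holds two characters.
combining = count_units("e\u0301")

# U+1F600 lies outside the Basic Multilingual Plane: one character,
# but a surrogate pair (two code units) in UTF-16.
supplementary = count_units("\U0001F600")
```

For the first string the character count and the UTF-16 code unit count agree while the user-perceived count does not; for the second, the character count and the UTF-16 code unit count differ.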

We would like to point out that we have carefully listed the alternatives and the reasons for when to use them in C052 and C071,..., so that readers of the Character Model (writers of specifications) should be able to make the best decision on their own.

Although we understand performance concerns about calculating string length, we haven't heard any complaints about this, e.g. from implementers of XSLT. Also, in cases where it really does become a bottleneck, e.g. finding a certain character position in an extremely long string encoded in UTF-16 (or, for that matter, in UTF-8), there are optimization techniques (e.g. building an index of every 1000th character position for a 1M-long string, to speed up subsequent indexing operations).
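
The indexing technique mentioned above could look roughly like the following Python sketch. It is only one possible reading of the idea: the names to_utf16_units, build_index, unit_offset and STRIDE are hypothetical, the string is modelled as a list of UTF-16 code units, and the input is assumed to be well-formed UTF-16.

```python
STRIDE = 1000  # assumed spacing between indexed characters


def to_utf16_units(text):
    """Encode text as a list of UTF-16 code units (integers)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def _step(units, i):
    """Return the offset of the character following the one at offset i."""
    # A high surrogate (D800..DBFF) starts a two-unit character.
    if 0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units):
        return i + 2
    return i + 1


def build_index(units, stride=STRIDE):
    """Record the code-unit offset of characters 0, stride, 2*stride, ..."""
    index = []
    char_pos = 0
    i = 0
    while i < len(units):
        if char_pos % stride == 0:
            index.append(i)
        i = _step(units, i)
        char_pos += 1
    return index


def unit_offset(units, index, n, stride=STRIDE):
    """Return the code-unit offset of the n-th character (0-based)."""
    if n < 0:
        raise IndexError("character position out of range")
    # Jump to the nearest indexed character at or before n ...
    i = index[n // stride]
    # ... then scan forward at most stride - 1 characters.
    for _ in range(n % stride):
        if i >= len(units):
            raise IndexError("character position out of range")
        i = _step(units, i)
    if i >= len(units):
        raise IndexError("character position out of range")
    return i


# Usage: 3000 characters occupying 4000 UTF-16 code units.
units = to_utf16_units("a\U0001F600b" * 1000)
index = build_index(units)
offset = unit_offset(units, index, 2999)
```

Building the index costs one pass over the string; after that, each lookup scans fewer than STRIDE characters instead of starting from the beginning. A smaller stride trades memory for shorter scans.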

Also, strings in general are not as easy to use as they may seem. For some interesting background, please see http://www.joelonsoftware.com/articles/fog0000000319.html.

[1] The version of CharMod you commented on: 
[2] Latest editor's version (still being edited): 
[3] Last Call comments table, sorted by ID: 
Received on Wednesday, 6 October 2004 10:38:31 UTC
