Internationalization Comments on CC/PP

Dear CC/PP Working Group,

Below please find some Last Call comments regarding internationalization
on http://www.w3.org/TR/2001/WD-CCPP-struct-vocab-20010315/.

Please note that these comments have not been approved yet by the
Internationalization Working Group. We have a teleconference
tomorrow where we will check whether there are any other comments.

For the Internationalization WG/IG: Please note that the www-mobile
mailing list is public.


- Please use the xml:lang attribute wherever appropriate:
   - in particular for the schemas in App. B.3 and App. C, which
     have full English text in element content
   - also for all the examples that have English comments
   - possibly also for the other examples, where English words are
     used as simple tokens. This is important for accessibility.


- Appendix C, 'charWidth': "For proportional font displays, this is
   the width of the display in ems (where an em is the typographical
   unit that is the width of an em-dash/letter M).":
   This creates some problems, because an em in a non-proportional
   font is much wider than one in a proportional font. As an example,
   CSS goes as far as simply defining that an em be equal to the
   font size (i.e. assuming a square letter M).
   (see http://www.w3.org/TR/REC-CSS2/syndata.html#length-units)
   The difference is most clear in the half-/full-width display
   convention used widely in East Asia: Ideographs and a few other
   characters are given a full/double display cell width, whereas
   the typical ASCII set is given a half/single display cell width.
   It is unclear whether these cases should be considered as
   proportional or as non-proportional, and what the actual counting
   unit should be.

   Proposals:
   - To change the definition for proportional fonts to something that
     is closer to the equivalent in non-proportional fonts (e.g. 'en',...)
   - To clearly decide the issue for the typical East Asian half/full-width
     case, and add warnings.
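
   To make the half-/full-width counting issue concrete, here is a
   minimal Python sketch. It is only an illustration: the helper name
   display_cells is hypothetical, and the rule "wide and fullwidth
   characters take two cells, everything else one" is an assumption
   modelled on common East Asian terminal conventions, not something
   the CC/PP draft defines.

```python
import unicodedata


def display_cells(text: str) -> int:
    """Count display cells under an assumed half-/full-width convention.

    Assumption: characters whose East Asian Width property is W (wide)
    or F (fullwidth) occupy two cells; all other characters occupy one.
    Combining marks are assumed to occupy no cell of their own.
    """
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # rendered on top of the preceding base character
        cells += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cells


print(display_cells("CC/PP"))   # 5 characters, 5 cells
print(display_cells("日本語"))   # 3 characters, 6 cells
```

   With such a convention, a display that is 40 cells wide holds 40
   ASCII characters but only 20 ideographs, so a single 'charWidth'
   number is ambiguous unless the counting unit is stated.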


- 4.1.1.1 URI values: RDF contains the following:
   (http://www.w3.org/TR/REC-rdf-syntax/#grammar)

   Note: Although non-ASCII characters in URIs are not allowed by [URI],
   [XML] specifies a convention to avoid unnecessary incompatibilities in
   extended URI syntax. Implementors of RDF are encouraged to avoid further
   incompatibility and use the XML convention for system identifiers. Namely,
   that a non-ASCII character in a URI be represented in UTF-8 as one or more
   bytes, and then these bytes be escaped with the URI escaping mechanism
   (i.e., by converting each byte to %HH, where HH is the hexadecimal notation
    of the byte value).

   CC/PP should add a similar note, updated to be in line with the wording
   in XML 1.0 Second Edition (including errata), XLink, XPointer, and
   XML Schema.
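
   The convention quoted above (encode as UTF-8, then escape each byte
   as %HH) can be sketched in Python as follows. The function name
   escape_uri and the choice of which ASCII characters to leave
   untouched are assumptions made for this illustration; they are not
   taken from the RDF or XML specifications.

```python
from urllib.parse import quote

# ASCII characters that already have a role in URI syntax (plus "%", so
# that existing %HH escapes are not escaped a second time). This list is
# an assumption for the sketch.
URI_ASCII_KEEP = ":/?#[]@!$&'()*+,;=%~"


def escape_uri(uri: str) -> str:
    """Escape non-ASCII characters as UTF-8 bytes written as %HH."""
    return quote(uri, safe=URI_ASCII_KEEP, encoding="utf-8")


print(escape_uri("http://www.w3.org/People/Dürst"))
# http://www.w3.org/People/D%C3%BCrst
```

   The letter "ü" is two bytes in UTF-8 (C3 BC), hence the two escapes
   in the result.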


- 4.1.2.1 Set of values:

   For the Accept-Language header in HTTP, q-values are needed in
   theory. In actual practice, the order of the values is significant.
   A set of values cannot model this behaviour.
   Is there another CC/PP construct that can model it?
   If not, one should be introduced.
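
   As an illustration of what is lost, the following Python sketch
   parses an Accept-Language value into a preference-ordered list.
   The function name parse_accept_language and the tie-breaking rule
   (equal q-values keep their order of appearance) are assumptions for
   this example; the tie-break reflects the "order is significant"
   practice mentioned above rather than a rule from the HTTP
   specification.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language-range, q) pairs, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a missing q parameter means q=1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        entries.append((tag, q, position))
    # Sort by descending q; equal q-values keep their original order.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, q) for tag, q, _ in entries]


ranked = parse_accept_language("ja, en;q=0.8, de;q=0.8, fr;q=0.5")
print(ranked)
# [('ja', 1.0), ('en', 0.8), ('de', 0.8), ('fr', 0.5)]

as_set = {tag for tag, _ in ranked}  # same members, preference order gone
```

   The ranked list carries the preference; the plain set built from it
   contains the same four languages but no longer says which one the
   user wants first.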


- App. F.1: Interaction with HTTP has to discuss how HTTP Accept headers
   and CC/PP interact. This is relevant for internationalization because
   the Accept-Language and Accept-Charset headers are already used in
   practice. Introducing another way to express and send such information
   will lead to various problems:
   - This information is important for internationalization, but too
     often neglected. The fewer ways there are to express it, the easier
     it is to convince people to use it (and to show them how).
   - If information is present twice, but the two instances are different,
     which one is correct? Which one should have priority? Or should
     the request be rejected?
   - If there are two ways to express the same thing, but there are
     slight differences in the expressibility, automatic conversion
     may not be possible. Utmost care is necessary to avoid differences.

   Parts of the considerations in this item should also be incorporated
   into Appendix D.


- App. C, 'charset': The definition says: "For a text display device,
   a character set that can be rendered.": This is highly inappropriate.
   First, character encoding is highly relevant for any kind of processing,
   not only for display. Second, the term 'character set' is misleading.
   Using 'character encoding' will lead to less confusion.
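
   A short Python sketch of why the distinction matters: the same
   characters become different byte sequences under different
   character encodings, so a receiver has to know the encoding, not
   just a repertoire of characters. The sample string and the three
   encodings are chosen only for illustration.

```python
text = "Dürst"

# One string, three character encodings, three different byte sequences.
encoded = {
    name: text.encode(name)
    for name in ("iso-8859-1", "utf-8", "utf-16-be")
}

for name, data in encoded.items():
    print(f"{name:11} {len(data):2} bytes  {data.hex(' ')}")
# iso-8859-1   5 bytes  44 fc 72 73 74
# utf-8        6 bytes  44 c3 bc 72 73 74
# utf-16-be   10 bytes  00 44 00 fc 00 72 00 73 00 74
```

   All three byte sequences represent the same five characters, which
   is why 'character encoding' describes the property more precisely
   than 'character set'.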


Regards,    Martin.

#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst

Received on Monday, 9 April 2001 08:41:06 UTC