- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 09 Apr 2001 17:33:42 +0900
- To: www-mobile@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear CC/PP Working Group,

Below please find some Last Call comments regarding internationalization on http://www.w3.org/TR/2001/WD-CCPP-struct-vocab-20010315/. Please note that these comments have not been approved yet by the Internationalization Working Group. We have a teleconference tomorrow where we will check whether there are any other comments.

For the Internationalization WG/IG: Please note that the www-mobile mailing list is public.

- Please use the xml:lang attribute wherever appropriate:
  - in particular for the schemas in App. B.3 and App. C, which have full English text in element content
  - Also for all the examples that have English comments
  - Possibly also for the other examples, where English words are used as simple tokens.
  This is important for accessibility.

- Appendix C, 'charWidth': "For proportional font displays, this is the width of the display in ems (where an em is the typographical unit that is the width of an em-dash/letter M).":
  This creates some problems, because an em in a non-proportional font is much wider than one in a proportional font. As an example, CSS goes as far as simply defining that an em be equal to the font size (i.e. assuming a square letter M). (see http://www.w3.org/TR/REC-CSS2/syndata.html#length-units)
  The difference is most clear in the half-/full-width display convention used widely in East Asia: Ideographs and a few other characters are given a full/double display cell width, whereas the typical ASCII set is given a half/single display cell width. It is unclear whether these cases should be considered as proportional or as non-proportional, and what the actual counting unit should be.
  Proposals:
  - To change the definition for proportional fonts to something that is closer to the equivalent in non-proportional fonts (e.g. 'en',...)
  - To clearly decide the issue for the typical East Asian half/full-width case, and add warnings.
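
To illustrate the half-/full-width counting problem raised in the 'charWidth' comment above, here is a minimal Python sketch. It assumes that characters whose Unicode East Asian Width property is W (wide) or F (fullwidth) occupy two display cells and that everything else occupies one. The function name display_cells and the 2/1 mapping are illustrative assumptions, not anything defined by CC/PP, and combining characters are not handled.

```python
import unicodedata


def display_cells(text: str) -> int:
    """Count display cells under an assumed half-/full-width convention.

    Assumption: East Asian Width 'W' (wide) and 'F' (fullwidth) characters
    take two cells; every other character takes one cell.
    """
    cells = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells += 2  # full/double-width cell, e.g. ideographs
        else:
            cells += 1  # half/single-width cell, e.g. ASCII
    return cells
```

Under these assumptions, display_cells("CC/PP") gives 5 and display_cells("日本語") gives 6, so a display that reports its width as a plain character count would describe very different capacities depending on the script.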
- 4.1.1.1 URI values: RDF contains the following: (http://www.w3.org/TR/REC-rdf-syntax/#grammar)

    Note: Although non-ASCII characters in URIs are not allowed by [URI], [XML] specifies a convention to avoid unnecessary incompatibilities in extended URI syntax. Implementors of RDF are encouraged to avoid further incompatibility and use the XML convention for system identifiers. Namely, that a non-ASCII character in a URI be represented in UTF-8 as one or more bytes, and then these bytes be escaped with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).

  CC/PP should add a similar note, updated to be in line with the wording in XML 1.0 2e (including errata), XLink, XPointer, and XML Schema.

- 4.1.2.1 Set of values: For the Accept-Language header in HTTP, in theory, q-values are needed. In actual practice, the order is significant. A set of values cannot model this behaviour, because a set carries neither order nor weights. Is there another CC/PP construct that can model it? If not, one should be introduced.

- App. F.1: Interaction with HTTP has to discuss how HTTP Accept headers and CC/PP interact. This is relevant for internationalization because the Accept-Language and Accept-Charset headers are already used in practice. Introducing another way to express and send such information will lead to various problems:
  - This information is important for internationalization, but too often neglected. The fewer ways there are to express it, the easier it is to convince people to use it, and to explain how.
  - If information is present twice, but the two instances are different, which one is correct? Which one should have priority? Or should the request be rejected?
  - If there are two ways to express the same thing, but there are slight differences in expressibility, automatic conversion may not be possible. Utmost care is necessary to avoid differences.
  Parts of the considerations in this item should also be incorporated into Appendix D.
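
The escaping convention quoted from RDF in the 4.1.1.1 comment above (encode each non-ASCII character as UTF-8, then write each byte as %HH) can be sketched in Python as follows. The function name escape_non_ascii is hypothetical, and the sketch deliberately leaves ASCII characters untouched, which is an assumption about scope: it does not escape reserved or unsafe ASCII characters.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape only the non-ASCII characters of a URI-like string.

    Each non-ASCII character is encoded as UTF-8 and every resulting
    byte is written as %HH (uppercase hexadecimal). ASCII characters
    are copied through unchanged (an assumption of this sketch).
    """
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

For example, escape_non_ascii("http://www.w3.org/People/Dürst") yields "http://www.w3.org/People/D%C3%BCrst", since "ü" (U+00FC) is the two bytes C3 BC in UTF-8.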
- App. C, 'charset': The definition says: "For a text display device, a character set that can be rendered.":
  This is highly inappropriate. First, character encoding is highly relevant for any kind of processing, not only for display. Second, the term 'character set' is misleading. Using 'character encoding' will lead to less confusion.

Regards,   Martin.

#-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst
Received on Monday, 9 April 2001 08:41:06 UTC