- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 09 Apr 2001 17:33:42 +0900
- To: www-mobile@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear CC/PP Working Group,
Below please find some Last Call comments regarding internationalization
on http://www.w3.org/TR/2001/WD-CCPP-struct-vocab-20010315/.
Please note that these comments have not been approved yet by the
Internationalization Working Group. We have a teleconference
tomorrow where we will check whether there are any other comments.
For the Internationalization WG/IG: Please note that the www-mobile
mailing list is public.
- Please use the xml:lang attribute wherever appropriate:
- in particular for the schemas in App. B.3 and App. C, which
have full English text in element content
- Also for all the examples that have English comments
- Possibly also for the other examples, where English words are
used as simple tokens. This is important for accessibility.
- Appendix C, 'charWidth': "For proportional font displays, this is
the width of the display in ems (where an em is the typographical
unit that is the width of an em-dash/letter M).":
This creates some problems, because an em in a non-proportional
font is much wider than one in a proportional font. As an example,
CSS goes as far as simply defining that an em be equal to the
font size (i.e. assuming a square letter M).
(see http://www.w3.org/TR/REC-CSS2/syndata.html#length-units)
The difference is most clear in the half-/full-width display
convention used widely in East Asia: Ideographs and a few other
characters are given a full/double display cell width, whereas
the typical ASCII set is given a half/single display cell width.
It is unclear whether these cases should be considered as
proportional or as non-proportional, and what the actual counting
unit should be.
Proposals:
- To change the definition for proportional fonts to something that
is closer to the equivalent in non-proportional fonts (e.g. 'en',...)
- To clearly decide the issue for the typical East Asian half/full-width
case, and add warnings.
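To illustrate why the counting unit matters in the half-/full-width case, here is a minimal sketch that counts display cells for a string. It assumes the Unicode East_Asian_Width property (via Python's unicodedata module) is an acceptable stand-in for the display convention described above; that choice, the function name display_cells, and the handling of combining marks are assumptions for illustration, not something the CC/PP draft defines.

```python
import unicodedata


def display_cells(text: str) -> int:
    """Count display cells: Wide/Fullwidth characters take two, others one.

    Assumption: the East_Asian_Width property approximates the
    half-/full-width convention. Ambiguous-width characters are counted
    as a single cell here, which a real terminal may not do.
    """
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the cell of their base character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells += 2
        else:
            cells += 1
    return cells
```

Under these assumptions, a display that is 40 half-width cells wide holds 40 ASCII characters but only 20 ideographs, so a single 'charWidth' number is ambiguous unless the unit is stated.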
- 4.1.1.1 URI values: RDF contains the following:
(http://www.w3.org/TR/REC-rdf-syntax/#grammar)
Note: Although non-ASCII characters in URIs are not allowed by [URI],
[XML] specifies a convention to avoid unnecessary incompatibilities in
extended URI syntax. Implementors of RDF are encouraged to avoid further
incompatibility and use the XML convention for system identifiers. Namely,
that a non-ASCII character in a URI be represented in UTF-8 as one or more
bytes, and then these bytes be escaped with the URI escaping mechanism
(i.e., by converting each byte to %HH, where HH is the hexadecimal notation
of the byte value).
CC/PP should add a similar note, updated to be in line with the wording
in XML 1.0 2e (including errata), XLink, XPointer, and XML Schema.
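The convention quoted above (encode each non-ASCII character as UTF-8, then escape each byte as %HH) can be sketched as follows. This is only an illustration of that one rule; the function name escape_non_ascii is hypothetical, and the sketch deliberately leaves ASCII characters untouched, including any that a full URI processor might also need to escape.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape non-ASCII characters as %HH over their UTF-8 bytes.

    ASCII characters, including existing %HH escapes, are passed
    through unchanged (a simplification for illustration).
    """
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 byte of the character.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)
```

For example, escape_non_ascii("http://www.w3.org/People/Dürst") returns "http://www.w3.org/People/D%C3%BCrst", which is the form used in the signature of this message.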
- 4.1.2.1 Set of values:
For the Accept-Language header in HTTP, in theory, q-values
are needed. In actual practice, the order is significant.
A set of values cannot model this behaviour.
Is there another CC/PP construct that can model it?
If not, it should be introduced.
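As a rough illustration of what is lost, the sketch below parses an Accept-Language value into a preference-ordered list, using q-values where given and header order as the tie-breaker. The function name parse_accept_language, the tie-breaking rule, and the handling of malformed q-values are assumptions for illustration; they are not taken from the CC/PP draft, and the parsing is simplified relative to the HTTP grammar.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-range, q) pairs, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue  # skip empty list elements
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed q-value: keep the default (simplification)
        entries.append((position, tag, q))
    # Higher q first; equal q keeps the order in which items were sent.
    entries.sort(key=lambda e: (-e[2], e[0]))
    return [(tag, q) for _, tag, q in entries]
```

Here parse_accept_language("ja, en;q=0.8, fr;q=0.8") gives [("ja", 1.0), ("en", 0.8), ("fr", 0.8)], whereas the unordered set {"ja", "en", "fr"} carries neither the q-values nor the order.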
- App. F.1: Interaction with HTTP has to discuss how HTTP Accept headers
and CC/PP interact. This is relevant for internationalization because
the Accept-Language and Accept-Charset headers are already used in
practice. Introducing another way to express and send such information
will lead to various problems:
- This information is important for internationalization, but too
often neglected. The fewer ways there are to express it, the easier
it is to convince people that (and how) to use it.
- If information is present twice, but the two instances are different,
which one is correct? Which one should have priority? Or should
the request be rejected?
- If there are two ways to express the same thing, but there are
slight differences in the expressibility, automatic conversion
may not be possible. Utmost care is necessary to avoid differences.
Parts of the considerations in this item should also be incorporated
into Appendix D.
- App. C, 'charset': The definition says: "For a text display device,
a character set that can be rendered.": This is highly inappropriate.
First, character encoding is highly relevant for any kind of processing,
not only for display. Second, the term 'character set' is misleading.
Using 'character encoding' will lead to less confusion.
Regards, Martin.
#-#-# Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org/People/D%C3%BCrst
Received on Monday, 9 April 2001 08:41:06 UTC