Re: [W3C] Best practices / charsets

Thanks for the input, Eduardo.

See below for comments.

Eduardo wrote:
> To the W3C Mobile Web BPWG.
> These are answers to the points raised in messages about 
> charsets (LC-2023)
> I. UTF-8, etc.
> Question:
> What's in scope is to speak about character encoding 
> mappings that might be carried out that would do harm, 
> probably, though then again, does this imply that we 
> should have a clause that says "don't map character 
> encodings to something the device does not support". 
> Seems exceptionally banal and how many other statements 
> of that form would we need to include for the sake of 
> consistency?
> Answer:
> It is important to consider both the flow of information
> from server to client and vice-versa (e.g. forms).
> That explains why I propose to restrict safe transcodings
> to UTF-* and UCS-4 (in this sense, LC-2023 was unclear, it
> only talked about UTF-8). 

I think your analysis is pretty good.
I don't fully agree with the conclusion though.

We resolved to add "character encoding" to the list of examples of 
alterations a Content Transformation proxy should apply with much care, 
in the section "It should only alter the format, layout, dimensions 
etc. to match the specific capabilities of the user agent".

This addresses the case when a page is served using an encoding that is 
compatible with the user agent. The Content Transformation proxy should 
not switch the encoding to something else in that case, especially 
since, as you point out, it is unlikely that the mapping works for all 
characters. That's what the guideline says.
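To illustrate the point about mappings not working for all characters, here is a quick Python sketch (the sample text and target charset are mine, chosen for illustration only):

```python
# Sketch: why re-mapping a page's encoding can do harm. UTF-8 covers
# all of Unicode, but a legacy target charset may lack mappings for
# some of the page's characters.
text = "Prix : 42 €"                         # contains the euro sign

utf8_bytes = text.encode("utf-8")            # always representable
assert utf8_bytes.decode("utf-8") == text    # and round-trips losslessly

try:
    text.encode("iso-8859-1")                # Latin-1 has no euro sign
except UnicodeEncodeError as exc:
    print("lossy mapping:", exc.reason)      # e.g. "ordinal not in range(256)"
```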

When a page is served with an encoding that is not compatible with the 
user agent, the end user simply cannot display the page. Here a Content 
Transformation proxy may serve as an enabler. That the mapping may not 
work in all cases and/or in both directions is not a real problem IMO. 
The Content Transformation proxy may add a warning at the top of the 
page along the lines of "Some characters can't be displayed on your 
phone" or "The form below is unlikely to work". It would not work 
reliably. But working a bit may be considered better than not working at 
all. In any case, this is out of scope of these guidelines: we are not 
trying to define the nature of the restructuring operations that may 
occur, but rather to define a few mechanisms by which content providers, 
content transformation proxies and end users may communicate with each 
other.

See the message where this was discussed.
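The "enabler" behaviour I describe above could look something like the following sketch (the warning text and helper are hypothetical, not anything the guidelines define):

```python
# Sketch: when the device cannot handle the page's charset, transcode
# with replacement characters and prepend a warning, along the lines
# suggested above. WARNING_HTML and the function are illustrative.
WARNING_HTML = "<p>Some characters can't be displayed on your phone.</p>"

def transcode_with_warning(html: str, target: str) -> bytes:
    body = html.encode(target, errors="replace")  # unmappable chars become '?'
    lossy = body.decode(target) != html           # did we actually lose data?
    prefix = WARNING_HTML.encode(target) if lossy else b""
    return prefix + body

out = transcode_with_warning("Solde : 10 €", "iso-8859-1")
```

It would not work reliably, as I say, but it degrades visibly rather than silently.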


> II. Client capabilities.
> Question:
> Between the CT-proxy and the end-user, the text we 
> already have summarizes the point "make 
> sure that if you transform, you create content that 
> can be rendered by the end-user's device".
> Answer:
> This only considers Web browsing as a unidirectional
> flow of information -- but there are forms. Hence, one
> must consider both transcoding directions, and this is
> no longer an issue of presentation -- it is an issue
> of input to back-end systems (e.g. databases). This is
> all the more difficult in that there are few mechanisms
> for servers to advertise what charsets they really are
> able to handle, and those (such as "accept-charset" in
> forms) and their interactions with other parts of the
> transcoding process are not elaborated upon in the 
> guidelines.

Again, I agree that it is, in short, a mess.
I don't think it really becomes an issue of input to back-end systems 
though: servers must not rely on the data they receive. In my view, it's 
simply an issue of working/not-working.
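The working/not-working nature of the form round-trip can be sketched as follows (the charsets and helper are my own illustrative assumptions):

```python
# Sketch: the proxy re-encodes the page for the device, but the origin
# server still expects form input in the page's original charset, so
# the proxy must convert submissions back. One direction may work while
# the other fails.
def submission_to_server(device_bytes: bytes,
                         device_charset: str,
                         server_charset: str) -> bytes:
    # Decode what the device sent, then re-encode for the server.
    return device_bytes.decode(device_charset).encode(server_charset)

# Latin-1 handset, UTF-8 server: every Latin-1 character maps, so it works.
wire = "café".encode("iso-8859-1")
ok = submission_to_server(wire, "iso-8859-1", "utf-8")

# UTF-8 handset, Latin-1 server: the euro sign has no mapping back.
try:
    submission_to_server("10 €".encode("utf-8"), "utf-8", "iso-8859-1")
except UnicodeEncodeError:
    print("cannot map the submission back to the server's charset")
```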

> This point is actually acknowledged later as follows:
> This would also address the "bijective" part of the 
> comment: do not transform the content encoding of a 
> page that contains a form, unless you can ensure that 
> you can convert the user's input back to the content 
> encoding expected by the server.

I wrote that but I kind of disagree with myself here ;)
For the same reason as above, I don't think we should be that restrictive.


Received on Wednesday, 29 October 2008 10:40:43 UTC