Re: LC-2023: transformation across character sets

I think this topic is a good example of something we have tripped across 
quite often, which is that we are not trying to specify an ideal content 
transformation proxy; we're trying to impose restrictions on what such a 
proxy should or should not (must and must not) do.

So, obviously, to my mind, a proxy that doesn't map character encodings 
when necessary in both the request and the response will be a useless 
proxy for some set of devices and servers. But that is not in scope for 
us to comment on.

What's in scope is to speak about character encoding mappings that might 
be carried out and that would do harm. Then again, does this imply that 
we should have a clause that says "don't map character encodings to 
something the device does not support"? That seems exceptionally banal, 
and how many other statements of that form would we need to include for 
the sake of consistency?

In LC-2023 [1] the specific proposal is:

[1] 
http://www.w3.org/2006/02/lc-comments-tracker/37584/WD-ct-guidelines-20080801/2023?cid=2023

A simple way that could go some way towards alleviating this risk
would be to forbid any transformation if the server announces (either
via the HTTP field Content-type: charset=..., the XML declaration, or
a meta-tag) an encoding different from ASCII or perhaps UTF-8.

I don't think I understand the reasoning behind this, though the idea of 
fiddling with Shift-JIS would fill me with dread, and I guess that there 
is a perception that most mobile devices, even today, have better 
support for ISO 8859 than they do for UTF-8. I'm not sure what truth 
there is to that perception, or even what "better" might mean.
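
For concreteness, here is a rough Python sketch of the rule as I read it 
from the text quoted above. The function names, the set of accepted 
charset labels and the 1024-byte sniffing window are all my own 
assumptions for illustration; none of them come from LC-2023 or from the 
guidelines, and a real proxy would need a proper parser rather than 
regular expressions.

```python
import re
from email.message import Message

# Assumption: labels treated as "ASCII or perhaps UTF-8" for this sketch.
ALLOWED = {"us-ascii", "ascii", "utf-8"}

# Simplified patterns for illustration only, not a conforming parser.
XML_DECL = re.compile(
    rb"""^\s*<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']"""
)
META = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE
)


def declared_charsets(content_type, body):
    """Collect charsets announced via the HTTP Content-Type header,
    the XML declaration, or a meta tag (all lowercased)."""
    found = []
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None
        if charset:
            found.append(charset)
    head = body[:1024]  # assumption: only sniff the start of the body
    for pattern in (XML_DECL, META):
        match = pattern.search(head)
        if match:
            found.append(match.group(1).decode("ascii").lower())
    return found


def transformation_allowed(content_type, body):
    """Hypothetical check: forbid transformation if any announced
    charset is something other than ASCII or UTF-8."""
    return all(cs in ALLOWED for cs in declared_charsets(content_type, body))
```

Note that under this literal reading a response that announces no 
encoding at all is still open to transformation, which may or may not be 
what the commenter intended.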

To sum up, my feeling is that it would be useful to understand the 
benefits of what is proposed in LC-2023. Absent that understanding, my 
inclination is to "remain silent" on this topic, as we decided before.

Jo

On 10/09/2008 00:23, Tom Hume wrote:
> 
> LC-2023 discusses problems around transformation of content across 
> character sets: mappings may not exist, documents may contain multiple 
> mappings, external entities may use alternative mappings and some 
> proprietary ones are well established (e.g. pictograms on I-Mode).
> 
> There have been posts on this list referring to character set 
> translation being a potentially useful service which a proxy might 
> provide, from mobile device towards origin server [1]. I've not seen any 
> discussion on character set translation carried out in the other direction.
> 
> I'm wondering how likely it is that a device will request content from a 
> server via a transforming proxy which the device is capable of parsing 
> itself, and the proxy transform this content in such a way that it is 
> not suitable for display on the device? This sounds like a Bad Thing :(
> 
> [1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Mar/0021.html
> -- 
> Future Platforms Ltd
> e: Tom.Hume@futureplatforms.com
> t: +44 (0) 1273 819038
> m: +44 (0) 7971 781422
> company: www.futureplatforms.com
> personal: tomhume.org

Received on Wednesday, 10 September 2008 10:03:38 UTC