Re: LC-2023: transformation across character sets from Tom Hume on 2008-09-21 (public-bpwg-ct@w3.org from September 2008)

From: Tom Hume <Tom.Hume@futureplatforms.com>
Date: Sun, 21 Sep 2008 23:11:03 +0100
To: Jo Rabin <jrabin@mtld.mobi>
Cc: public-bpwg-ct <public-bpwg-ct@w3.org>
Message-Id: <3B30E975-9839-4121-9454-52C457A1921D@futureplatforms.com>

I'm not sure how reasonable it is to forbid transcoding of character  
sets other than UTF-8/ASCII either.
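
To make the proposed rule concrete, here is a minimal Python sketch of the check LC-2023 (quoted below) seems to ask for. It assumes the proxy has the Content-Type header value and the first bytes of the response body to hand. The names SAFE_CHARSETS, declared_charsets and may_transform are made up for illustration, and the regular expressions are only a rough stand-in for real XML and HTML parsing.

```python
import re
from email.message import Message

# Encodings under which the proposal would still permit transformation.
SAFE_CHARSETS = {"us-ascii", "ascii", "utf-8"}

# Rough patterns for the XML declaration and an HTML meta tag (illustrative only).
XML_DECL_RE = re.compile(rb"""^\s*<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")
META_RE = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE)


def declared_charsets(content_type, body):
    """Return every charset the server announces, lower-cased."""
    found = []
    if content_type:
        # Reuse the stdlib MIME header parser to pull out the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_param("charset")
        if charset:
            found.append(str(charset).lower())
    head = body[:1024]  # declarations are expected near the start of the document
    for pattern in (XML_DECL_RE, META_RE):
        match = pattern.search(head)
        if match:
            found.append(match.group(1).decode("ascii").lower())
    return found


def may_transform(content_type, body):
    """True unless some announced charset is neither ASCII nor UTF-8."""
    return all(c in SAFE_CHARSETS for c in declared_charsets(content_type, body))
```

Under this rule a page served as Shift_JIS or ISO-8859-1 would pass through untouched. A response that declares no charset at all is treated as eligible for transformation here, which is an assumption of the sketch rather than something the comment spells out.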

On 10 Sep 2008, at 11:02, Jo Rabin wrote:

> I think this topic is a good example of something we have tripped  
> across quite often, which is that we are not trying to specify an  
> ideal content transformation proxy, we're trying to impose  
> restrictions on what such a proxy should or should not (must and  
> must not) do.
>
> So, obviously, to my mind, a proxy that doesn't map character  
> encodings when necessary in both the request and the response will  
> be a useless proxy for some set of devices and servers. But that's  
> not in scope for us to comment about.
>
> What's in scope is to speak about character encoding mappings that  
> might be carried out that would do harm, probably. Though then  
> again, does this imply that we should have a clause that says "don't  
> map character encodings to something the device does not support"?  
> Seems exceptionally banal, and how many other statements of that form  
> would we need to include for the sake of consistency?
>
> In LC-2023 [1] the specific proposal is:
>
> [1] http://www.w3.org/2006/02/lc-comments-tracker/37584/WD-ct-guidelines-20080801/2023?cid=2023
>
> A simple way that could go some way towards alleviating this risk
> would be to forbid any transformation if the server announces (either
> via the HTTP field Content-type: charset=..., the XML declaration, or
> a meta-tag) an encoding different from ASCII or perhaps UTF-8.
>
> I don't think I understand the reasoning behind this. Though the  
> idea of fiddling with Shift-JIS would fill me with dread and I guess  
> that there is a perception that most mobile devices, even today,  
> have better support for ISO 8859 than they do UTF-8. I'm not sure  
> what truth there is to that perception, or even what "better" might  
> mean.
>
> To sum up, my feeling is that it would be useful to understand the  
> benefits of what is proposed in LC-2023. Absent that understanding,  
> my inclination is to "remain silent" on this topic like we decided  
> before.
>
> Jo
> On 10/09/2008 00:23, Tom Hume wrote:
>> LC-2023 discusses problems around transformation of content across  
>> character sets: mappings may not exist, documents may contain  
>> multiple mappings, external entities may use alternative mappings  
>> and some proprietary ones are well established (e.g. pictograms on  
>> I-Mode).
>> There have been posts on this list referring to character set  
>> translation being a potentially useful service which a proxy might  
>> provide, from mobile device towards origin server [1]. I've not  
>> seen any discussion on character set translation carried out in the  
>> other direction.
>> I'm wondering how likely it is that a device will request content  
>> from a server via a transforming proxy which the device is capable  
>> of parsing itself, and the proxy transform this content in such a  
>> way that it is not suitable for display on the device? This sounds  
>> like a Bad Thing :(
>> [1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Mar/0021.html
>> -- 
>> Future Platforms Ltd
>> e: Tom.Hume@futureplatforms.com
>> t: +44 (0) 1273 819038
>> m: +44 (0) 7971 781422
>> company: www.futureplatforms.com
>> personal: tomhume.org
>

--
Future Platforms Ltd
e: Tom.Hume@futureplatforms.com
t: +44 (0) 1273 819038
m: +44 (0) 7971 781422
company: www.futureplatforms.com
personal: tomhume.org

Received on Sunday, 21 September 2008 22:11:41 UTC