Re: LC-2023: transformation across character sets from Francois Daoust on 2008-09-10 (public-bpwg-ct@w3.org from September 2008)

From: Francois Daoust <fd@w3.org>
Date: Wed, 10 Sep 2008 14:22:02 +0200
To: Jo Rabin <jrabin@mtld.mobi>
CC: Tom Hume <Tom.Hume@futureplatforms.com>, public-bpwg-ct <public-bpwg-ct@w3.org>
Message-ID: <48C7BBEA.4030602@w3.org>
Yes.

Encoding may be a mine-field, but that's not a reason to forbid it.

There is an unlimited list of things that may come to mind that content 
transformation proxies should handle with care when transforming content, 
but that is out of scope for these guidelines, which are targeted at ways to 
"inter-work when delivering Web content".


- Between the CT-proxy and the end-user, the text we already have in 
4.3.6.1 summarizes the point "make sure that if you transform, you 
create content that can be rendered by the end-user's device":

"A proxy should strive for the best possible user experience that the 
user agent supports. It should only alter the format, layout, 
dimensions etc. to match the specific capabilities of the user agent."

We may include "Encoding" in the list of examples.
We may also add "A proxy should not apply transformation if it is 
unaware of the capabilities of the user agent", but that sounds fairly 
trivial as well.
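
To make the "unaware of the capabilities" case concrete, here is a minimal 
sketch in Python. It assumes the proxy learns what the user agent supports 
from the request's Accept-Charset header only. The function names 
(parse_accept_charset, choose_charset) and the fallback policy are 
hypothetical, not taken from the guidelines, and the parsing ignores the 
finer points of HTTP content negotiation.

```python
from typing import Optional


def parse_accept_charset(header: str) -> dict:
    """Parse an Accept-Charset value into {charset: q}, names lowercased."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_charset(accept_charset: Optional[str], source: str) -> Optional[str]:
    """Return the charset to deliver, or None to leave the content untouched."""
    if not accept_charset:
        return None  # capabilities unknown: do not transcode (assumed policy)
    prefs = parse_accept_charset(accept_charset)
    source = source.lower()
    # An explicit entry for the source wins; otherwise fall back to "*".
    if prefs.get(source, prefs.get("*", 0.0)) > 0:
        return source  # the device already accepts the original encoding
    accepted = [(q, name) for name, q in prefs.items() if q > 0 and name != "*"]
    if not accepted:
        return None
    # Highest q wins; on ties, the charset listed first in the header.
    return max(accepted, key=lambda pair: pair[0])[1]
```

For example, choose_charset(None, "Shift_JIS") returns None, meaning the 
proxy in this sketch would pass the response through unchanged.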


- Between the CT-proxy and the origin server, I remember we discussed 
the fact that things should look "normal" from the origin server's point 
of view. Again, that seemed so obvious we decided against writing 
anything down in the spec.

I wonder if we shouldn't add a generic statement on this though, 
something that states that CT-proxies must make sure that the origin 
servers receive the subsequent requests (e.g. a POST request once a form 
has been filled out) as expected and as if content transformation had 
not existed (apart from the possible modification of a few HTTP header 
values, that is).

Trivial statements may have some importance, because if a CT-proxy 
breaks content in such obvious manners, that may be a bug and its 
problem, but we also probably do not want it to claim that it conforms 
to the spec.

This would also address the "bijective" part of the comment: do not 
transform the content encoding of a page that contains a form, unless 
you can ensure that you can convert the user's input back to the content 
encoding expected by the server.
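
A minimal sketch of the "convert the user's input back" step in Python, 
assuming the proxy knows both the charset it served to the device and the 
charset the origin server expects. The function name to_server_encoding is 
hypothetical. It uses Python's strict codec error handling so that input 
with no mapping in the server's charset raises an error instead of being 
silently replaced.

```python
def to_server_encoding(form_value: bytes, device_charset: str,
                       server_charset: str) -> bytes:
    """Re-encode a submitted form value from the device's charset to the
    charset the origin server expects.

    Raises UnicodeDecodeError if the bytes are not valid in device_charset,
    and UnicodeEncodeError if a character cannot be represented in
    server_charset, so a lossy conversion is never sent to the server.
    """
    text = form_value.decode(device_charset, errors="strict")
    return text.encode(server_charset, errors="strict")
```

With this sketch, "café" submitted as UTF-8 converts cleanly to ISO-8859-1, 
while Japanese text submitted as UTF-8 to an ISO-8859-1 server raises an 
error, which is the case where the proxy should not have changed the 
encoding of the form page in the first place.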


We may want to get back to Eduardo to see if we've missed anything on this.

Francois.


Jo Rabin wrote:
> 
> I think this topic is a good example of something we have tripped across 
> quite often, which is that we are not trying to specify an ideal content 
> transformation proxy, we're trying to impose restrictions on what such a 
> proxy should or should not (must and must not) do.
> 
> So, obviously, to my mind, a proxy that doesn't map character encodings 
> when necessary in both the request and the response will be a useless 
> proxy for some set of devices and servers. But that's not in scope for us 
> to comment about.
> 
> What's in scope is to speak about character encoding mappings that might 
> be carried out that would do harm, probably, though then again, does 
> this imply that we should have a clause that says "don't map character 
> encodings to something the device does not support". Seems exceptionally 
> banal and how many other statements of that form would we need to 
> include for the sake of consistency?
> 
> In LC-2023 [1] the specific proposal is:
> 
> [1] 
> http://www.w3.org/2006/02/lc-comments-tracker/37584/WD-ct-guidelines-20080801/2023?cid=2023 
> 
> 
> A simple way that could go some way towards alleviating this risk
> would be to forbid any transformation if the server announces (either
> via the HTTP field Content-type: charset=..., the XML declaration, or
> a meta-tag) an encoding different from ASCII or perhaps UTF-8.
> 
> I don't think I understand the reasoning behind this. Though the idea of 
> fiddling with Shift-JIS would fill me with dread and I guess that there 
> is a perception that most mobile devices, even today, have better 
> support for ISO 8859 than they do UTF-8. I'm not sure what truth there 
> is to that perception, or even what "better" might mean.
> 
> To sum up, my feeling is that it would be useful to understand the 
> benefits of what is proposed in LC-2023. Absent that understanding, my 
> inclination is to "remain silent" on this topic like we decided before.
> 
> Jo
> On 10/09/2008 00:23, Tom Hume wrote:
>>
>> LC-2023 discusses problems around transformation of content across 
>> character sets: mappings may not exist, documents may contain multiple 
>> mappings, external entities may use alternative mappings and some 
>> proprietary ones are well established (e.g. pictograms on I-Mode).
>>
>> There have been posts on this list referring to character set 
>> translation being a potentially useful service which a proxy might 
>> provide, from mobile device towards origin server [1]. I've not seen 
>> any discussion on character set translation carried out in the other 
>> direction.
>>
>> I'm wondering how likely it is that a device will request content from 
>> a server via a transforming proxy which the device is capable of 
>> parsing itself, and the proxy transform this content in such a way 
>> that it is not suitable for display on the device? This sounds like a 
>> Bad Thing :(
>>
>> [1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Mar/0021.html
>> -- 
>> Future Platforms Ltd
>> e: Tom.Hume@futureplatforms.com
>> t: +44 (0) 1273 819038
>> m: +44 (0) 7971 781422
>> company: www.futureplatforms.com
>> personal: tomhume.org
>>
> 
>
Received on Wednesday, 10 September 2008 12:22:41 UTC