[W3C] Best practices / charsets / fuzzy

Hello,

I find your (i.e. the W3C group's) proposal to handle this
issue a bit problematic.

> We resolved to add "character encoding" in the list of examples of 
> alterations a Content Transformation proxy should do with 
> much care in section 4.3.6.1: 

	"A proxy should strive for the best possible 
	user experience that the user agent supports."

This is vague ("should strive" -- why not "must strive"?
Are there any valid reasons not to strive?), "user experience"
is undefined, and no way of measuring it is provided (not even
a binary one distinguishing acceptable from unacceptable
user experiences). When asking the person responsible for a
CT deployment "Does your proxy strive for the best user
experience?", do you expect any answer other than "Of course"?

	"It should only alter the format, layout, 
	dimensions etc. to match the specific 
	capabilities of the user agent."

Two problems here:

1. What is, and what is not, hidden behind the "etc."? What
about the following elements, none of which is explicitly
mentioned:
	a) the behaviour induced by scripts
	b) the page size
	c) the character encoding
	d) the behaviour induced by keypad events and keypad
	   assignments (e.g. access keys)
Not mentioning these elements means that they are not
formally covered by the guidelines, and are hence treated
as negligible.

2. Matching the capabilities of the user agent is necessary,
but insufficient. As I highlighted with the example of forms,
the capabilities of the application server must be taken into
account as well -- we are talking about end-to-end systems
after all. Because of the way (X)HTML and WML handle the
encoding of form data sent back to the server, changing the
encoding of the document affects what the server receives:
the document encoding is second in line as the encoding
applicable to form data, after an explicit statement in the
"accept-charset" attribute of the form element. The
guidelines only consider the client, to the exclusion of
the server.
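
To make the mechanism concrete, here is a minimal sketch in
Python (my own illustration -- the helper name and the example
values are assumptions, not anything taken from the CTG) of how
the bytes reaching the server change when a transcoder switches
the document encoding of a page whose form carries no
accept-charset attribute:

    from urllib.parse import urlencode

    def encode_form_submission(fields, accept_charset=None,
                               document_charset="utf-8"):
        # Hypothetical helper: form data is encoded using the charset
        # named in the form's accept-charset attribute if present,
        # otherwise using the encoding of the document the form was
        # served in.
        charset = accept_charset or document_charset
        return urlencode(fields, encoding=charset)

    fields = {"city": "Genève"}

    # Page served as UTF-8: the origin server receives UTF-8
    # percent-encoding.
    print(encode_form_submission(fields, document_charset="utf-8"))
    # city=Gen%C3%A8ve

    # A transcoder re-encodes the same page to ISO-8859-1: identical
    # user input now reaches the server as different bytes.
    print(encode_form_submission(fields, document_charset="iso-8859-1"))
    # city=Gen%E8ve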
 
> This addresses the case when a page is served using an encoding that
> is compatible with the user agent. The Content Transformation proxy
> should not switch the encoding to something else in that case,
> especially since, as you point out, it is unlikely that the mapping
> works for all characters. That's what the guideline says.

Not quite. The CTG does not prevent the transcoder
from changing the encoding to another one that is also
supported by the client. Notice that different charsets
may carry different q-values, and hence a change of
encoding, even if it is supported, may entail a
degradation of the "user experience" -- but this cannot
be established, since increases or decreases of user
experience remain undefined and hence unmeasurable.
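
For illustration (a sketch of my own, not CTG text; the header
value below is invented), this is how q-values in an
Accept-Charset header rank encodings -- both charsets are
"supported", yet switching from the first to the second means
moving to a less preferred one:

    def parse_accept_charset(header):
        # Map each charset in an Accept-Charset header to its q-value
        # (defaulting to 1.0 when no q parameter is given).
        prefs = {}
        for part in header.split(","):
            name, _, params = part.strip().partition(";")
            q = 1.0
            if params.strip().startswith("q="):
                q = float(params.strip()[2:])
            prefs[name.strip().lower()] = q
        return prefs

    prefs = parse_accept_charset("utf-8, iso-8859-1;q=0.6, us-ascii;q=0.1")
    print(prefs["utf-8"])       # 1.0
    print(prefs["iso-8859-1"])  # 0.6 -- also supported, but less preferred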

> That the mapping may not work in all cases and/or in 
> both directions is not a real problem IMO. 

This is where we disagree -- effectively we disagree
on the two-way scenario. I argue that form-filling
scenarios encompass enough applications where faithful
transmission of user input is indispensable, either for
security reasons (on-line orders, banking, e-mail) or
for usability reasons (directory queries, timetables),
that one must provide precise guidelines to at least
ensure a minimum of consistency.

The W3C position can be reduced to the statement 
that actually no practice need be enforced, since
users know what is taking place. That is a position 
-- but it must be made explicit as such and 
substantiated in the CTG.

> The Content Transformation proxy may add a warning at the top of the
> page along the lines of "Some characters can't be displayed on your
> phone" or "The form below is unlikely to work". It would not work
> reliably. But working a bit may be considered better than not working
> at all.

"may" -- meaning there is no obligation to inform the 
end user about the consequences of transcoding. The CTG
provides for informing about a choice of representations,
not about the consequences thereof.

> In any case, this is out of scope of these guidelines: we are not
> trying to define the nature of the restructuring operations that may
> occur, but rather to define a few mechanisms by which content
> providers, content transformation proxies and end users may
> communicate with each other.

This is probably impossible without a specific, new protocol
(on top of HTTP) -- and in fact you already have a half-baked
one with the x-device-* HTTP fields. Attempts to match existing
HTTP protocol elements to CTG needs never quite work out
(no-transform does not work for WML; Vary does not work for
sites that produce lowest-common-denominator content; Via does
not work since it is optional and even its content need not be
directly recognizable by servers). To make it really work, you
must either stipulate strong, unambiguous heuristics to
complement the protocol elements (e.g. DOCTYPE, MIME types,
domain names), or eventually do what was done with, notably,
UAProf -- i.e. define a self-contained protocol of your own
with the necessary features.
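
For the sake of argument, such heuristics might look like the
following sketch (entirely my own; the marker lists and the
helper name are assumptions, not anything the CTG defines). It
flags a response as already mobile-targeted from its MIME type,
DOCTYPE or host name, so that a proxy would leave it alone:

    MOBILE_MIME_TYPES = {
        "application/vnd.wap.xhtml+xml",
        "text/vnd.wap.wml",
    }
    MOBILE_DOCTYPE_MARKERS = ("XHTML Mobile", "//WAPFORUM//DTD WML")
    MOBILE_HOST_PREFIXES = ("m.", "wap.", "mobile.")

    def looks_mobile_targeted(content_type, body_start, host):
        # Heuristic: any one of MIME type, DOCTYPE or host name prefix
        # is taken as evidence that the content is already
        # mobile-targeted and should not be restructured.
        if content_type.split(";")[0].strip().lower() in MOBILE_MIME_TYPES:
            return True
        if any(marker in body_start for marker in MOBILE_DOCTYPE_MARKERS):
            return True
        return host.lower().startswith(MOBILE_HOST_PREFIXES)

    print(looks_mobile_targeted(
        "application/xhtml+xml; charset=utf-8",
        '<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" ...>',
        "m.example.com"))   # True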

> See: http://www.w3.org/2008/09/30-bpwg-minutes.html#item05 where this 
> was discussed.

These minutes were a bit confusing -- they dealt with another
charset-related topic at the same time. Anyway, thanks for the
link!

> I wrote that but I kind of disagree with myself here ;)
> For the same reason as above, I don't think we should be that 
> restrictive.

That may be so, but I hope to have made the rationale
behind these restrictions clear. On the other hand, the
CTG specify no restrictions whatsoever regarding encodings
at this point -- "do your best and be careful" cannot
be construed as a guideline, which by definition must
restrict the set of possible behaviours to a more or less
large, but identifiable, class of what is acceptable --
and that is where "best practices" would come into play.


Note: Is it really necessary to address the e-mail
personally to you, or does everybody receive the
messages through public-bpwg-ct@w3.org?

Cheers

E. Casais
