From: <eduardo.casais@areppim.com>
Date: Wed, 29 Oct 2008 23:21:32 +0100
To: "'Francois Daoust'" <fd@w3.org>
Cc: <public-bpwg-ct@w3.org>, <Tom.Hume@futureplatforms.com>
Let us cut a lot of cited text...

> It is vague. User experience is vague in essence.
> Can you think of a better proposal?

One difficulty I see is that two aspects are mixed in this notion of
"user experience":

1. Ergonomics -- i.e. the fact that the choice of fonts and colours, the
placement of links, the size of images, the length of pages, etc., makes
navigation, reading and typing hassle-free. This is essential, but I am
not a usability specialist, though I know one can quantify various
aspects of usability, and there are even best practices for these
various aspects.

2. Feature support -- i.e. the fact that the character sets, the colour
space, the dimensions of images, the weight of pages, etc., are suitable
for the terminal. This is much more amenable to formalization.

> I agree that the use of a normative statement here is clumsy, at best.
> The goal is to have some kind of recognition that one does not expect
> e.g. a Content-Transformation proxy to split pages in 1KB chunks when
> the user agent is a high end smartphone-like mobile device, which
> would affect user experience.

This is primarily about technology, so these issues are largely
formalizable. Here is a sketch:

Let us consider the terminal capabilities (termcap), as defined by:

a) the information sent in the HTTP accept-*, user-agent and related
fields (mainly for IE devices);

b) the information contained in an attached user agent profile, when
published by the device via x-wap-profile;

c) additional information from the transcoder operator, representable in
a schema compatible with (b), when available.

In order to maximize user experience when transforming content,
transcoders make sure that the characteristics of the output sent to the
device, described in the same attribute space as termcap, respect the
following properties:

1. If the characteristic maps to a set attribute in the termcap, then
the value of the characteristic corresponds to one element of the set
(e.g. document type).
If it maps to a mono-valued attribute, then its value must be equal to
the value of the termcap attribute (e.g. colour capability).

2. If a q-value is attached to a termcap attribute value, then the
characteristic value corresponds to one of the termcap values with the
highest q-value (e.g. charset).

3. If the termcap attribute is a set of ordinal values, then the
characteristic corresponds to one of these values, preferably the
highest one (e.g. versions of a content type -- WML 1.1, 1.2, 1.3).

4. If the termcap attribute is a numeric value, then the output that
minimizes the difference (termcap.value - characteristic.value), under
the constraint >= 0, is selected (e.g. decksize, pixel depth). This
applies by analogy to composite values (e.g. screen size).

During the process, all values under consideration are converted to
their canonical form to ensure consistent comparisons.

Certainly not a final statement, and it does not encompass the entire
user experience (it cannot) -- but at least this would go some way
towards formalizing what "striving for the best possible user experience
that the user agent supports" means. It takes care of some basic
consistency requirements as well. It is also reasonably testable by
independent parties. And it ties in nicely with other well-established
standards.

As a result, a transcoder will select XHTML mobile profile over WML,
XHTML 1.2 over 1.1 or 1.0, tables instead of lines with hard line
breaks, a page size as close as possible to 60000 instead of a too small
10000 or a too large 512000, 16-bit colour pictures instead of b&w
(which will be neither 96x48 nor 240x320 if the screen is 176x220), and
content encoded in KOI8-R instead of UTF-8 if that is what the terminal
prefers.

> To hopefully clarify this: we do not envision a case where the
> transformation proxy transforms the character encoding of data
> submitted by an end user other than to rollback a previous change of
> character encoding in the page that contains the form.
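As an aside, rules 2 and 4 of the sketch above can be illustrated in a
few lines of Python; the data layout and function names here are my own
assumptions, not anything from the guidelines:

```python
# Illustrative sketch of selection rules 2 and 4 above (assumed layout:
# charsets as (name, q-value) pairs, numeric attributes as plain ints).

def pick_charset(termcap_charsets):
    """Rule 2: pick a supported charset with the highest q-value."""
    return max(termcap_charsets, key=lambda cq: cq[1])[0]

def pick_numeric(termcap_value, candidates):
    """Rule 4: minimize (termcap.value - characteristic.value) >= 0."""
    feasible = [c for c in candidates if termcap_value - c >= 0]
    # Fall back to the smallest candidate if none satisfies the constraint.
    return max(feasible) if feasible else min(candidates)

# Example: Accept-Charset: ISO-8859-1,utf-8;q=0.7 and a 60000-byte decksize.
print(pick_charset([("ISO-8859-1", 1.0), ("utf-8", 0.7)]))  # ISO-8859-1
print(pick_numeric(60000, [10000, 55000, 512000]))          # 55000
```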
> If there's any ambiguity here, then we should make it clear. I guess
> this discussion already proves that the text is not clear enough ;-)

A shame that the older version was eliminated, since the text was
actually quite good and addressed my concern directly!

> For instance, regarding q-values, my browser sends in HTTP requests:
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>
> That's supposed to be a "user preference" according to the HTTP RFC:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2
>
> As a user, I just don't see the difference between a page served in
> ISO-8859-1 or in utf-8. There is no relationship whatsoever with my
> user experience. I don't think it's supposed to represent a measure
> of the lack of support for certain characters either, although it may
> be the case in practice.

There is indeed a reason why iso-8859-1 is considered "better" than
utf-8 on your handset (and on many others). iso-8859-1 is a single-byte
encoding; it is therefore extremely efficient to decode -- in fact, no
decoding is needed: each byte is an index into the symbol table, since
the 256 code points of iso-8859-1 map to the first 256 code points of
Unicode. UTF-8, on the other hand, requires the multi-byte decoding
machinery to run and to convert sequences of bytes into an index before
the symbol can be looked up in the Unicode table. There is therefore a
performance issue which impacts user experience, although for the
difference to become perceptible, one would probably need very long
texts with lots of accented letters and lots of special iso-8859-1
symbols.

> I'm trying to find examples of cases where changing the character
> encoding supported by a client to another one also supported may be
> needed or at least useful. Let me try one:
> Let's suppose I'm in Russia with a Russian device. It supports
> ISO-8859-1, ISO-8859-3, and UTF-8.
> A transcoding proxy detects that a page whose size is 1MB needs to be
> paginated. The page uses the ISO-8859-1 encoding.
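Incidentally, the point above about single-byte decoding is easy to
demonstrate; a minimal Python sketch (the sample string is my own):

```python
# In iso-8859-1, each byte IS the Unicode code point: decoding is a
# plain table lookup. UTF-8 must first assemble multi-byte sequences.
data_latin1 = "déjà vu".encode("iso-8859-1")   # 7 bytes, 1 byte per char
data_utf8   = "déjà vu".encode("utf-8")        # 9 bytes: é and à take 2

# A trivial latin-1 "decoder" -- the identity mapping byte -> code point:
decoded = "".join(chr(b) for b in data_latin1)
assert decoded == "déjà vu"

# A UTF-8 decoder cannot do this: applying chr() to a continuation byte
# (0x80-0xBF) in data_utf8 would yield the wrong character.
print(len(data_latin1), len(data_utf8))  # 7 9
```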
> The proxy paginates the page, and adds a "next page" link. "Next
> page" cannot be written in Cyrillic in ISO-8859-1. ISO-8859-3 could
> be used if the page only contains characters common to both
> ISO-8859-1 and ISO-8859-3, UTF-8 otherwise.

A straightforward way is to insert "next page" in Russian as numeric
character references and keep everything in iso-8859-1. No analysis of
the actual character set and no conversions needed. Much faster.
Slightly bulkier (about 49 bytes instead of 9).

> Should the guidelines forbid the change of encoding on the grounds
> that the original page encoding was already supported by the device?
> Again, I agree it can't work in cases where a form is involved,
> because there is obviously no mapping from UTF-8 back to ISO-8859-1
> for all characters.

I think this is the point: the changes must take the end-to-end context
into account. Forms are the most direct example where this is
unavoidable.

> I don't think that's the position of the Task Force.
> The position is more that *any* re-structuring operation must be
> carried out with care because it may break the user experience rather
> than improving it, or lead to broken scenarios such as the ones you
> mention. We acknowledge that, but do not think that the scope of the
> guidelines is to define what type of re-structuring may be done and
> what type of re-structuring should not be done, but rather to define
> a few control mechanisms for the content providers and the end users
> (and we're limited to existing technology in order to do that).

I am a bit puzzled by the fact that, on the one hand, one acknowledges
that transformations may break applications and calls upon transcoder
deployers to perform them carefully, and, on the other hand, one refuses
to state what the worst or best practices are in this respect, or even
to list the potentially troublesome consequences of performing them.

> Does "lcd" stand for "Least Common Denominator"? I'm not sure I get
> this part.
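To make the numeric-character-reference trick above concrete, here is a
short Python sketch (the choice of Russian wording for the link label is
my own, not from the original scenario):

```python
# Insert Cyrillic text into an iso-8859-1 page by escaping every
# character outside latin-1 as a numeric character reference (&#NNNN;).
def to_ncr(text):
    return "".join(
        ch if ord(ch) < 256 else "&#%d;" % ord(ch)
        for ch in text
    )

link_text = to_ncr("вперёд")  # "forward" -- one possible "next page" label
# The result is pure ASCII, so it fits in an iso-8859-1 page unchanged:
link_text.encode("iso-8859-1")  # no UnicodeEncodeError
print(link_text)  # &#1074;&#1087;&#1077;&#1088;&#1105;&#1076;
```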
Yes, sorry for the abbreviation. The reason is that there are sites that
will produce standard pages in one format, one encoding, one language,
no matter what user agent from whatever IP address is thrown at them.
Hence, no need for a Vary HTTP header field.

> In the meantime, "Cache-Control: no-transform" is indeed the only
> reliable method we could find. It may not work for WML (although do
> WAP gateways actually respect this directive in practice?),

Well,

a) Shouldn't the CTG group have investigated the issue and found the
final answer to your question already? All the more so since
representatives of an operator that has probably tested every WAP
gateway for its own needs are present on the drafting committee. If you
know this is an issue, why leave it hanging?

b) I believe some do, because at the introduction of WAP 2.0, support
for WBXML was to be dropped in some terminals. These would therefore
handle WML with a normal XML parser (just like XHTML), and render it
directly instead of launching a special interpreter for WBXML. This
means sending textual WML content to the device, and hence WAP gateways
had to refrain from performing the wml-to-wbxml encoding and to
recognize no-transform directives (gateways have always been ready to
accept either wml or wmlc/wbxml from servers).

> but it is far more reliable than any other heuristics we could think
> of, because it carries the expectations of the content provider: do
> not transform.

I wish transcoders deployed in the field respected the directive...

> Yes. Note I wouldn't mind putting a special emphasis on the dangers
> of playing with character encodings. I just think that the list of
> "beware" notes could grow quite easily (tidying markup, messing with
> scripts, restructuring of CSS style sheets, finding breaks for
> pagination, ...)

I do not see any problem with that; after all, shouldn't the guidelines
make it clear what is at stake?
At the least, RFCs always include a "Security considerations" chapter,
and often others such as "Compatibility" (see RFC 2616), or even
chapters describing the rationale for doing or not doing certain things
(see RFC 3023). The CTG would benefit from a similar approach.

Cheers

E.Casais
Received on Thursday, 30 October 2008 07:12:48 UTC