ACTION-581 Promote discussion on list, and produce editor's draft of Guidelines from Jo Rabin on 2007-11-19 (public-bpwg-ct@w3.org from November 2007)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Mon, 19 Nov 2007 13:28:58 -0000
To: <public-bpwg-ct@w3.org>
Message-ID: <C8FFD98530207F40BD8D2CAD608B50B4904BCB@mtldsvr01.DotMobi.local>
Hello everyone.

A little later than anticipated and not in a pretty form yet [though you
should see this as the editor's draft mentioned in the ACTION], this
elaborates Magnus's original text for 2.1 and, taking note of the
threads mentioned below, goes on to propose text for subsequent
sections. These make reference to, but have not been merged with, the
original contributions on sections 2.2 and 2.3 from Sean Patterson and
Aaron, which are included verbatim below.

Both Sean and Aaron make significant points about the advantages of
content transformation and why this can have a positive impact on the
user experience. These points need capturing, but I think probably as a
consolidated preamble - and also bearing in mind that the Landscape
document is actually supposed to be the place where such points are
discussed - if it doesn't at present make these points clearly enough
then we need a further revision of that document to make sure that the
points made in these contributions are noted there.

In this draft, I've taken a slightly orthogonal approach to what we
originally thought, which is to follow the course of a request and
response and identify what each of the participants in its path is meant
to do. Consequently the chapter outline as originally envisaged has not
been followed in detail. Once this all has become a little more fleshed
out, we might decide to rethink the sections in the document, but no
need to worry about that for now.

I have tried not to confine discussion to HTTP based signaling, as I
think the following require mention at least as heuristics, if not
recommended practice as they do play a role:

a) a priori knowledge of device characteristics, as gleaned from a DDR;
b) administrative arrangements, white lists etc.;
c) heuristics, such as knowing which content types and DTDs are
specifically mobile, looking for the presence of "handheld" in style
sheets and @media attributes, looking for mobileOK labels;
d) User interaction

In reference to one of Bryan's contributions, user interaction needs
more thought and discussion - on the one hand we don't want to interrupt
the user experience with excise tasks, yet on the other, in the end, the
user must act to signal their intentions and this needs noting. E.g.
there could be a note that the host should provide interactions that
allow the user to have a choice of presentations and so should the proxy
and the client, for that matter.

Another as yet unopened Pandora's box is that the discussion and
proposed text below looks at the issues primarily from the point of view
of "varying presentation from Thematically consistent URIs". What
hasn't, as yet, been explored is how it all works if there is a common
entry point to a site (Thematically consistent URI for a home page)
which then dispatches via redirect to media specific versions. This is
possibly rather more common than the previous case (e.g. redirect to
example.com/mobile - or rather better, imo, example.mobi). Naturally,
there will also be varying presentation even within a redirected
solution. This whole area needs further thought.

Whatever we come up with does of course have to deal with conforming and
non-conforming and transforming and non-transforming proxies. There
isn't, as yet, a use case analysis; it is a bit too soon for that, I
think.

The philosophy here should be in line with existing HTTP practice, which
is to fall back to safe behavior. Thus, when trying to distinguish
reformatting behavior from recoding behavior, the objective is to fall
back to "safe" known HTTP/1.1 practice for non-conforming (unaware)
participants and say things like:

Cache-Control: no-transform, allow-reencode

as this will result in a stricter interpretation by unaware
participants. This behavior is discussed in detail in HTTP section
14.9.6 (reproduced below in this note for your convenience and see
Sean's detailed list of references to points in the HTTP spec that need
to be included also).
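
As a rough illustration of that fall-back (a sketch only, not proposed
text), the following Python shows how an unaware and an aware
participant might read the same header. The allow-reencode directive is
the hypothetical extension used in the example above, and the function
names are invented for this note. The parsing is deliberately naive and
does not handle quoted directive values that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a set of lower-cased directive names."""
    directives = set()
    for part in value.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def may_reencode(value, aware):
    """Return True if the participant considers re-encoding permitted.

    An unaware (plain HTTP/1.1) participant ignores the hypothetical
    allow-reencode extension, so no-transform alone decides the matter.
    """
    directives = parse_cache_control(value)
    if "no-transform" not in directives:
        return True  # nothing in the header forbids transformation
    return aware and "allow-reencode" in directives
```

With the header above, may_reencode(..., aware=False) gives the stricter
answer, which is the point of putting the extension after no-transform.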

This, of course, immediately introduces the question as to whether we
are overstepping the mark in introducing such extensions, and I think
we need to be clear about that before going further. On the one hand
HTTP makes it clear, in explaining how to introduce extensions that it
expects such extensions to be introduced. On the other hand, we do
typically take a conservative approach and say if it is not in the IANA
registry then it's not an existing protocol and therefore beyond our
scope. Introducing extensions to existing header values, to my mind
falls short of introducing new headers. Though it's not clear that we
can do what we need to if we don't do that, go through IANA registration
and so on. 

I think that we are going to need to do that and suggest we speak to
this point tomorrow on our call, if necessary by joining forces with a
group that is actually chartered to "invent new protocols". The
alternative being a much more insipid document that only gets to a small
subset of the problem.

I'd also like to bring the group's attention to the following RFCs:

RFC 2506 Media Feature Tag Registration Procedure
RFC 2295 Transparent Content Negotiation
RFC 2296 Remote Variant Selection Algorithm

RFC 2295 is experimental, but actually gets to some of the points we
want to make, though doesn't exactly address what we are doing. It's
rather a lengthy and detailed read, and has a lot of features that we
don't need. It does, however, introduce a couple of headers and field
values which have been IANA registered. Also, the main points of the
negotiation are implemented in Apache in mod_negotiation (see [APACHE]).

[APACHE] http://httpd.apache.org/docs/2.2/content-negotiation.html

IANA registration is probably a bit of a nuisance, and may be something
we don't need to do - e.g. it would seem that the q parameter for
content type and much else is not registered. For those of you who fancy
a bit of train spotting, I think you'll find registered things at
[IANA], though I confess I find this all a bit impenetrable and
difficult to navigate.

[IANA] http://www.iana.org/numbers.html

I have tried to take into account the contributions and discussions on
the list, especially those threads starting at the following points.
Some are quite lengthy threads and can be followed with the "Next in
Thread" link:

Magnus's original proposal for 2.1 [1] elaborated in the text below
[1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/att-0014/00-part

Sean Patterson's original proposal for 2.3 [2] points included in the
text and included verbatim
[2] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/0029.html

Aaron's contribution for section 2.3 [3] points included in the text and
included verbatim
[3] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/0025.html

Pointer to ISSUE-222 TAG Finding on Alternative Representations
[4] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0011.html

Pointer to ISSUE-223 (Jo's CT Shopping List): Various Items to Consider
for the CT Guidelines
[5] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0012.html

Pointer to ACTION-575 Techniques for Guidelines Document
[6] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0023.html

Scope of CT Guidelines
[7] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0041.html


And with that extremely lengthy preamble and disclaimer, here goes:

___________

.Overview

The purpose of this section is to explore the need for actors (clients,
proxy servers, gateways, origin servers, etc) to communicate with each
other, and also suggest guidelines for doing so. The relevant scenario
involving a content transformation proxy is as follows:

client browser <---HTTP---> content transformation proxy <---HTTP--->
origin server

There may be other scenarios as well but they will initially be ignored
for the sake of simplicity. The needs of these three actors are as
follows:

   1. The client browser needs to be able to tell the content
transformation proxy:
     a. what media-type (presentation format e.g. desktop, handheld) is
desired.
     b. that all content transformation should be avoided, or that
reformatting is allowed/desired
     c. what type of mobile device and what user agent is being used
     d. that the device has (zoom, linearize, keyhole) presentation
[@@??]
   2. The content transformation proxy needs to be able to tell the
origin server:
     1. that some degree of content transformation (re-coding and
reformatting) can be performed
     2. Content transformation will be carried out unless instructed not
to.
     3. that content is being requested on behalf of something else.
     4. about the delivery context (for example mobile device type and
user agent).
     5. That the request headers have been altered (e.g. additional
content types inserted) [??]
   3. The origin server needs to be able to tell the content
transformation proxy:
     1. that content is already optimized and no additional
transformation is required (or that it should not be restructured but may
be recoded)
     2. that it's OK to perform additional content transformation.[??]
     3. That it varies its presentation
     4. That it has media-specific presentations
     5. I can't/don't wish to handle this request in its present form
     6. That request headers should/should not be modified	
   4. The content transformation proxy needs to be able to tell the
client browser:
     1. the status of the content: it is reformatted/recoded/untouched;
     2. where to find the original content if it has been transformed.
[@@ should this read "how", or do we suppose that there are "magic"
mechanisms/URIs for by-passing proxies?]


.. Objectives

In satisfying these requirements existing HTTP headers and directives
and behaviors must be respected. However, not all of the features
required can be achieved without extensions to the behaviors defined in
[RFC 2616]. Knowing that many actors will be unaware of any HTTP
extensions, special consideration needs to go into making sure that the
fall-back behavior - i.e. strict adherence to HTTP/1.1 - is "safe". For
example, if there is no standard way for a client browser to specify
that all content transformation should be avoided in a request, then we
must define a default behavior for a well-behaved content transformation
proxy that receives a request from such a client.

[@@ other principles behind what we are trying to do - e.g. noting
Sean's point that there is a wide diversity of different devices that
all fall under the simple appellation of "handheld".]

..Types of Proxy

HTTP defines two types of proxy: transparent proxies and non-transparent
proxies. As discussed in Section 1.3 [HTTP], Terminology:

A "transparent proxy" is a proxy that does not modify the request or
response beyond what is required for proxy authentication and
identification. A "non-transparent proxy" is a proxy that modifies the
request or response in order to provide some added service to the user
agent, such as group annotation services, media type transformation,
protocol reduction, or anonymity filtering. Except where either
transparent or non-transparent behavior is explicitly stated, the HTTP
proxy requirements apply to both types of proxies.

This document elaborates the behaviour of non-transparent proxies, when
used for content transformation in the context discussed in [Content
Transformation Landscape] and henceforward referred to as transforming
proxies.

..Types of Transformation

Transforming proxies can carry out a wide variety of operations. To
carry out an exhaustive survey of those operations and to discuss means
of server or client side control of them is beyond the scope of this
document. In this document we categorize this rich vocabulary of
possible operations into two types:

1) Alteration of Request Headers 
2) Alteration of Responses 

Alteration of responses is further sub-categorized into

a) restructuring content; 
b) recoding content;
c) optimizing content. 

Restructuring content is a process whereby the original layout is
altered so that content is added or removed or where the spatial or
navigational relationship of parts of content is altered, e.g. by
linearization or pagination.

Recoding content is a process whereby the layout of the content remains
the same, but details of its encoding may be altered. Examples include
re-encoding HTML as XHTML, correcting invalid markup in HTML, conversion
of images between formats (but not, for example, reducing animations to
static images).

Optimizing content means removing redundant white space, recompressing
images (without loss of fidelity), zipping for transfer ...


..Alteration of HTTP Requests and Responses

Alteration of HTTP requests and responses is not prohibited by HTTP
other than in the circumstances referred to in [HTTP] section 13.5.2.
This document describes how the Client and the Destination Server may
require conforming transforming proxies not to alter HTTP requests and
responses.

..Control by Client/User

A transforming proxy gains knowledge of whether a user requests
alteration of requests and responses by:

a) Administrative arrangements between the provider of the proxy and the
end user;
b) As a result of the request containing an indication that changing the
request headers must not be carried out;  
c) Direct interaction with the User; 
d) Other means.

..Control by Server

A transforming proxy gains knowledge of whether a server permits
alteration of requests and responses by:

e) Administrative arrangements between the provider of the server and
the provider of the proxy;
f) For requests, by having previously received an indication from the
origin server as a response to a request [for a resource on the path
that this request is in scope of] that transformation of headers is not
permissible;
g) For responses as a result of the response containing indications as
to the server's intentions - including mobileOK labels;
h) Other means.

Aside from b), f) and g) above, these techniques are generally out of
scope of this document; however, use of knowledge gleaned from sources
other than HTTP is referred to below.

Transforming proxies SHOULD allow the overriding of standing
administrative arrangements on a request by request and response by
response basis.

.Behavior of Components
..Client Request to Proxy

The client may request that the Content-Type and Content-Encoding MUST
NOT be altered in the response by setting the Cache-Control:
no-transform directive.

The client may add a [@@preserve-headers directive] to indicate that
transforming proxies MUST NOT alter other aspects of the request
headers, except as permitted by HTTP/1.1 to allow correct operation of
caching functions [want to say that do not affect transparency, but that
is probably not technically exact]. The [@@preserve-headers directive]
may only be present in addition to the no-transform Cache-Control
directive.

The client may add an [@@allow-recode directive] to the Cache-Control:
no-transform directive, indicating that the proxy MAY change the format
of the response but not restructure the content.

The client may add an [@@allow-compress] to the Cache-Control:
no-transform directive, meaning that a proxy MAY remove redundant white
space, recompress images or change the Content-Encoding (to use gzip,
from identity, for example).
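
To make the combinations above concrete, here is a minimal Python sketch
(an illustration, not proposed text). The directive names
preserve-headers, allow-recode and allow-compress are simply the [@@...]
placeholders from the paragraphs above, not registered directives, and
the function names are invented. Treating the absence of no-transform as
"anything goes" is an assumption made for the sketch.

```python
def build_client_cache_control(allow_recode=False, allow_compress=False,
                               preserve_headers=False):
    """Build a Cache-Control value asking a proxy not to restructure.

    The extension names are placeholders taken from the [@@...] markers
    in the draft text; none of them is a registered directive.
    """
    directives = ["no-transform"]  # the extensions only qualify no-transform
    if preserve_headers:
        directives.append("preserve-headers")
    if allow_recode:
        directives.append("allow-recode")
    if allow_compress:
        directives.append("allow-compress")
    return ", ".join(directives)


def permitted_operations(cache_control):
    """Return the operations an aware proxy may still apply to a response."""
    names = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    if "no-transform" not in names:
        # Assumption: without no-transform nothing is ruled out.
        return {"restructure", "recode", "compress"}
    allowed = set()
    if "allow-recode" in names:
        allowed.add("recode")
    if "allow-compress" in names:
        allowed.add("compress")
    return allowed
```

Because the builder always emits no-transform first, the extensions can
never appear on their own, which matches the "only in addition to
no-transform" rule above.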

The client may also add a [@@preferred-medium directive] indicating a
preference for a presentation style. The [@@preferred-medium directive]
has the form media=presentation-format (as described in RFC ..., current
values of the presentation-format directive are taken from IANA ... and
include "screen" and "handheld").

[It would be nice if the client were able to indicate what type of
presentational capabilities it has, for example, zoom, linearize, keyhole
... @@@ client-feature indication]

..Proxy Request to Server

If the request contains a Cache-Control: no-transform directive [@@or
any of the other directives specified in previous section] the proxy
MUST forward the request unaltered to the server. 

If there are no [@@ such directives] present in the request from the
client, and there is no indication from a downstream proxy that it
intends to transform [@@ see I will transform below] the proxy SHOULD
analyze whether it intends to offer transformation services by referring
to any administrative arrangements that are in place with the user of
the client, or the server, and any a priori knowledge it has of client
capabilities [@@ from a DDR and so on]. Knowing that the client has
available a linearization or zoom capability the proxy SHOULD NOT
attempt to offer that service. Knowing that a client is capable of a
broad range of formats the proxy SHOULD NOT offer to recode content.

If as a result of this deliberation it intends to restructure, the proxy
MUST indicate this by including a [@@@ I will transform (restructure /
reformat / compress)] - [@@ and even if it doesn't it MAY indicate its
potential for restructuring or recoding or compressing content [@@by
means of ...]].

The proxy MUST include a Via HTTP header indicating its presence.

Proxies MUST NOT intervene in https and SHOULD NOT intervene in methods
other than GET and HEAD.
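
The last two rules are simple enough to sketch in Python (illustration
only). The function names, the dict-of-headers representation and the
host name in the usage note are assumptions; header names are assumed to
be stored with their canonical capitalisation.

```python
def may_intervene(scheme, method):
    """Apply the two rules above: never https, and only GET or HEAD."""
    if scheme.lower() == "https":
        return False  # MUST NOT intervene in https
    return method.upper() in ("GET", "HEAD")  # SHOULD NOT for other methods


def add_via(headers, pseudonym, comment=None, protocol_version="1.1"):
    """Return a copy of headers with this proxy appended to Via."""
    entry = f"{protocol_version} {pseudonym}"
    if comment:
        entry += f" ({comment})"
    updated = dict(headers)
    existing = updated.get("Via")
    # Append after any entries added by earlier proxies.
    updated["Via"] = f"{existing}, {entry}" if existing else entry
    return updated
```

For example, add_via({}, "ct.example.net") produces a Via value of
"1.1 ct.example.net".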

...Alternative 1

When altering the Accept HTTP header, the proxy SHOULD indicate any
formats that it intends to recode for delivery by assigning a lower q
factor (indicated by the q parameter) than those natively supported and
should, in addition,[@@extension] add a further transform parameter
indicating that the format is not natively supported by the client.

e.g. Accept: image/jpeg, image/gif, image/png;q=0.7;[@@transform]

When altering the User-Agent HTTP Header the proxy MUST indicate this
change by adding a [@@ User Agent Modified indication with the Original
User-Agent indicated]

If other HTTP header fields are altered then the proxy MUST be prepared
to re-issue the request as received from the client on receipt of a Vary
header in the response indicating that the server offers variants of its
presentation according to any of the HTTP header fields that have been
modified.

...Alternative 2

When altering the Accept HTTP header, the proxy SHOULD indicate any
formats that it intends to recode for delivery by assigning a lower q
factor (indicated by the q parameter) than those natively supported.

e.g. Accept: image/jpeg, image/gif, image/png;q=0.7
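
A minimal Python sketch of how a proxy might assemble such a header
(illustration only; the function name, its parameters and the default of
0.7 are assumptions taken from the example, not recommended values):

```python
def build_accept(native_types, recoded_types, recode_q=0.7):
    """Build an Accept value listing recoded formats at a lower q factor.

    native_types: media types the client handles itself (implicit q=1).
    recoded_types: media types the proxy would recode before delivery.
    """
    if not 0 <= recode_q < 1:
        raise ValueError("recode_q must be lower than the native q of 1")
    parts = list(native_types)
    # Each q parameter applies only to the media type it is attached to.
    parts += [f"{media_type};q={recode_q:g}" for media_type in recoded_types]
    return ", ".join(parts)
```

Calling build_accept(["image/jpeg", "image/gif"], ["image/png"])
reproduces the example header value above.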

If other HTTP header fields are altered then the proxy MUST be prepared
to re-issue the request as received from the client on receipt of a Vary
header in the response indicating that the server offers variants of its
presentation according to any of the HTTP header fields that have been
modified.

..Server Response to Proxy

If the server varies its presentation according to examination of
received HTTP Headers then it MUST include a Vary HTTP header indicating
this to be the case. If, in addition to, or instead of HTTP headers, the
server varies its presentation on other factors (source IP Address ...)
then it MUST include a * as one of the fields in the Vary response.
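
As a sketch of the rule above in Python (illustration only, with an
invented function name): the wording "a * as one of the fields" is
followed literally here, although the grammar in RFC 2616 gives "*" as a
value on its own, so this point may need tightening.

```python
def build_vary(examined_headers, other_factors=False):
    """Build a Vary value for a server that varies its presentation.

    examined_headers: request header names the server inspected.
    other_factors: True if something other than headers (for example
    the source IP address) also influenced the choice.
    Returns None when the presentation does not vary at all.
    """
    fields = []
    for name in examined_headers:
        if name not in fields:
            fields.append(name)  # keep order, drop duplicates
    if other_factors:
        fields.append("*")
    return ", ".join(fields) if fields else None
```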

The server MUST include a no-transform directive if one is received from
the client. If it is capable of varying its presentation it SHOULD take
account of client capabilities [@@as derived from a DDR etc.] and
formulate an appropriate experience according to those criteria. 

If the server has distinct presentations according to its perception of
the presentation media, then the medium for which the presentation is
intended SHOULD be indicated [@@using the ...] 

If the client has requested a specific presentation using the [@@
directive] the server should provide a presentation of that kind. e.g.
if the server would ordinarily provide a handheld experience but the
client requests a screen experience the screen experience should be
provided. And vice versa, of course.

If the server creates a specific user experience for certain
presentation media types it SHOULD inhibit transformation of the
response by including a no-transform directive. The server SHOULD NOT
prohibit recoding or compression of its content unless it has specific
reasons not to allow it [including that this has been requested by the
client] and hence should in general add a [@@allow-recoding or
allow-compression] directive when adding a no-transform directive.

Note that including a no-transform directive may [@@SHOULD actually]
disrupt the behaviour of WAP/WML proxies, because this inhibits such
proxies from converting WML to WMLC (because this is a content-encoding
behavior). Adding [@@allow-recoding] or [@@allow-compression] is
unlikely to be recognized in the short-term by such proxies which
predate these guidelines.

Servers MAY base their actions on a priori knowledge of behaviour of
transforming proxies, when they are identified in a Via header.

The server SHOULD NOT choose a Content-Type for its response based on
its assumptions about the heuristic behavior of any intermediaries.
(e.g. it should not choose content-type: application/vnd.wap.xhtml+xml
solely on the basis that it suspects that transforming proxies will
apply heuristics that make them not restructure it). 

If servers provide only limited variants of presentation they SHOULD
consider providing a rich presentation and allowing a transforming proxy
to reduce this - which may result in a richer experience for the user
than providing a basic handheld experience only, say.

406 Response - Note that some clients (MSIE for instance) don't display
the body of a 406 response; this is in contravention of HTTP/1.1 as far
as I can see.
Vary headers in 406 response - restrict to the one(s) that have caused
the 406.

In general, successful responses should be done with 200 OK Vary:
User-Agent, Accept, Accept-Language etc.
e.g. MS doesn't want you to do updates except with IE. so they should
say 406 Vary: User-Agent
(but note that IE doesn't display the body of 406 responses)

Servers should respond with a 406 not a 200 if they can't handle the
request and should indicate that they permit header alteration in that
406. Servers should provide information about alternative
representations by using the Vary header (if the alternatives are
available from the same URI) or using link information if alternative
representations are handled by different URIs. [This restricts to HTML
for now. If link headers are reinstated in HTTP then this becomes a more
universal mechanism. Open question as to whether SVG or WICD etc.
support any such notion]

[@@300 Response - could this be used as a signal from the server to say
that it understands the protocol? A la RFC 2295]


.. Proxy Receipt of Response from Server

If the proxy has altered any of the HTTP request headers, and it
receives a Vary response from the server it should re-make the request
with the original headers and forward the subsequent response without
restructuring it, irrespective of the contents of the subsequent
response. The proxy SHOULD take note of this and SHOULD NOT vary headers
for subsequent requests, unless requests are subsequently received with
the Vary header [@@ + note on backoff below] 

[@@note that loop detection and elimination is needed here]
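
A possible shape for that decision, sketched in Python (illustration
only). The function names are invented; treating "*" as covering any
altered header, and the single already_reissued flag as the loop guard,
are both assumptions rather than anything agreed.

```python
def vary_fields(vary_value):
    """Return the lower-cased field names listed in a Vary header value."""
    return {f.strip().lower() for f in vary_value.split(",") if f.strip()}


def should_reissue(altered_headers, vary_value, already_reissued=False):
    """Decide whether to repeat the request with the original headers.

    altered_headers: names of request headers the proxy changed.
    vary_value: the Vary header value of the response, or None.
    already_reissued: True if this response answers a re-issued request;
    used as a crude guard against looping.
    """
    if already_reissued or not altered_headers or not vary_value:
        return False
    fields = vary_fields(vary_value)
    if "*" in fields:
        return True  # assumption: "*" is treated as covering any header
    return any(name.lower() in fields for name in altered_headers)
```

A proxy that altered User-Agent and then sees "Vary: Accept, User-Agent"
would re-issue once; the response to the re-issued request is forwarded
without a further retry.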

.. Proxy Response to Client

If the response includes a Warning: 214 Transformation Applied the proxy
MUST NOT apply further transformation.

If the response includes a Cache-Control: no-transform directive that is
not modified by [@@ other directives on recoding] then the response MUST
be forwarded to the client unaltered.

In the absence of a Vary or no-transform directive the proxy SHOULD
apply heuristics to the content to determine whether it is appropriate
to restructure or recode it (in the presence of such directives,
heuristics SHOULD NOT be used.)

e.g.
a. The server has previously shown that it is contextually aware, even
if the present response does not indicate this - modified by a need for
the proxy to be aware that the server has changed its behavior and is no
longer aware in that way 
b. the content-type is known to be specific to the device or class of
device e.g. application/vnd.wap.xhtml+xml
c. examination of the content reveals that it is of a specific type
appropriate to the device or class of device e.g. DOCTYPE XHTML-MP or
WBMP or [@@mobile video] [@@ note Sean's extensive list of heuristics
that should be included as an informative example?]
d. The response is an HTML response and it includes <link> elements
specifying alternat(iv)es according to media type [or that such links
are included as HTTP headers] or that the content has a mobileOK label.

If the proxy alters the content then it MUST add a Warning: 214
Transformation Applied HTTP Header
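
Pulling the rules of this section together, a rough Python sketch
(illustration only). The function names are invented, and looks_mobile
stands in for whatever heuristics the proxy uses. Two assumptions are
made: a response carrying Vary is forwarded unaltered, since the text
only says heuristics should not be used, and the [@@ other directives on
recoding] are ignored. The Warning parsing is naive and does not cope
with commas inside quoted warn-text.

```python
def has_214_warning(headers):
    """True if a Warning header carries warn-code 214."""
    warning = headers.get("Warning", "")
    return any(w.strip().startswith("214") for w in warning.split(","))


def decide_response_handling(headers, looks_mobile):
    """Return 'forward' or 'transform' for a response received by the proxy.

    headers: dict of response headers (canonical capitalisation assumed).
    looks_mobile: outcome of the proxy's heuristics, for example a mobile
    content type, a mobile DOCTYPE or a mobileOK label.
    """
    if has_214_warning(headers):
        return "forward"  # already transformed upstream
    cache_control = headers.get("Cache-Control", "").lower()
    directives = {d.strip() for d in cache_control.split(",")}
    if "no-transform" in directives:
        return "forward"
    if "Vary" in headers:
        return "forward"  # assumption: no heuristics, so leave untouched
    return "forward" if looks_mobile else "transform"


def mark_transformed(headers, agent):
    """Return a copy of headers with the 214 warning added."""
    updated = dict(headers)
    value = f'214 {agent} "Transformation applied"'
    existing = updated.get("Warning")
    updated["Warning"] = f"{existing}, {value}" if existing else value
    return updated
```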

.. Client Action on Receipt of Response

[@@ discussion of what to do on receipt of Warnings etc.]


. Encoding of [@@new] Features 

preferred-medium = screen;
and so on
[@@TBD]

.Use Case Analysis

Client		Proxy		Server
Unaware		Unaware		Unaware
etc.
[@@TBD]

.Testing

All ... must be tested for deleterious effects ...
[@@TBD]

Providers of transforming proxies SHOULD make available interfaces that
facilitate testing of Web sites accessed through them. [@@ though how
they should make known how to do this and what administrative
arrangements would be needed are both probably out of scope]

______________________________________________

Sean Patterson's contribution under ACTION-550
______________________________________________

2	Guidance for Delivery Chain Component Developers
2.3	Guidance for Content Transformation Server Developers
Content transformation servers have the ability to transform content
into a form that is suitable for a requesting entity's delivery context.
However, a content transformation server that is invisible from browsers
and other servers on the network can cause problems.  These problems
include transforming content that should not be transformed, multiple
transformations, and sub-optimal transformation.  This section contains
guidelines for developers of content transformation servers to help
avoid these problems.
2.3.1	The Need for Content Transformation Servers
2.3.1.1	Variation of device capabilities
While there are many mobile devices in existence today that give their
users the ability to browse the web, the majority of devices are not
capable of accessing web content.  Even for those devices that can
access the internet, there are large variations in their web browsing
capabilities.  Content transformation servers can transform web content
into a form that works well on any particular device.
2.3.1.2	Most content is not designed for mobile devices
The majority of web sites are designed for users of desktop (or laptop)
computers.  These computers have large screens, a mouse, full-size
keyboards, fast CPUs, large amounts of memory, and are fully connected
to the Internet, typically at broadband speeds.  Mobile devices
(especially mobile phones) normally have none of these characteristics.
Regular web content frequently assumes that it will be displayed using
the hardware of a desktop computer.  Content transformation servers can
reduce the hardware requirements of the content so that it works better
on a mobile device.
2.3.1.3	Most content is not designed for mobile browsers
Most web content is designed to be displayed on web browsers that run on
desktop computers.  These are full-featured browsers that can display
web sites that use complex HTML, CSS, and JavaScript as well as
multimedia content such as Flash and video.  In addition, most desktop
web sites assume that the user has a mouse or other pointing device.
Mobile devices frequently have much more limited web browsers.  Regular
web content may not display properly or at all on the web browser in a
mobile device.  Even if a desktop web site displays reasonably well, it
may be difficult to use on a mobile phone.  Content transformation can
transform the content into a simpler form that can be displayed and used
on a mobile browser.
2.3.1.4	Variation of mobile content
There is a wide variation of what is considered "mobile content."
Mobile content that is designed for a high-end mobile device may not
display well or be useable on lower-end mobile devices.  In this case it
makes sense for a content transformation server to transform the content
developed for a higher-end mobile device into content that is suitable
for a lower-end device.
2.3.1.5	Eliminates the need for a least common denominator solution
One approach to the problem of the variation of mobile devices is to
create a "least common denominator" page that works on all (or almost
all) mobile devices.  This approach is simpler than having multiple
versions of the page (see the next section), but limits the end user
experience.  An example of a least common denominator approach is
writing content that will work with the "Default Delivery Context" (DDC)
defined in the "Mobile Web Best Practices 1.0" W3C Proposed
Recommendation [1]. The "Default Delivery Context" outlines the baseline
characteristics that a device must implement in order to be suitable for
browsing the web.  If a content transformation server exists on the
network, the least common denominator approach is not necessary.
Instead, a rich version of the site can be created with the knowledge
that it will be "reduced down" for any requesting entity that is less
capable.
2.3.1.6	Reduces the need for multiple versions of a site
Another way to handle the variation of mobile devices is to create
multiple versions of a web site to deal with the multiple types of
mobile devices that can access the site.  This approach is costly to
establish and maintain across the increasingly diverse range of handsets
available.  When a content transformation server exists in the network,
the need to create multiple versions for different mobile devices is
reduced.  Again, a single, rich version of the site can be created and
easily maintained.
2.3.1.7	A content transformation server can do a better job of following
mobile best practices
The "Mobile Web Best Practices 1.0" W3C Proposed Recommendation [1]
contains many recommendations for authoring content that is intended for
viewing on a mobile device.  A well-designed content transformation
server can do a better job of following the mobile best practices than a
human author, especially when taking into account the capabilities of
the many different mobile devices.  The result will be a more
consistent, uniform experience.
2.3.2	Guidelines of how content transformation servers should
communicate with the rest of the delivery chain
2.3.2.1	Identifying the content transformation server
HTTP 1.1 requires that all proxy servers append a string to the Via
header [2] for any request or response they forward.  This string
consists of the name of the protocol of the received message, the
version number of the protocol, the hostname (or a pseudonym if the
hostname is sensitive information), and an optional comment.  (The name
of the protocol is assumed to be HTTP if not specified.)  Content
transformation servers should identify themselves in the comment of the
string they put in the Via header.  Here is an example where a content
transformation server at zzz.net adds itself to the Via header:

Via: 1.1 nowhere.com (Apache/1.1), 1.1 zzz.net (CT-Server-2000/1.0)

Unfortunately, the HTTP 1.1 protocol specification [3] allows subsequent
servers that receive the message to remove comments in the Via header.
So, while it is recommended that content transformation servers identify
themselves in the Via header, it is not always reliable.

A more reliable method for identifying a content transformation server
is to use the X-Mobile-Gateway header.  The syntax of the
X-Mobile-Gateway header is as follows (expressed in Augmented BNF form
as described in [4]):

X-Mobile-Gateway  = "X-Mobile-Gateway" ":" 1*( product | comment )

An example would be:

X-Mobile-Gateway: CT-Server-2000/1.0 (Server-Only; Linux i686; en-US),
    Super-CT-Server/2.0 (Headers, Footers; MS Windows XP i686; en-US)

The syntax for each content transformation server in the
X-Mobile-Gateway header is the same as for the User-Agent and Server
headers.  It is recommended that the value of this header contain the
product name and version of the content transformation server as well as
a comment in parentheses that contains useful characteristics of the
content transformation server separated by semicolons.  See [5] for the
syntax of "product".

Each subsequent content transformation server in the request/response
chain appends its information to the end of the X-Mobile-Gateway header.
In contrast to the Via header, content transformation servers are only
allowed to append to the end of the X-Mobile-Gateway header; no other
modifications are allowed.
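
The append-only rule could be sketched as follows.  This is only an
illustration under some assumptions: the function names are invented
here, entries are taken to be separated by commas as in the example
above, and escaped characters inside comments are not handled.

```python
def split_gateways(value):
    """Split an X-Mobile-Gateway value into one entry per server.

    Commas inside a parenthesised comment do not separate entries.
    """
    entries, current, depth = [], [], 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        entries.append(tail)
    return entries


def append_gateway(existing, product, comment):
    """Append this server's entry, leaving earlier entries untouched."""
    entry = f"{product} ({comment})"
    return f"{existing}, {entry}" if existing else entry


first = "CT-Server-2000/1.0 (Server-Only; Linux i686; en-US)"
value = append_gateway(
    first, "Super-CT-Server/2.0", "Headers, Footers; MS Windows XP i686; en-US"
)
print(split_gateways(value))
```

Because append_gateway only ever adds text after the existing value, the
entries written by earlier servers are passed on unchanged.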
2.3.2.2	The User-Agent header
It is frequently necessary for content transformation servers to replace
the User-Agent header in requests with a value that is the same as used
by a desktop browser.  For example, the content transformation server
might use the following User-Agent header:

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6)
    Gecko/20070725 Firefox/2.0.0.6

Although web servers are technically supposed to base the content they
send to browsers on the Accept header [6], it is very common for web
servers to use the User-Agent header to make decisions about the content
to return to a particular browser.  For example, a web site that has
both a desktop and mobile version may examine the User-Agent header and
send the desktop version of the site if the User-Agent is recognized as
a desktop browser and return the mobile version of the site if the
User-Agent is recognized as a mobile browser on a mobile device.
Content transformation servers typically want the origin server to send
the desktop version of the site since the desktop version is usually
more functional.  This is the reason that content transformation servers
frequently send a User-Agent header from a desktop browser.

If the origin server needs to know what the actual User-Agent header is
from the original device that made the request, it can examine the
X-Device-User-Agent header (see section 2.3.2.3).
2.3.2.3	Identifying the mobile browser
Since content transformation servers typically replace the User-Agent
header in the original request from the mobile browser with a desktop
User-Agent string, there needs to be a way for the origin server to
identify the mobile browser that made the original request.  This is
done with the X-Device-User-Agent header.  The syntax for the
X-Device-User-Agent header is as follows:

X-Device-User-Agent  = "X-Device-User-Agent" ":" 1*( product | comment )

(The syntax is the same as for the User-Agent header.)

When a content transformation server replaces the User-Agent header with
a desktop User-Agent string, an X-Device-User-Agent header should be
added to the request and the original User-Agent value from the mobile
browser should be copied without modification to the X-Device-User-Agent
header.  This will allow the origin server to detect the type of mobile
browser and mobile device that made the request if it needs this
information.

Content transformation servers should not modify the X-Device-User-Agent
header if it already exists.
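
A minimal sketch of the header handling in sections 2.3.2.2 and 2.3.2.3
might look like the following.  The function name masquerade is an
assumption for illustration, as is the use of a plain dict with
exact-case header names (real HTTP header names are case-insensitive).
The desktop User-Agent string is the one from the example in 2.3.2.2.

```python
DESKTOP_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) "
    "Gecko/20070725 Firefox/2.0.0.6"
)


def masquerade(headers):
    """Return a copy of the request headers with a desktop User-Agent.

    The device's own User-Agent is copied, unmodified, into
    X-Device-User-Agent unless an earlier server already set that header.
    """
    out = dict(headers)
    original = out.get("User-Agent")
    if original is not None and "X-Device-User-Agent" not in out:
        out["X-Device-User-Agent"] = original
    out["User-Agent"] = DESKTOP_UA
    return out


# "ExamplePhone/1.0" is a hypothetical mobile User-Agent value.
request = {"User-Agent": "ExamplePhone/1.0"}
forwarded = masquerade(request)
print(forwarded["X-Device-User-Agent"])
```

Running the function a second time on its own output, as a second server
in the chain would, leaves X-Device-User-Agent as the first server set
it.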
2.3.2.4	Determining whether or not a web page should be transformed
There are times when the origin server wants a web page to be sent to
the mobile web browser unchanged.  The origin server can signal that it
does not want a web page to be transformed by a content transformation
server (or any other proxy) by using the Cache-Control [7] header.  The
no-transform directive [8] is used to specify that the entity body of a
response from the origin server should not be modified.

Cache-Control: no-transform

The Cache-Control header must be honored for both requests and
responses.  A content transformation server must not modify the entity
body of any request or response that uses the Cache-Control:
no-transform header.  A handful of headers must not be modified either;
see [9] for the list.

Content transformation servers may add a Cache-Control: no-transform
header, but they should not modify one that is already present.
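
One way a server might check for the directive is sketched below.  The
function names are invented for this illustration.  The three protected
header names are those that the section referenced in [9] ties to
no-transform.  Splitting on commas is a simplification that ignores
commas inside quoted directive values.

```python
# Headers a proxy must not modify or add when no-transform is present.
NO_TRANSFORM_PROTECTED = ("Content-Encoding", "Content-Range", "Content-Type")


def has_no_transform(cache_control):
    """Return True if a Cache-Control value carries no-transform."""
    if not cache_control:
        return False
    for directive in cache_control.split(","):
        # Directive names are case-insensitive; drop any "=value" part.
        name = directive.split("=", 1)[0].strip().lower()
        if name == "no-transform":
            return True
    return False


def changed_protected_headers(before, after):
    """Name the protected headers whose values differ between two dicts."""
    return [n for n in NO_TRANSFORM_PROTECTED if before.get(n) != after.get(n)]


print(has_no_transform("max-age=3600, no-transform"))
```

A server could call changed_protected_headers on the headers it received
and the headers it is about to send, and treat a non-empty result as a
sign that it has broken the rule.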
2.3.2.5	Notification that transformation has been applied
If a content transformation server makes changes (i.e., transformations)
to the entity body in a response, the content transformation server must
add a Warning header [10] with the warn-code 214:

Warning: 214 zzz.net "Transformation applied"

This lets the browser and any other content transformation servers in
the request/response chain know that the response has been transformed.
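
As a hedged sketch, adding the warning could look like this.  The
function name and the representation of headers as a dict of lists are
assumptions for illustration.  Following the HTTP excerpt on
non-modifiable headers, the warning is only added if no 214 warning
already appears in the message.

```python
def add_transformation_warning(headers, agent):
    """Return a copy of the response headers with a Warning 214 entry.

    `headers` maps header names to lists of values so that Warning
    entries added by other servers are preserved.
    """
    out = {name: list(values) for name, values in headers.items()}
    warnings = out.setdefault("Warning", [])
    # The warn-code is the first space-separated token of each value.
    if not any(w.split(" ", 1)[0] == "214" for w in warnings):
        warnings.append(f'214 {agent} "Transformation applied"')
    return out


response = add_transformation_warning({}, "zzz.net")
print(response["Warning"])
```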
2.3.2.6	Identification of mobile content
Content can be identified as intended for mobile browsers by one of the
following methods:

*	The Content-Type header of the response is one of the following
values:
o	application/vnd.wap.xhtml+xml
o	text/vnd.wap.wml

*	The document type of the response document is
o	<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
o	<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN"
"http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd">
o	<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.2//EN"
"http://www.openmobilealliance.org/tech/DTD/xhtml-mobile12.dtd">
o	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
o	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">

*	There is a link element in the response document with a media
attribute that has a value of "handheld" and that points to a mobile
document.  Here is an example:

		<link rel="alternate" media="handheld"
		      href="http://www.mobileversion.com/" />

Origin servers that want to present a choice to the user of whether to
view the desktop version of a web page or the mobile version may use
this technique.  (The mobile browser would need to have the capability
of presenting the choice to the user for this to work.)

Identifying mobile content is important when the content transformation
server is deciding which transformations to apply to the response
content received from the origin server.

*	If the response content is identified as mobile, the content
transformation server should be conservative and try to perform only
non-layout and non-format changing transformations.  For example, it
would be OK to accelerate the content (by removing non-layout
whitespace, applying non-lossy compression, etc.), add a header and/or
footer to the page, apply content corrections, etc.  It would be less
desirable to remove HTML tables, change the size and/or format of an
image, etc.  However, if the content returned from the origin server
uses features that the content transformation server "knows" that the
client device does not support (e.g., by examining the User-Agent header
sent by the mobile web browser), it is permissible to make more
extensive changes to make the content more suitable for the client
device.  For example, if an origin server returns an image in GIF format
to a device that does not support GIF images, it would be OK for the
content transformation server to transform the image into a different
format that the client device does support.

*	If the response content is not identified as mobile, and there
is no Cache-Control: no-transform header, the content transformation
server should perform all reasonable transformations on the response.
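
The identification methods and the resulting choice of transformations
could be sketched roughly as below.  The function names, the policy
labels ("none", "conservative", "full") and the regular expression are
assumptions made for this illustration; a real server would parse the
markup properly instead of searching it as text.

```python
import re

MOBILE_CONTENT_TYPES = {"application/vnd.wap.xhtml+xml", "text/vnd.wap.wml"}
MOBILE_DOCTYPE_IDS = (
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.1//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.2//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
    "-//W3C//DTD XHTML Basic 1.1//EN",
)
HANDHELD_LINK = re.compile(
    r"""<link\b[^>]*\bmedia\s*=\s*["']handheld["']""", re.IGNORECASE
)


def mobile_signal(content_type, body):
    """Return which method identified the content as mobile, or None."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in MOBILE_CONTENT_TYPES:
        return "content-type"
    if any(public_id in body for public_id in MOBILE_DOCTYPE_IDS):
        return "doctype"
    if HANDHELD_LINK.search(body):
        return "handheld-link"
    return None


def choose_policy(content_type, body, no_transform=False):
    """Pick a transformation policy for a response."""
    if no_transform:
        return "none"
    if mobile_signal(content_type, body):
        return "conservative"
    return "full"


print(choose_policy("text/vnd.wap.wml", ""))
```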


References

[1]  http://www.w3.org/TR/mobile-bp/
[2]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.45
[3]  http://www.w3.org/Protocols/rfc2616/rfc2616.html
[4]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.1
[5]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.8
[6]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1
[7]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
[8]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.5
[9]  http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.2
[10] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46

________________________
_____________________________________

Aaron's Contribution under ACTION-551
_____________________________________

2.3 Guidance for Content Transformation Server Developers
Most mobile devices have a limited capacity for receiving and
displaying content that was originally designed for a desktop browsing
environment.  A content transformation server may be used to adapt
desktop content in such a way that it may be successfully retrieved
and rendered by a mobile device.  A few of the well known limitations
include:

    * Poor or non-existent support for markup other than well-formed
XHTML
    * Limited image format support (e.g., JPEG only)
    * Limited memory capacity for document retrieval and processing
    * Poor or non-existent HTTPS support
    * Poor or non-existent CSS support

In many cases, sending a mobile device content that it was not prepared
to process will cause serious failures, often forcing the user to reset
the device.  A content transformation server can ensure that the
content will be suitable for display on the device, allowing the user
to access the information they desire.

Even in cases where no actual content transformation is strictly
necessary, a content transformation server can improve the experience
greatly by reducing the amount of data that must be transferred to the
mobile.  Decreasing the number of connections required by in-lining
style sheets and other resources can also dramatically reduce the
amount of time spent retrieving and rendering page content.

Some websites provide a mobile alternative that is suitable for
display on some mobile devices.  Unfortunately, the vast majority of
websites do not, and those that do often cater only to a small subset
of the mobile devices that are in active use.  Furthermore, many sites
actively detect and divert non-desktop browsers to "incompatible
browser" pages and the like, preventing the user from seeing any
content at all.  In these situations, a content transformation server
that "pretends" to be a desktop browser on behalf of the mobile can
provide a better experience by retrieving and processing the original
desktop-oriented site.  In the event that a website author does
provide a viable mobile alternative, any content transformation
servers in the delivery chain should recognize this content as
acceptable for mobile display and not attempt to modify it.

In order to increase the chances that a website will provide a viable
mobile alternative, content transformation servers should preserve and
pass on any information about the delivery context that is available.
This includes but is not limited to preserving the HTTP User-Agent and
Accept headers.

[I am not actually sure what we want to do about this issue.  On the
one hand, we need to present valid device information to the origin
server so that it may provide a mobile experience, but we also want to
masquerade as a desktop browser to cover the (much more common) case
where the site will refuse to send content for unknown user agents.
There are several possible strategies, but we will need to come up with
one that we can all agree on to present here.]

[TODO: Details of how content transformation servers communicate with
the rest of the delivery chain]

________________________

Juicy Excerpts from HTTP
________________________


14.9.6 Cache Control Extensions


   The Cache-Control header field can be extended through the use of one
   or more cache-extension tokens, each with an optional assigned value.
   Informational extensions (those which do not require a change in
   cache behavior) MAY be added without changing the semantics of other
   directives. Behavioral extensions are designed to work by acting as
   modifiers to the existing base of cache directives. Both the new
   directive and the standard directive are supplied, such that
   applications which do not understand the new directive will default
   to the behavior specified by the standard directive, and those that
   understand the new directive will recognize it as modifying the
   requirements associated with the standard directive. In this way,
   extensions to the cache-control directives can be made without
   requiring changes to the base protocol.

   This extension mechanism depends on an HTTP cache obeying all of the
   cache-control directives defined for its native HTTP-version, obeying
   certain extensions, and ignoring all directives that it does not
   understand.

   For example, consider a hypothetical new response directive called
   community which acts as a modifier to the private directive. We
   define this new directive to mean that, in addition to any non-shared
   cache, any cache which is shared only by members of the community
   named within its value may cache the response. An origin server
   wishing to allow the UCI community to use an otherwise private
   response in their shared cache(s) could do so by including

       Cache-Control: private, community="UCI"

   A cache seeing this header field will act correctly even if the cache
   does not understand the community cache-extension, since it will also
   see and understand the private directive and thus default to the safe
   behavior.





Fielding, et al.            Standards Track                   [Page 116]
 
RFC 2616                        HTTP/1.1                       June 1999


   Unrecognized cache-directives MUST be ignored; it is assumed that any
   cache-directive likely to be unrecognized by an HTTP/1.1 cache will
   be combined with standard directives (or the response's default
   cacheability) such that the cache behavior will remain minimally
   correct even if the cache does not understand the extension(s).

______

13.5.2 Non-modifiable Headers


   Some features of the HTTP/1.1 protocol, such as Digest
   Authentication, depend on the value of certain end-to-end headers. A
   transparent proxy SHOULD NOT modify an end-to-end header unless the
   definition of that header requires or specifically allows that.

   A transparent proxy MUST NOT modify any of the following fields in a
   request or response, and it MUST NOT add any of these fields if not
   already present:

      - Content-Location

      - Content-MD5

      - ETag

      - Last-Modified

   A transparent proxy MUST NOT modify any of the following fields in a
   response:

      - Expires

   but it MAY add any of these fields if not already present. If an
   Expires header is added, it MUST be given a field-value identical to
   that of the Date header in that response.

   A proxy MUST NOT modify or add any of the following fields in a
   message that contains the no-transform cache-control directive, or in
   any request:

      - Content-Encoding

      - Content-Range

      - Content-Type

   A non-transparent proxy MAY modify or add these fields to a message
   that does not include no-transform, but if it does so, it MUST add a
   Warning 214 (Transformation applied) if one does not already appear
   in the message (see section 14.46).

      Warning: unnecessary modification of end-to-end headers might
      cause authentication failures if stronger authentication
      mechanisms are introduced in later versions of HTTP. Such
      authentication mechanisms MAY rely on the values of header fields
      not listed here.

   The Content-Length field of a request or response is added or deleted
   according to the rules in section 4.4. A transparent proxy MUST
   preserve the entity-length (section 7.2.2) of the entity-body,
   although it MAY change the transfer-length (section 4.4).

_____

  no-transform
      Implementors of intermediate caches (proxies) have found it useful
      to convert the media type of certain entity bodies. A non-
      transparent proxy might, for example, convert between image
      formats in order to save cache space or to reduce the amount of
      traffic on a slow link.

      Serious operational problems occur, however, when these
      transformations are applied to entity bodies intended for certain
      kinds of applications. For example, applications for medical
      imaging, scientific data analysis and those using end-to-end
      authentication, all depend on receiving an entity body that is bit
      for bit identical to the original entity-body.

      Therefore, if a message includes the no-transform directive, an
      intermediate cache or proxy MUST NOT change those headers that are
      listed in section 13.5.2 as being subject to the no-transform
      directive. This implies that the cache or proxy MUST NOT change
      any aspect of the entity-body that is specified by these headers,
      including the value of the entity-body itself.
Received on Monday, 19 November 2007 13:29:18 UTC