Re: CT Proxies and Forward Caches

I am a bit confused as to how/why this all works. It seems to me that 
for this to actually work cache-efficiently, the cache would have to 
understand how _exactly_ the server processes the User-Agent header.

As pointed out earlier in this thread, there may be countless variations 
on the same basic header that, as far as the server is concerned, all 
represent near enough the same thing. However, use of the Content-Location 
and Vary headers does not give the cache any clue as to how the server 
makes that judgement.

So it seems to me that a proxy, knowing that a server varies its 
representations based on the UA header, can legitimately serve from its 
cache _only_ if the UA is _exactly_ the same. So what puzzles me is why a 
Content-Location header helps it. I think I must be missing the point 
here, and if so, apologies.
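
To make the concern concrete, here is a minimal sketch of a forward cache 
that honours Vary by exact match on the listed request headers. It is a 
toy model under stated assumptions, not the behaviour of any particular 
proxy: the VaryCache class, the example URL and the two User-Agent 
strings are all hypothetical, and the Vary list is passed in explicitly 
rather than learned from a stored response.

```python
from dataclasses import dataclass, field


@dataclass
class VaryCache:
    """Toy forward cache: one entry per exact combination of varied headers."""

    entries: dict = field(default_factory=dict)

    def _key(self, url, request_headers, vary):
        # The cache cannot know how the origin server groups User-Agents,
        # so the only safe key is the exact value of each varied header.
        varied = tuple((name.lower(), request_headers.get(name, "")) for name in vary)
        return (url, varied)

    def store(self, url, request_headers, vary, body):
        self.entries[self._key(url, request_headers, vary)] = body

    def lookup(self, url, request_headers, vary):
        return self.entries.get(self._key(url, request_headers, vary))


# Hypothetical values, for illustration only.
URL = "http://www.example.com/page"
VARY = ["User-Agent"]
ua_a = {"User-Agent": "ExamplePhone/1.0"}
ua_b = {"User-Agent": "ExamplePhone/1.0 (build 2)"}

cache = VaryCache()
# Assume the server puts both devices in the same group and would send the
# same body to each; the cache only ever sees the first response.
cache.store(URL, ua_a, VARY, "low-end representation")

hit = cache.lookup(URL, ua_a, VARY)   # same UA string: served from cache
miss = cache.lookup(URL, ua_b, VARY)  # near-identical UA string: not found
```

In this sketch Content-Location plays no part in the cache key, which is 
exactly the point that puzzles me.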

Jo

On 02/06/2008 08:39, Francois Daoust wrote:
> I agree as well...
> 
> ... and I also agree with Jo that, in all cases, having the different 
> representations available at specific locations means extra work for the 
> CP with no real added value (save the fact that managing a clean list of 
> the different representations available - tweaks included - eases 
> testing), and is probably not a common practice.
> 
> Francois.
> 
> Umesh Sirsiwal wrote:
>> I agree with Bryan. We may want to recommend (a) as the preferred
>> solution as this saves the extra roundtrip.
>>> -----Original Message-----
>>> From: Sullivan, Bryan [mailto:BS3131@att.com]
>>> Sent: Friday, May 30, 2008 12:25 PM
>>> To: Francois Daoust; Umesh Sirsiwal
>>> Cc: Jo Rabin; public-bpwg-ct@w3.org
>>> Subject: RE: CT Proxies and Forward Caches
>>>
>>> Hi Francois,
>>> With (b) as an option, do you think the proposal would result in a
>>> greater number of redirects? I can see a case where a CP wants to
>>> normally provide only a generic URI, e.g. since this is embedded as
>>> links in other resources. If the CP did not make the specific
>>> representation available at a unique URI also, your proposal would
>>> require that a redirect result for each request related to (or based
>>> upon) the generic URI.
>>>
>>> It might be better just to say: "When varying representations based on
>>> received HTTP headers, cache-efficient techniques should be used. For
>>> example, if the total number of representations is limited whereas the
>>> number of values for an HTTP header used for varying representation is
>>> high [typically the case when varying representations based on the
>>> User-Agent string], the different representations should be made
>>> available at specific URIs and the request to the generic resource
>>> should return the specific representation along with a
>>> Content-Location header that identifies the representation being
>>> served."
>>>
>>> This would avoid sending CPs the message that redirecting to specific
>>> representations (as compared to just returning them) is a recommended
>>> practice, if they are somehow prevented from making the
>>> representations available at specific URIs.
>>>
>>> Best regards,
>>> Bryan Sullivan | AT&T
>>>
>>> -----Original Message-----
>>> From: Francois Daoust [mailto:fd@w3.org]
>>> Sent: Friday, May 30, 2008 7:50 AM
>>> To: Umesh Sirsiwal
>>> Cc: Jo Rabin; Sullivan, Bryan; public-bpwg-ct@w3.org
>>> Subject: Re: CT Proxies and Forward Caches
>>>
>>> Thanks for the clarification, Umesh.
>>>
>>> Very good point.
>>>
>>> The Content-Location header would probably have deserved a mention in
>>> the TAG Finding I mentioned at the beginning of the thread, in
>>> particular in section 2.1.1 [1], third item, since the Vary header
>>> makes things work, and the Content-Location header makes things
>>> cache-friendly. It saves the redirection, and makes groups used by the
>>> server available to caches without revealing how they were built.
>>>
>>> As far as content transformation is concerned, there may not be much
>>> to say though, as it's a rather generic caching issue. The need to use
>>> a "Vary" on the "User-Agent" header is nevertheless typical of the
>>> Mobile world, so we probably should emphasize this point somewhere.
>>> I'm not sure the Content Transformation guidelines document is the
>>> right place for it, but since Content-Location sounds like a "natural"
>>> companion for the Vary header, we could add a note, next to the
>>> guideline that says that the server MUST add a "Vary" HTTP header when
>>> varying representations, along the lines of:
>>>
>>> "When varying representations based on received HTTP headers,
>>> cache-efficient techniques should be used. For example, if the total
>>> number of representations is limited whereas the number of values for
>>> an HTTP header used for varying representation is high [typically the
>>> case when varying representations based on the User-Agent string], the
>>> different representations should be made available at specific URIs
>>> and:
>>> a) the request to the generic resource should return the specific
>>> representation along with a Content-Location header that identifies
>>> the representation being served.
>>> or b) the request to the generic resource should return a redirection
>>> to the specific representation."
>>>
>>> Any other view on that?
>>>
>>> Francois
>>>
>>>
>>> [1] http://www.w3.org/2001/tag/doc/alternatives-discovery.html#id2261787
>>>
>>>
>>>
>>> Umesh Sirsiwal wrote:
>>>> Hi Francois,
>>>> Sorry for the confusion. Based on my understanding of the Link
>>>> element, I can further clarify the difference between the Link
>>>> element and the Presentation-URI.
>>>>
>>>> My understanding is that the Link header provides a method of
>>>> advertising available alternatives for the page being served. On the
>>>> other hand the Presentation-URI provides a method to identify the
>>>> alternative included in the response. In the deployment case
>>>> you mentioned below, once the CT proxy has identified the page to be
>>>> served it will include a Presentation-URI header identifying the
>>>> selected URI.
>>>> Using this, the Vary header will identify the criteria on
>>>> which the server varied its response, while the Presentation-URI
>>>> will identify which of the several alternatives was served.
>>>>
>>>> Rereading the HTTP specification, the Presentation-URI is the same as
>>>> the Content-Location header field. I am proposing that a CP or CT
>>>> proxy which can serve multiple presentations of the content for the
>>>> same URI should include a Content-Location header to identify the
>>>> entity it is serving.
>>>>
>>>> -Umesh
>>>>
>>>>> -----Original Message-----
>>>>> From: Francois Daoust [mailto:fd@w3.org]
>>>>> Sent: Monday, May 26, 2008 11:31 AM
>>>>> To: Umesh Sirsiwal
>>>>> Cc: Jo Rabin; Sullivan, Bryan; public-bpwg-ct@w3.org
>>>>> Subject: Re: CT Proxies and Forward Caches
>>>>>
>>>>> Hi Umesh,
>>>>>
>>>>> I'm not sure I completely follow your point here, feel free to
>>>>> correct me.
>>>>>
>>>>> The Presentation-URI header you mention to identify alternative
>>>>> representations being served looks like the "Link" element we're
>>>>> currently discussing in another thread, see:
>>>>>
>>>>>
>>>>> http://lists.w3.org/Archives/Public/public-bpwg-ct/2008May/0021.html
>>>>> and replies.
>>>>>
>>>>> In the case of the Link element, we're currently trying to see when
>>>>> it makes sense to use it, and how it could be used in practice. This
>>>>> would indeed avoid the extra round trip in the sense that the CT-proxy
>>>>> would be able to do the redirection for the user and so the
>>>>> "redirect" would not reach the high-latency network the end-user is
>>>>> connected to.
>>>>> Now, obviously, the problem with the "Link" element is that it is at
>>>>> the markup level, and not at the HTTP level. It would be cool to have
>>>>> a "Link" HTTP header, typically for images and more generally for all
>>>>> non-HTML content. We're not the only ones who want the "Link" header
>>>>> back to life ("back" since it previously existed but disappeared for
>>>>> lack of use, how ironic ;-)), and there are many on-going discussions
>>>>> within W3C and IETF about that. If it ever becomes a reality, it
>>>>> would indeed be useful to serve multiple representations of a
>>>>> resource.
>>>>> Note that it's not directly related to content transformation in
>>>>> itself.
>>>>> The presence of a content transformation proxy merely adds to the
>>>>> case.
>>>>> Did I get you right?
>>>>>
>>>>> Francois.
>>>>>
>>>>>
>>>>> Umesh Sirsiwal wrote:
>>>>>> Jo, Francois, Bryan,
>>>>>> Thanks for the responses. IMO the absence of standardization in this
>>>>>> space will cause caches, built into CT proxies or otherwise, to
>>>>>> implement heuristics-based solutions to deduce the intent of the CP
>>>>>> or CT proxy. That is less than desirable.
>>>>>> To avoid the extra round trip Francois pointed out, the CP can
>>>>>> possibly serve an HTTP header (let us call it Presentation-URI)
>>>>>> identifying the alternative representation served. The CT proxy or
>>>>>> other caches will need to pay attention to this new header. But, as
>>>>>> long as the Via header is always included, they will be able to
>>>>>> correctly cache and serve the content.
>>>>>>
>>>>>> The Presentation-URI does not have to be limited to the three
>>>>>> groups. In some cases the Presentation-URI can be very specific and
>>>>>> say something like www.example.com/Device_a. Won't that work?
>>>>>>
>>>>>> -Umesh
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Jo Rabin [mailto:jrabin@mtld.mobi]
>>>>>>> Sent: Thursday, May 22, 2008 6:16 AM
>>>>>>> To: Francois Daoust
>>>>>>> Cc: Umesh Sirsiwal; Sullivan, Bryan; public-bpwg-ct@w3.org
>>>>>>> Subject: Re: CT Proxies and Forward Caches
>>>>>>>
>>>>>>> Aside from the redirect cost that Francois mentions, I am not sure
>>>>>>> that having separate URIs to allow caching of the "high", "medium"
>>>>>>> and "low" cases is the whole answer, since the response may still
>>>>>>> vary within those groups depending on work-arounds to the quirks of
>>>>>>> any particular device within the grouping.
>>>>>>>
>>>>>>> As Francois points out, this relates to the "long-running"
>>>>>>> ISSUE-222, and it's down to me to try to make sure that it doesn't
>>>>>>> run much longer :-(
>>>>>>>
>>>>>>> Jo
>>>>>>>
>>>>>>> On 21/05/2008 09:34, Francois Daoust wrote:
>>>>>>>> Indeed, the use of a "Vary: User-Agent" header generates many more
>>>>>>>> entries than a more typical use of Vary such as
>>>>>>>> "Vary: Accept-Language", and is thus not a really cache-friendly
>>>>>>>> directive.
>>>>>>>>
>>>>>>>> The solution Bryan suggested, to create representation-specific URIs
>>>>>>>> for each UA group, coupled with a redirect response from a canonical
>>>>>>>> representation, is much better from a cache perspective but it has a
>>>>>>>> cost: that of a round-trip between the server and the client to
>>>>>>>> serve the redirect response to the representation-specific URI. This
>>>>>>>> solution is recommended by the W3C Technical Architecture Group in a
>>>>>>>> finding "On Linking Alternative Representations To Enable Discovery
>>>>>>>> And Publishing" [1].
>>>>>>>>
>>>>>>>> We only mention the use of the "Vary" header in the current version
>>>>>>>> of the Content Transformation Guidelines document, but we have a
>>>>>>>> long-running discussion (internally named ISSUE-222) on the
>>>>>>>> above-mentioned TAG finding. We may include that possibility in the
>>>>>>>> document as well.
>>>>>>>>
>>>>>>>> [1] http://www.w3.org/2001/tag/doc/alternatives-discovery.html#id2261672
>>>>>>>>
>>>>>>>> Sullivan, Bryan wrote:
>>>>>>>>> Hi Umesh,
>>>>>>>>> As you mention, meta-group assignment (e.g. good/better/best) is a
>>>>>>>>> deployment-specific function, i.e. one Content Provider (CP) may
>>>>>>>>> choose a different set of groups and UA assignment as compared to
>>>>>>>>> another. Without the direct involvement of the CT proxy in group
>>>>>>>>> selection, the only way I see to reduce the cached representations
>>>>>>>>> is for the CP to provide a distinct URI to UAs in a group (e.g. a
>>>>>>>>> URI parameter or unique path), so the various UAs naturally get
>>>>>>>>> served one of fewer variations of the page from the cache.
>>>>>>>>>
>>>>>>>>> "Direct involvement of the CT proxy in group selection" implies
>>>>>>>>> some kind of metadata exchange between CP and CT proxy, through
>>>>>>>>> which group-related pages can be indicated, and maybe a tighter
>>>>>>>>> integration of the CT proxy and cache. Both appear (to me) to be
>>>>>>>>> less desirable to standardize, and at least more complex to
>>>>>>>>> consider.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Bryan Sullivan | AT&T
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> *From:* public-bpwg-ct-request@w3.org
>>>>>>>>> [mailto:public-bpwg-ct-request@w3.org] *On Behalf Of *Umesh Sirsiwal
>>>>>>>>> *Sent:* Monday, May 19, 2008 8:12 AM
>>>>>>>>> *To:* public-bpwg-ct@w3.org
>>>>>>>>> *Subject:* CT Proxies and Forward Caches
>>>>>>>>>
>>>>>>>>> Several content transformation proxies, and the Internet in
>>>>>>>>> general, include forward caches. The current definition of HTTP
>>>>>>>>> includes indication of transformation using the Vary header. In
>>>>>>>>> most cases the Content Transformation proxies and servers vary
>>>>>>>>> their responses based on the User-Agent header. The number of
>>>>>>>>> User-Agent strings is very high and caches cannot possibly store
>>>>>>>>> this many copies of the response.
>>>>>>>>> Most servers are likely to classify the devices in certain
>>>>>>>>> meta-groups for the purpose of content transformation. However,
>>>>>>>>> this meta-group is expected to be server specific. In the absence
>>>>>>>>> of a formal method, the caches will be left to guess the
>>>>>>>>> meta-group. What will be the method to solve this?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>

Received on Monday, 2 June 2008 09:54:16 UTC