Re: CT Proxies and Forward Caches

Right, probably never, but it still should...

I mean, "dynamic content" is different from "dynamically generated 
representations based on user-agents". In the former case, the Content 
Provider should use some "no-cache" directive, whereas in the latter 
case the Content Provider should use a "Vary: User-Agent", make sure the 
response can be cached, and should participate in the creation of a 
better world by helping caches save the bandwidth, which in practice 
means use the "Content-Location" header to help intermediary caches 
understand when the responses served to two different User-Agents are
actually the same.
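
To make the distinction concrete, here is a minimal sketch of the headers a Content Provider might send in each case. The representation URIs, the toy classify() rule and the max-age value are assumptions made up for illustration; they do not come from any guideline.

```python
from typing import Dict

# Hypothetical representation URIs (assumption, for illustration only).
REPRESENTATIONS = {
    "high": "http://example.org/Representation1",
    "low": "http://example.org/Representation2",
}


def classify(user_agent: str) -> str:
    """Toy grouping rule (assumption): agents containing "Mobile" get "low"."""
    return "low" if "Mobile" in user_agent else "high"


def headers_for(kind: str, user_agent: str = "") -> Dict[str, str]:
    """Return response headers for the two cases discussed above."""
    if kind == "dynamic":
        # Truly dynamic content: caches must not reuse it without revalidating.
        return {"Cache-Control": "no-cache"}
    if kind == "per-user-agent":
        # Representation chosen from the User-Agent: keep it cacheable and
        # say which representation was actually served.
        return {
            "Vary": "User-Agent",
            "Cache-Control": "public, max-age=3600",  # assumed lifetime
            "Content-Location": REPRESENTATIONS[classify(user_agent)],
        }
    raise ValueError(f"unknown kind: {kind}")
```

Two different User-Agent strings that fall in the same group get the same Content-Location, which is the hint an intermediary cache can use.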

If we don't put a note, CPs will use a "Vary: User-Agent" header without 
even thinking that it impacts intermediary caches.
If we put a note, well, at least, readers will know there is something 
they should do, even if they don't care.

That's the reason why I suggest the note. More like a contextual ad: 
"Using the Vary header? Get 2 for the same price, adopt a lonely 
Content-Location one!" (suggested background music: "Heal the world, 
make it a better place...")

Francois.


Jo Rabin wrote:
> Thanks Francois, I guess I am wondering how often a server that offers a 
> varying response can respond with a Not Modified status, given that the 
> response would have been constructed by some dynamic process ...
> 
> Jo
> 
> On 02/06/2008 11:11, Francois Daoust wrote:
>> Well, I'm not a cache expert but I'd say that the following would happen:
>>
>> 1. the proxy receives a response with a Content-Location: 
>> http://example.org/Representation1 header for a specific User-Agent
>> 2. it stores the response.
>> 3. another request is received for the same URI but with a different 
>> User-Agent.
>> 4. the proxy cannot match on the stored response, but can send a 
>> conditional request to the server.
>> 5. the server answers with the same representation response and a 304 
>> Not Modified status
>> 6. the proxy serves the response that it has in cache
>>
>> In short, it only saves a bit of bandwidth between the server and the 
>> proxy.
>>
>> If the conditional request is not possible for 4., then I totally 
>> agree that this is just plain useless...
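
The six steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the conditional request is modelled with ETag / If-None-Match, freshness and expiry are ignored, and the Origin and Proxy classes, the grouping rule and the entity tags are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Optional[str]]


class Origin:
    """Toy origin server that maps User-Agents onto two representations."""

    def __init__(self) -> None:
        self.full_responses = 0  # number of 200 responses carrying a body

    def _pick(self, user_agent: str) -> Tuple[str, str, str]:
        # Hypothetical grouping rule and entity tags.
        if "Mobile" in user_agent:
            return '"etag-low"', "http://example.org/Representation2", "low body"
        return '"etag-high"', "http://example.org/Representation1", "high body"

    def get(self, user_agent: str, if_none_match: Tuple[str, ...] = ()) -> Response:
        etag, location, body = self._pick(user_agent)
        if etag in if_none_match:
            # Step 5: same representation, so answer 304 without a body.
            return 304, {"ETag": etag, "Content-Location": location}, None
        self.full_responses += 1
        headers = {"ETag": etag, "Content-Location": location, "Vary": "User-Agent"}
        return 200, headers, body


class Proxy:
    """Toy caching proxy keyed on the exact User-Agent string."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.etag_by_user_agent: Dict[str, str] = {}
        self.body_by_etag: Dict[str, str] = {}

    def get(self, user_agent: str) -> str:
        etag = self.etag_by_user_agent.get(user_agent)
        if etag is not None:
            return self.body_by_etag[etag]  # exact User-Agent match
        # Step 4: no match, so send a conditional request listing stored ETags.
        status, headers, body = self.origin.get(user_agent, tuple(self.body_by_etag))
        etag = headers["ETag"]
        if status == 200 and body is not None:
            self.body_by_etag[etag] = body  # steps 1-2: store the new response
        self.etag_by_user_agent[user_agent] = etag
        return self.body_by_etag[etag]  # step 6: serve from cache
```

With two different desktop User-Agents, the second request costs only a 304 exchange between proxy and server; the body is transferred once.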
>>
>> Francois.
>>
>>
>>
>> Jo Rabin wrote:
>>>
>>> I am a bit confused as to how/why this all works. It seems to me that 
>>> for this to actually work cache-efficiently, the cache would have to 
>>> understand how _exactly_ the server processes the User Agent header.
>>>
>>> As pointed out earlier in this thread, there may be countless 
>>> variations on the same basic header that as far as the server is 
>>> concerned all represent near-enough the same thing. However, use of 
>>> content location and vary headers does not give any clue as to how it 
>>> makes that judgement.
>>>
>>> So it seems to me that a proxy, knowing that a server varies its 
>>> representations based on the UA header can legitimately cache _only_ 
>>> if the UA is _exactly_ the same. So what puzzles me is why a content 
>>> location header helps it. I think I must be missing the point here, 
>>> and if so, apologies.
>>>
>>> Jo
>>>
>>> On 02/06/2008 08:39, Francois Daoust wrote:
>>>> I agree as well...
>>>>
>>>> ... and I also agree with Jo that, in all cases, having the 
>>>> different representations available at specific locations means 
>>>> extra-work for the CP with no real added-value (save the fact that 
>>>> managing a clean list of the different representations available - 
>>>> tweaks included - eases testing), and is probably not a common 
>>>> practice.
>>>>
>>>> Francois.
>>>>
>>>> Umesh Sirsiwal wrote:
>>>>> I agree with Bryan. We may want to recommend (a) as the preferred
>>>>> solution as this saves the extra roundtrip.
>>>>>> -----Original Message-----
>>>>>> From: Sullivan, Bryan [mailto:BS3131@att.com]
>>>>>> Sent: Friday, May 30, 2008 12:25 PM
>>>>>> To: Francois Daoust; Umesh Sirsiwal
>>>>>> Cc: Jo Rabin; public-bpwg-ct@w3.org
>>>>>> Subject: RE: CT Proxies and Forward Caches
>>>>>>
>>>>>> Hi Francois,
>>>>>> With (b) as an option, do you think the proposal would result in a
>>>>>> greater number of redirects? I can see a case where a CP wants to
>>>>>> normally provide only a generic URI, e.g. since this is embedded as
>>>>>> links in other resources. If the CP did not make the specific
>>>>>> representation available at a unique URI also, your proposal would
>>>>>> require that a redirect result for each request related to (or based
>>>>>> upon) the generic URI.
>>>>>>
>>>>>> It might be better just to say: "When varying representations based on
>>>>>> received HTTP headers, cache-efficient techniques should be used. For
>>>>>> example, if the total number of representations is limited whereas the
>>>>>> number of values for a HTTP header used for varying representation is
>>>>>> high [typically the case when varying representations based on the
>>>>>> User-Agent string], the different representations should be made
>>>>>> available at specific URIs and the request to the generic resource
>>>>>> should return the specific representation along with a Content-Location
>>>>>> header that identifies the representation being served."
>>>>>>
>>>>>> This would avoid sending CPs the message that redirecting to specific
>>>>>> representations (as compared to just returning them) is a recommended
>>>>>> practice, if they are somehow prevented from making the representations
>>>>>> available at specific URIs.
>>>>>>
>>>>>> Best regards,
>>>>>> Bryan Sullivan | AT&T
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Francois Daoust [mailto:fd@w3.org]
>>>>>> Sent: Friday, May 30, 2008 7:50 AM
>>>>>> To: Umesh Sirsiwal
>>>>>> Cc: Jo Rabin; Sullivan, Bryan; public-bpwg-ct@w3.org
>>>>>> Subject: Re: CT Proxies and Forward Caches
>>>>>>
>>>>>> Thanks for the clarification, Umesh.
>>>>>>
>>>>>> Very good point.
>>>>>>
>>>>>> The Content-Location header would probably have deserved a mention in
>>>>>> the TAG Finding I mentioned at the beginning of the thread and in
>>>>>> particular in section 2.1.1 [1], third item, since the Vary header makes
>>>>>> things work, and the Content-Location header makes things
>>>>>> cache-friendly. It saves the redirection, and makes the groups used by
>>>>>> the server available to caches without revealing how they were built.
>>>>>>
>>>>>> As far as content transformation is concerned, there may not be much to
>>>>>> say though, as it's a rather generic caching issue. The need to use a
>>>>>> "Vary" on the "User-Agent" header is nevertheless typical of the mobile
>>>>>> world, so we probably should emphasize this point somewhere. I'm not sure
>>>>>> the Content Transformation guidelines document is the right place for it,
>>>>>> but since Content-Location sounds like a "natural" companion for the
>>>>>> Vary header, we could add a note, next to the guideline that says that
>>>>>> the server MUST add a "Vary" HTTP header when varying representations,
>>>>>> along the lines of:
>>>>>>
>>>>>> "When varying representations based on received HTTP headers,
>>>>>> cache-efficient techniques should be used. For example, if the total
>>>>>> number of representations is limited whereas the number of values for a
>>>>>> HTTP header used for varying representation is high [typically the case
>>>>>> when varying representations based on the User-Agent string], the
>>>>>> different representations should be made available at specific URIs
>>>>>> and:
>>>>>> a) the request to the generic resource should return the specific
>>>>>> representation along with a Content-Location header that identifies the
>>>>>> representation being served.
>>>>>> or b) the request to the generic resource should return a redirection to
>>>>>> the specific representation."
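
To illustrate the difference between options (a) and (b), here is a small sketch that counts the round trips a client needs for one page view. The URIs, the grouping rule and the choice of 302 as the redirect status are assumptions for illustration; the note itself only says "a redirection".

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]

# Hypothetical specific URIs (assumption, for illustration only).
SPECIFIC_URIS = {
    "high": "http://example.org/page/high",
    "low": "http://example.org/page/low",
}


def group_for(user_agent: str) -> str:
    """Toy grouping rule (assumption)."""
    return "low" if "Mobile" in user_agent else "high"


def option_a(user_agent: str) -> Response:
    """(a) Return the representation directly, labelled with Content-Location."""
    uri = SPECIFIC_URIS[group_for(user_agent)]
    return 200, {"Vary": "User-Agent", "Content-Location": uri}


def option_b(user_agent: str) -> Response:
    """(b) Redirect to the specific URI (302 is an assumed status code)."""
    uri = SPECIFIC_URIS[group_for(user_agent)]
    return 302, {"Vary": "User-Agent", "Location": uri}


def fetch(respond: Callable[[str], Response], user_agent: str) -> Tuple[str, int]:
    """Return (representation URI, number of round trips) for one page view."""
    status, headers = respond(user_agent)
    if status == 302:
        # The client has to issue a second request to the specific URI.
        return headers["Location"], 2
    return headers["Content-Location"], 1
```

Both options identify the same representation URI; option (b) simply costs one more request from the client.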
>>>>>>
>>>>>> Any other view on that?
>>>>>>
>>>>>> Francois
>>>>>>
>>>>>>
>>>>>> [1] http://www.w3.org/2001/tag/doc/alternatives-discovery.html#id2261787
>>>>>>
>>>>>>
>>>>>>
>>>>>> Umesh Sirsiwal wrote:
>>>>>>> Hi Francois,
>>>>>>> Sorry for the confusion. Based on my understanding of the Link
>>>>>>> element, I can further clarify the difference between the Link element
>>>>>>> and the Presentation-URI.
>>>>>>>
>>>>>>> My understanding is that the Link header provides a method of
>>>>>>> advertising available alternatives for the page being served. On the
>>>>>>> other hand the Presentation-URI provides a method to identify the
>>>>>>> alternative included in the response. In the deployment case
>>>>>>> you mentioned below, once the CT proxy has identified the page to be
>>>>>>> served it will include a Presentation-URI header identifying the
>>>>>>> selected URI. Using this, the Vary header will be able to identify the
>>>>>>> criteria on which the server varied its response, while the
>>>>>>> Presentation-URI will be able to identify which of the several
>>>>>>> alternatives was served.
>>>>>>>
>>>>>>> Rereading HTTP specification, the Presentation-URI is the same as
>>>>>>> Content-Location header field. I am proposing that the CP or the CT
>>>>>>> proxy which can serve multiple presentation of the content for the
>>>>>>> same URI, should include Content-Location header to identify the
>>>>>>> entity it is serving.
>>>>>>>
>>>>>>> -Umesh
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Francois Daoust [mailto:fd@w3.org]
>>>>>>>> Sent: Monday, May 26, 2008 11:31 AM
>>>>>>>> To: Umesh Sirsiwal
>>>>>>>> Cc: Jo Rabin; Sullivan, Bryan; public-bpwg-ct@w3.org
>>>>>>>> Subject: Re: CT Proxies and Forward Caches
>>>>>>>>
>>>>>>>> Hi Umesh,
>>>>>>>>
>>>>>>>> I'm not sure I completely follow your point here, feel free to
>>>>>>>> correct me.
>>>>>>>>
>>>>>>>> The Presentation-URI header you mention to identify alternative
>>>>>>>> representations being served looks like the "Link" element we're
>>>>>>>> currently discussing in another thread, see:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://lists.w3.org/Archives/Public/public-bpwg-ct/2008May/0021.html
>>>>>>>> and replies.
>>>>>>>>
>>>>>>>> In the case of the Link element, we're currently trying to see when
>>>>>>>> it makes sense to use it, and how it could be used in practice. This
>>>>>>>> would indeed avoid the extra round trip in the sense that the CT-proxy
>>>>>>>> would be able to do the redirection for the user and so the
>>>>>>>> "redirect" would not reach the high-latency network the end-user is
>>>>>>>> connected to.
>>>>>>>> Now, obviously, the problem with the "Link" element is that it is at
>>>>>>>> the markup level, and not at the HTTP level. It would be cool to have
>>>>>>>> a "Link" HTTP header, typically for images and more generally for all
>>>>>>>> non-HTML content. We're not the only ones who want the "Link" header
>>>>>>>> back to life ("back" since it previously existed but disappeared for
>>>>>>>> lack of use, how ironic ;-)), and there are many on-going discussions
>>>>>>>> within W3C and IETF about that. If it ever becomes a reality, it
>>>>>>>> would indeed be useful to serve multiple representations of a
>>>>>>>> resource.
>>>>>>>> Note that it's not directly related to content transformation in
>>>>>>>> itself. The presence of a content transformation proxy merely adds to
>>>>>>>> the case.
>>>>>>>> Did I get you right?
>>>>>>>>
>>>>>>>> Francois.
>>>>>>>>
>>>>>>>>
>>>>>>>> Umesh Sirsiwal wrote:
>>>>>>>>> Jo, Francois, Bryan,
>>>>>>>>> Thanks for the responses. IMO the absence of standardization in this
>>>>>>>>> space will cause caches built into CT proxies or otherwise to implement
>>>>>>>>> heuristics-based solutions to deduce the intent of the CP or CT. That is
>>>>>>>>> less than desirable. To avoid the extra round trip Francois pointed out,
>>>>>>>>> the CP can possibly serve an HTTP header (let us call it
>>>>>>>>> Presentation-URI) identifying the alternative representation served. The
>>>>>>>>> CT proxy or other caches will need to pay attention to this new header.
>>>>>>>>> But, as long as the Via header is always included, they will be able to
>>>>>>>>> correctly cache and serve the content.
>>>>>>>>>
>>>>>>>>> The Presentation-URI does not have to be limited to the three groups.
>>>>>>>>> In some cases the Presentation-URI can be very specific and say
>>>>>>>>> something like www.example.com/Device_a. Won't that work?
>>>>>>>>>
>>>>>>>>> -Umesh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Jo Rabin [mailto:jrabin@mtld.mobi]
>>>>>>>>>> Sent: Thursday, May 22, 2008 6:16 AM
>>>>>>>>>> To: Francois Daoust
>>>>>>>>>> Cc: Umesh Sirsiwal; Sullivan, Bryan; public-bpwg-ct@w3.org
>>>>>>>>>> Subject: Re: CT Proxies and Forward Caches
>>>>>>>>>>
>>>>>>>>>> Aside from the redirect cost that Francois mentions, I am not sure that
>>>>>>>>>> having separate URIs to allow caching of the "high", "medium" and "low"
>>>>>>>>>> cases is the whole answer, since the response may still vary within
>>>>>>>>>> those groups depending on work-arounds to the quirks of any particular
>>>>>>>>>> device within the grouping.
>>>>>>>>>>
>>>>>>>>>> As Francois points out, this relates to the "long-running" ISSUE-222,
>>>>>>>>>> and it's down to me to try to make sure that it doesn't run much longer
>>>>>>>>>> :-(
>>>>>>>>>>
>>>>>>>>>> Jo
>>>>>>>>>>
>>>>>>>>>> On 21/05/2008 09:34, Francois Daoust wrote:
>>>>>>>>>>> Indeed, the use of a "Vary: User-Agent" header generates many more
>>>>>>>>>>> entries than a more typical use of Vary such as "Vary: Accept-Language",
>>>>>>>>>>> and is thus not a really cache-friendly directive.
>>>>>>>>>>>
>>>>>>>>>>> The solution Bryan suggested, to create representation-specific URIs for
>>>>>>>>>>> each UA group, coupled with a redirect response from a canonical
>>>>>>>>>>> representation, is much better from a cache perspective but it has a
>>>>>>>>>>> cost: that of a round-trip between the server and the client to serve
>>>>>>>>>>> the redirect response to the representation-specific URI. This solution
>>>>>>>>>>> is recommended by the W3C Technical Architecture Group in a finding "On
>>>>>>>>>>> Linking Alternative Representations To Enable Discovery And Publishing"
>>>>>>>>>>> [1].
>>>>>>>>>>>
>>>>>>>>>>> We only mention the use of the "Vary" header in the current version of
>>>>>>>>>>> the Content Transformation Guidelines document, but we have a
>>>>>>>>>>> long-running discussion (internally named ISSUE-222) on the
>>>>>>>>>>> above-mentioned TAG finding. We may include that possibility in the
>>>>>>>>>>> document as well.
>>>>>>>>>>> [1] http://www.w3.org/2001/tag/doc/alternatives-discovery.html#id2261672
>>>>>>>>>>> Sullivan, Bryan wrote:
>>>>>>>>>>>> Hi Umesh,
>>>>>>>>>>>> As you mention, meta-group assignment (e.g. good/better/best) is a
>>>>>>>>>>>> deployment-specific function, i.e. one Content Provider (CP) may
>>>>>>>>>>>> choose a different set of groups and UA assignment as compared to
>>>>>>>>>>>> another. Without the direct involvement of the CT proxy in group
>>>>>>>>>>>> selection, the only way I see to reduce the cached representations is
>>>>>>>>>>>> for the CP to provide a distinct URI to UAs in a group (e.g. a URI
>>>>>>>>>>>> parameter or unique path), so the various UAs naturally get served
>>>>>>>>>>>> one of a smaller number of variations of the page from the cache.
>>>>>>>>>>>>
>>>>>>>>>>>> "direct involvement of the CT proxy in group selection" implies some
>>>>>>>>>>>> kind of metadata exchange between CP and CT proxy, through which
>>>>>>>>>>>> group-related pages can be indicated, and maybe a tighter integration
>>>>>>>>>>>> of the CT proxy and cache. Both appear (to me) to be less desirable to
>>>>>>>>>>>> standardize, and at least more complex to consider.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Bryan Sullivan | AT&T
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>>> *From:* public-bpwg-ct-request@w3.org
>>>>>>>>>>>> [mailto:public-bpwg-ct-request@w3.org] *On Behalf Of *Umesh
>>>>>>>>> Sirsiwal
>>>>>>>>>>>> *Sent:* Monday, May 19, 2008 8:12 AM
>>>>>>>>>>>> *To:* public-bpwg-ct@w3.org
>>>>>>>>>>>> *Subject:* CT Proxies and Forward Caches
>>>>>>>>>>>>
>>>>>>>>>>>> Several content transformation proxies, and the Internet in general,
>>>>>>>>>>>> include forward caches. The current definition of HTTP includes
>>>>>>>>>>>> indication of transformation using the Vary header. In most cases
>>>>>>>>>>>> Content Transformation proxies and servers vary their responses based
>>>>>>>>>>>> on the User-Agent header. The number of User-Agent strings is very
>>>>>>>>>>>> high, and caches cannot possibly store that many copies of the
>>>>>>>>>>>> response. Most servers are likely to classify devices into certain
>>>>>>>>>>>> meta-groups for the purpose of content transformation. However, this
>>>>>>>>>>>> meta-group is expected to be server-specific. In the absence of a
>>>>>>>>>>>> formal method, caches will be left to guess the meta-group. What will
>>>>>>>>>>>> be the method to solve this?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>>>
>>>
> 

Received on Tuesday, 3 June 2008 09:42:35 UTC