Re: ACTION 517 - caching of resources

It sounds like a fine solution -- I'm trying to think if there is
anything simpler we can do.

The purpose is to avoid, say, downloading a small image twenty times
from a site when it's referenced twenty times, right? Can we do this
simply by removing duplicates among the links that are extracted? That
is, if foo.gif is mentioned twenty times, count it as one extracted
link?

I may be missing something, but if that's feasible it's a lot simpler.
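
To make the idea concrete, here is a rough sketch of what I mean. It assumes the extracted links arrive as a list of strings; LinkDeduper and uniqueLinks are made-up names for illustration, not anything in the current code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the existing framework.
final class LinkDeduper {

    /** Resolves each extracted link against the page URI and drops repeats, keeping first-seen order. */
    static List<URI> uniqueLinks(URI pageUri, List<String> extractedLinks) {
        Set<URI> seen = new LinkedHashSet<>();
        for (String link : extractedLinks) {
            // Resolve first so that two spellings of the same target compare equal.
            seen.add(pageUri.resolve(link).normalize());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.org/index.html");
        List<String> links = Arrays.asList("foo.gif", "bar.gif", "foo.gif");
        // prints [http://example.org/foo.gif, http://example.org/bar.gif]
        System.out.println(uniqueLinks(page, links));
    }
}
```

One consequence worth noting: collapsing duplicates this way throws away where each reference occurred, so it only works if we do not need per-reference locations in the output.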

Sean

On 7/4/07, Ruadhan O'Donoghue <rodonoghue@mtld.mobi> wrote:
>
> Hi all,
>
>
>
> A quick progress update on the caching of resources, ACTION 517. Apologies
> for the delay on this; I've been kept very busy with ready.mobi.
>
>
>
> My findings are that a basic caching behaviour is relatively simple to
> implement, but getting useful information into MOKI is a little trickier. I
> wanted to run something by the group before committing it.
>
>
>
> Currently in MOKI, images (to take one kind of resource) are recorded as
> follows:
>
>    <images>
>       <image>
>          <URI>testimage.gif</URI>
>          <retrieval>
>             <retrievedURI>http://dev.mtld.mobi/testimage.gif</retrievedURI>
>             <HTTPRequest>
>                <rawHeaders>User-Agent: W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)&#xD;
> Accept: application/xhtml+xml,text/html;q=0.1,application/vnd.wap.xhtml+xml;q=0.1,text/css,image/jpeg,image/gif&#xD;
> Accept-Charset: UTF-8&#xD;
> Host: dev.mtld.mobi&#xD;
> </rawHeaders>
>                <method>GET</method>
>                <URI>http://dev.mtld.mobi/testimage.gif</URI>
>                <protocol>HTTP/1.1</protocol>
>                <header name="user-agent"
>                        value="W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)"/>
>                <header name="host" value="dev.mtld.mobi"/>
>                <header name="accept">
>                   <element name="application/xhtml+xml"/>
>                   <element name="text/html">
>                      <parameter name="q" value="0.1"/>
>                   </element>
>                   <element name="application/vnd.wap.xhtml+xml">
>                      <parameter name="q" value="0.1"/>
>                   </element>
>                   <element name="text/css"/>
>                   <element name="image/jpeg"/>
>                   <element name="image/gif"/>
>                </header>
>                <header name="accept-charset">
>                   <element name="utf-8"/>
>                </header>
>             </HTTPRequest>
>             <HTTPResponse>
>                <rawHeaders>Transfer-Encoding: chunked&#xD;
> Date: Wed, 04 Jul 2007 09:24:31 GMT&#xD;
> Server: Apache/2.2.2 (Unix) DAV/2 mod_ssl/2.2.2 OpenSSL/0.9.8b PHP/5.1.4 mod_apreq2-20051231/2.5.7 mod_perl/2.0.2 Perl/v5.8.7&#xD;
> Last-Modified: Fri, 15 Jun 2007 17:27:50 GMT&#xD;
> ETag: "dcd0-35f-2a9ab180"&#xD;
> Accept-Ranges: bytes&#xD;
> --------------: ---&#xD;
> Cache-Control: max-age=604800&#xD;
> Expires: Wed, 11 Jul 2007 09:24:31 GMT&#xD;
> Content-Type: image/gif&#xD;
> </rawHeaders>
>                <protocol>HTTP/1.1</protocol>
>                <status code="200" reason="OK"/>
>                <header name="--------------" value="---"/>
>                <header name="expires" value="Wed, 11 Jul 2007 09:24:31 GMT"/>
>                <!-- etc -->
>             </HTTPResponse>
>          </retrieval>
>          <imageInfo>
>             <validity valid="true"/>
>             <transparency transparent="false"/>
>             <actualDimensions height="19" width="19"/>
>          </imageInfo>
>       </image>
>    </images>
>
> In a previous email, Abel has proposed the following structure:
>
> <images> <!-- this includes objects-which-are-images, and also includes stylesheet background images and list images -->
>        <image>
>               <reference>
>                      <location ref="#pd1" type="line">20</location> <!-- where it is referenced - but in this case we probably need an XPATH??? as well??? -->
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>
>               <retrieval id="img1">
>                      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
>                      <HTTPRequest connection="conn1" id="request4">
>                            <stuff>...</stuff>
>                      </HTTPRequest>
>                      <HTTPResponse/>
>                      <entity size="345" encoding="base64"/>
>               </retrieval>
>               <imageInfo>
>                      <validity valid="true"/>
>                      <transparency transparent="false"/>
>                      <actualDimensions width="20" height="20"/>
>                      <!-- CTIC: need to register these dimensions for each equal image -->
>                      <statedDimensions width="20" height="20"/>
>                      <!-- actually need to think about cached images and also for retrieval -->
>               </imageInfo>
>        </image>
>        <!-- CTIC: A cached image... -->
>        <image>
>               <reference>
>                      <location ref="#pd1" type="line">40</location> <!-- where it is referenced - but in this case we probably need an XPATH??? as well??? -->
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>               <!-- CTIC: If there are two or more occurrences of the same image, we can reference the retrieval block. This assumes a caching behaviour -->
>               <retrieval ref="#img1"/>
>               <!-- CTIC: Perhaps we can decompose this block? -->
>               <imageInfo>
>                      <validity valid="true"/>
>                      <transparency transparent="false"/>
>                      <actualDimensions width="20" height="20"/>
>                      <!-- CTIC: need to register these dimensions for each equal image -->
>                      <statedDimensions width="20" height="20"/>
>                      <!-- actually need to think about cached images and also for retrieval -->
>               </imageInfo>
>        </image>
>
>        <image/> <!-- etc. -->
> </images>
>
> Getting the location of a resource reference into MOKI is a little trickier
> the way things currently stand in the framework. We can use
> DOMUtils.getLineNumber() to get the line number when we identify resources
> (in HTTPXHTMLResource.extractImages()), but when we build up the list of
> resources, they are returned as a list of URIs, with no other information.
> This URI list is then consumed in the Preprocessor.preprocess method, where
> HTTPResource objects are created and downloading occurs.
>
>
>
> The changes I have implemented modify this process so that a list of
> ResourceReference objects is returned instead of a list of URIs. A
> ResourceReference object contains the URI (as written in the source) and
> the line number of the occurrence.
>
> A ResourceCache object is instantiated in the Preprocessor.process() method,
> and this maintains a list of the CachedResources. We check each reference
> against this cache, and only create an HTTPResource if the resource is not
> already cached, thus preventing multiple downloads of the same resource. If
> it is already cached, we append the reference to the list of references held
> by that CachedResource.
>
> Next, we pass the cache into the PreprocessorResults object with a new
> method, setCache(). The main purpose of this is to give us access to the
> reference locations as we build the MOKI document: for each
> HTTPImageResource that we record, we can now look up all references to it.
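
A minimal sketch of the cache step described above, as I read it. The class names come from the description; the fields, method names and signatures are guesses for illustration, not the code being committed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative shapes only; the real classes may look quite different.
final class ResourceReference {
    final String uri;      // URI as written in the source document
    final int lineNumber;  // line on which the reference occurs

    ResourceReference(String uri, int lineNumber) {
        this.uri = uri;
        this.lineNumber = lineNumber;
    }
}

final class CachedResource {
    final String resolvedUri;
    final List<ResourceReference> references = new ArrayList<>();

    CachedResource(String resolvedUri) {
        this.resolvedUri = resolvedUri;
    }
}

final class ResourceCache {
    private final Map<String, CachedResource> byUri = new LinkedHashMap<>();

    /** Records the reference; returns true only the first time a URI is seen, i.e. when a download is needed. */
    boolean addReference(String resolvedUri, ResourceReference ref) {
        CachedResource cached = byUri.get(resolvedUri);
        boolean isNew = (cached == null);
        if (isNew) {
            cached = new CachedResource(resolvedUri);
            byUri.put(resolvedUri, cached);
        }
        cached.references.add(ref);
        return isNew;
    }

    /** All references recorded for a URI, in the order they were added (empty if unknown). */
    List<ResourceReference> referencesTo(String resolvedUri) {
        CachedResource cached = byUri.get(resolvedUri);
        return cached == null ? new ArrayList<ResourceReference>() : cached.references;
    }

    public static void main(String[] args) {
        ResourceCache cache = new ResourceCache();
        String uri = "http://w3.org/image1.jpg";
        System.out.println(cache.addReference(uri, new ResourceReference("image1.jpg", 1))); // prints true
        System.out.println(cache.addReference(uri, new ResourceReference("image1.jpg", 2))); // prints false
        System.out.println(cache.referencesTo(uri).size());                                  // prints 2
    }
}
```

In this reading, the caller downloads only when addReference returns true, and the MOKI builder later calls referencesTo for each recorded resource.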
>
>
>
> So in MOKI an image resource and all references to it would be recorded
> something like this:
>
> <images>
>        <image>
>               <reference>
>                      <location ref="#pd1" type="line">1</location>
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>
>               <reference>
>                      <location ref="#pd1" type="line">2</location>
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>
>               <reference>
>                      <location ref="#pd1" type="line">3</location>
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>
>               <reference>
>                      <location ref="#pd1" type="line">4</location>
>                      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
>               </reference>
>
>               <retrieval id="img1">
>                      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
>                      <HTTPRequest connection="conn1" id="request4">
>                            <stuff>...</stuff>
>                      </HTTPRequest>
>                      <HTTPResponse/>
>                      <entity size="345" encoding="base64"/>
>               </retrieval>
>               <imageInfo>
>                      <validity valid="true"/>
>                      <transparency transparent="false"/>
>                      <actualDimensions width="20" height="20"/>
>                      <!-- CTIC: need to register these dimensions for each equal image -->
>                      <statedDimensions width="20" height="20"/>
>                      <!-- actually need to think about cached images and also for retrieval -->
>               </imageInfo>
>        </image>
> </images>
>
> Thus we only have one image element per unique image resource, and we list
> all references to this resource under the same image element.
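
For illustration, a small sketch of how one image element with several reference children could be assembled using the standard DOM API. ImageElementSketch and buildImage are hypothetical names, the ref and id values are hard-coded from the example above, and the real MOKI builder may work quite differently.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; element names follow the example in the message above.
final class ImageElementSketch {

    /** Builds one image element holding a reference child per line number plus a single retrieval child. */
    static Element buildImage(String quotedUri, String retrievedUri, List<Integer> lineNumbers) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element image = doc.createElement("image");
        doc.appendChild(image);

        // One reference element per occurrence in the source document.
        for (int line : lineNumbers) {
            Element reference = doc.createElement("reference");
            Element location = doc.createElement("location");
            location.setAttribute("ref", "#pd1");   // assumed value, copied from the example
            location.setAttribute("type", "line");
            location.setTextContent(Integer.toString(line));
            Element uri = doc.createElement("URI");
            uri.setTextContent(quotedUri);
            reference.appendChild(location);
            reference.appendChild(uri);
            image.appendChild(reference);
        }

        // A single retrieval element shared by all references.
        Element retrieval = doc.createElement("retrieval");
        retrieval.setAttribute("id", "img1");       // assumed value, copied from the example
        Element retrieved = doc.createElement("retrievedURI");
        retrieved.setTextContent(retrievedUri);
        retrieval.appendChild(retrieved);
        image.appendChild(retrieval);
        return image;
    }

    public static void main(String[] args) {
        Element image = buildImage("image1.jpg", "http://w3.org/image1.jpg", Arrays.asList(1, 2, 3, 4));
        System.out.println(image.getElementsByTagName("reference").getLength()); // prints 4
    }
}
```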
>
>
>
> Any thoughts before I commit these changes?
>
> Ruadhan

Received on Thursday, 5 July 2007 19:33:55 UTC