ACTION 517 - caching of resources

Hi all, 

 

A quick progress update on ACTION 517 (caching of resources).
Apologies for the delay on this; I've been kept very busy with
ready.mobi.

 

My findings are that it is relatively simple to implement a basic
caching behaviour, but getting useful information about it into MOKI is
a little trickier. I wanted to run my approach by the group before
committing it.

 

Currently in MOKI, images (to take one kind of resource) are recorded as
follows:

 

   <images>

      <image>

         <URI>testimage.gif</URI>

         <retrieval>

 
            <retrievedURI>http://dev.mtld.mobi/testimage.gif</retrievedURI>

            <HTTPRequest>

               <rawHeaders>User-Agent: W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)&#xD;

Accept: application/xhtml+xml,text/html;q=0.1,application/vnd.wap.xhtml+xml;q=0.1,text/css,image/jpeg,image/gif&#xD;

Accept-Charset: UTF-8&#xD;

Host: dev.mtld.mobi&#xD;

</rawHeaders>

               <method>GET</method>

               <URI>http://dev.mtld.mobi/testimage.gif</URI>

               <protocol>HTTP/1.1</protocol>

               <header name="user-agent" value="W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)"/>

               <header name="host" value="dev.mtld.mobi"/>

               <header name="accept">

                  <element name="application/xhtml+xml"/>

                  <element name="text/html">

                     <parameter name="q" value="0.1"/>

                  </element>

                  <element name="application/vnd.wap.xhtml+xml">

                     <parameter name="q" value="0.1"/>

                  </element>

                  <element name="text/css"/>

                  <element name="image/jpeg"/>

                  <element name="image/gif"/>

               </header>

               <header name="accept-charset">

                  <element name="utf-8"/>

               </header>

            </HTTPRequest>

            <HTTPResponse>

               <rawHeaders>Transfer-Encoding: chunked&#xD;

Date: Wed, 04 Jul 2007 09:24:31 GMT&#xD;

Server: Apache/2.2.2 (Unix) DAV/2 mod_ssl/2.2.2 OpenSSL/0.9.8b PHP/5.1.4 mod_apreq2-20051231/2.5.7 mod_perl/2.0.2 Perl/v5.8.7&#xD;

Last-Modified: Fri, 15 Jun 2007 17:27:50 GMT&#xD;

ETag: "dcd0-35f-2a9ab180"&#xD;

Accept-Ranges: bytes&#xD;

--------------: ---&#xD;

Cache-Control: max-age=604800&#xD;

Expires: Wed, 11 Jul 2007 09:24:31 GMT&#xD;

Content-Type: image/gif&#xD;

</rawHeaders>

               <protocol>HTTP/1.1</protocol>

               <status code="200" reason="OK"/>

               <header name="--------------" value="---"/>

               <header name="expires" value="Wed, 11 Jul 2007 09:24:31 GMT"/>

                <!-- etc -->

            </HTTPResponse>

         </retrieval>

         <imageInfo>

            <validity valid="true"/>

            <transparency transparent="false"/>

            <actualDimensions height="19" width="19"/>

         </imageInfo>

      </image>

   </images>

 

 

 

In a previous email, Abel has proposed the following structure:

 

<images> <!-- this includes objects-which-are-images, and also includes
stylesheet background images and list images -->

       <image>

              <reference> 

                     <location ref="#pd1" type="line">20</location><!--
where it is referenced - but in this case we probably need an XPATH???
as well??? -->

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>

                     

 

              <retrieval id="img1">

 
                     <retrievedURI>http://w3.org/image1.jpg</retrievedURI>

                     <HTTPRequest connection="conn1" id="request4">

                           <stuff>...</stuff>

                     </HTTPRequest>

                     <HTTPResponse/>

                     <entity size="345" encoding="base64"/>

              </retrieval>

              <imageInfo>

                     <validity valid="true"/>

                     <transparency transparent="false"/>

                     <actualDimensions width="20" height="20"/>

                     <!-- CTIC:need to register these dimensions for
each  equal image-->

                     <statedDimensions width="20" height="20"/>

                     <!-- actually need to think about cached images and
also for retrieval -->

                           

              </imageInfo>

       </image>

       <!--CTIC:A cached image... -->

       <image>

              <reference> 

                     <location ref="#pd1" type="line">40</location><!--
where it is referenced - but in this case we probably need an XPATH???
as well??? -->

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>

              <!--CTIC:If there are two or more occurrences of the same
image we can reference the retrieval block. This presupposes a caching
behaviour -->

              <retrieval  ref="#img1"/>

              <!-- CTIC:Perhaps we can decompose this block?-->

              <imageInfo>

                     <validity valid="true"/>

                     <transparency transparent="false"/>

                     <actualDimensions width="20" height="20"/>

                     <!-- CTIC:need to register these dimensions for
each  equal image-->

                     <statedDimensions width="20" height="20"/>

              <!-- actually need to think about cached images and also
for retrieval --> 

              </imageInfo>

       </image>

              

       <image/> <!-- etc. -->

</images>

 

 

Getting the location of a resource reference into MOKI is a little
trickier the way things currently stand in the framework. We can use
DOMUtils.getLineNumber() to get the line number when we identify
resources (in HTTPXHTMLResource.extractImages()), but when we build up
the list of resources, they are returned as a list of URIs with no
other information. This URI list is then consumed in the
Preprocessor.preprocess method, where HTTPResource objects are created
and downloading occurs.

 

The changes I have implemented modify this process so that instead of
returning a list of URIs, a list of ResourceReference objects is
returned. A ResourceReference object contains the URI (as quoted in the
source) and the line number of the occurrence.
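To make the idea concrete, here is a minimal sketch of what such a
reference object could look like. Only the name ResourceReference comes
from the description above; the field names, the accessors and the
ReferenceExtractionSketch helper are assumptions for illustration, not
the code I intend to commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of a ResourceReference: quoted URI plus line number. */
final class ResourceReference {
    private final String uri;      // URI exactly as quoted in the src attribute etc.
    private final int lineNumber;  // line of the referencing element in the source

    ResourceReference(String uri, int lineNumber) {
        this.uri = uri;
        this.lineNumber = lineNumber;
    }

    String getUri() { return uri; }

    int getLineNumber() { return lineNumber; }
}

/** Hypothetical stand-in for the extraction step, which used to return bare URIs. */
final class ReferenceExtractionSketch {
    /** Pairs each quoted URI with the line it was found on. */
    static List<ResourceReference> extractImages(String[] uris, int[] lines) {
        List<ResourceReference> refs = new ArrayList<>();
        for (int i = 0; i < uris.length; i++) {
            refs.add(new ResourceReference(uris[i], lines[i]));
        }
        return refs;
    }
}
```

In the real framework the line number would come from
DOMUtils.getLineNumber() at the point where the element is identified;
the arrays here only stand in for that.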

A ResourceCache object is instantiated in the Preprocessor.process()
method, and this maintains a list of the CachedResources. We check each
reference against this cache, and only create an HTTPResource if it is
unique, thus preventing multiple downloads of the same resource. If it
is not unique then we append this reference to the list of references
held by the matching CachedResource.
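The cache check described above could be sketched roughly as follows.
This is an illustration under assumptions: the class is called
ResourceCacheSketch to make clear it is not the real ResourceCache, the
nested Reference type stands in for ResourceReference, and keying the
cache on the absolute URI (resolved against the page URI) is my guess at
a sensible uniqueness test.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: names and structure are assumptions, not the committed code. */
final class ResourceCacheSketch {

    /** Stand-in for ResourceReference: URI as quoted in the source, plus line number. */
    static final class Reference {
        final String quotedUri;
        final int lineNumber;

        Reference(String quotedUri, int lineNumber) {
            this.quotedUri = quotedUri;
            this.lineNumber = lineNumber;
        }
    }

    /** Stand-in for CachedResource: one retrieved URI and every reference to it. */
    static final class CachedResource {
        final URI retrievedUri;
        final List<Reference> references = new ArrayList<>();

        CachedResource(URI retrievedUri) {
            this.retrievedUri = retrievedUri;
        }
    }

    // Keyed by absolute URI (an assumption); insertion order is preserved.
    private final Map<URI, CachedResource> entries = new LinkedHashMap<>();
    private int downloads = 0;

    /** Records a reference; only the first sighting of a URI counts as a download. */
    CachedResource add(URI base, Reference ref) {
        URI absolute = base.resolve(ref.quotedUri).normalize();
        CachedResource cached = entries.get(absolute);
        if (cached == null) {
            cached = new CachedResource(absolute);
            entries.put(absolute, cached);
            downloads++; // the real code would create an HTTPResource and fetch it here
        }
        cached.references.add(ref);
        return cached;
    }

    int downloadCount() { return downloads; }

    Collection<CachedResource> resources() { return entries.values(); }
}
```

In this sketch every reference, including the first, is stored on the
cached entry, so the full list of occurrences is available later when
the MOKI document is built.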

Next, we pass the cache into the PreprocessorResults object with a new
method, setCache(). The main purpose of this is to give us access to the
reference locations as we build the MOKI document: for each
HTTPImageResource that we record, we can now look up all the references
to it.
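As a rough illustration of that last step, the sketch below writes one
reference element per recorded occurrence under a single image element,
using the standard org.w3c.dom API. The MokiImageSketch class and its
method names are hypothetical, the ref="#pd1" value is simply copied
from Abel's proposal, and the retrieval and imageInfo children are left
out.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch only: not the framework's actual MOKI-building code. */
final class MokiImageSketch {

    /** Creates an empty DOM document to hold the generated elements. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    /** Builds one image element with a reference child per line number. */
    static Element buildImage(Document doc, String quotedUri, List<Integer> lineNumbers) {
        Element image = doc.createElement("image");
        for (int line : lineNumbers) {
            Element location = doc.createElement("location");
            location.setAttribute("ref", "#pd1");   // value taken from the proposal
            location.setAttribute("type", "line");
            location.setTextContent(String.valueOf(line));

            Element uri = doc.createElement("URI"); // the URI as quoted in the source
            uri.setTextContent(quotedUri);

            Element reference = doc.createElement("reference");
            reference.appendChild(location);
            reference.appendChild(uri);
            image.appendChild(reference);
        }
        return image;
    }
}
```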

 

So in MOKI an image resource and all references to it would be recorded
something like this:

 

 

<images> 

       <image>

              <reference> 

                     <location ref="#pd1" type="line">1</location>

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>

                     

              <reference> 

                     <location ref="#pd1" type="line">2</location>

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>

                     

              <reference> 

                     <location ref="#pd1" type="line">3</location>

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>               

                     

              <reference> 

                     <location ref="#pd1" type="line">4</location>

                     <URI>image1.jpg</URI> <!-- the URI as quoted in src
attribute etc -->

              </reference>  

                     

                     

              <retrieval id="img1">

 
                     <retrievedURI>http://w3.org/image1.jpg</retrievedURI>

                     <HTTPRequest connection="conn1" id="request4">

                           <stuff>...</stuff>

                     </HTTPRequest>

                     <HTTPResponse/>

                     <entity size="345" encoding="base64"/>

              </retrieval>

              <imageInfo>

                     <validity valid="true"/>

                     <transparency transparent="false"/>

                     <actualDimensions width="20" height="20"/>

                     <!-- CTIC:need to register these dimensions for
each  equal image-->

                     <statedDimensions width="20" height="20"/>

                     <!-- actually need to think about cached images and
also for retrieval -->

                           

              </imageInfo>

       </image>

</images>

 

Thus we only have one image element per unique image resource, and we
list all references to this resource under the same image element.

 

Any thoughts before I commit these changes?

Ruadhan

Received on Wednesday, 4 July 2007 11:33:25 UTC