RE: ACTION 517 - caching of resources from Ruadhan O'Donoghue on 2007-07-06 (public-mobileok-checker@w3.org from July 2007)

From: Ruadhan O'Donoghue <rodonoghue@mtld.mobi>
Date: Fri, 6 Jul 2007 09:44:30 +0100
To: "Sean Owen" <srowen@google.com>
Cc: <public-mobileok-checker@w3.org>
Message-ID: <C8FFD98530207F40BD8D2CAD608B50B44834C5@mtldsvr01.DotMobi.local>
That seems like a sensible thing to do, but we run into problems when we
want to record the position of each reference to a unique resource in
MOKI. If we return just a list of unique resources, then we have to go
back to the original document again at some point to extract the position
information, which is inefficient.

If we are happy not to include the position of each reference to a
resource, then sure, we can do it this way. In that case we will know all
the unique resources in a document, but not how many times, or where,
each one occurred.
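
To illustrate the trade-off, here is a minimal Java sketch (hypothetical; ReferenceGrouping and groupByUri are not part of the mobileOK checker). It assumes the extractor can supply each URI together with its line number, and shows that a single pass grouping references by URI yields both the unique resources to download (the map's keys) and the count and position of every reference (the values), without a second pass over the document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group extracted references by URI in one pass.
final class ReferenceGrouping {
    private ReferenceGrouping() {}

    /**
     * @param uris  URIs in document order, as extracted
     * @param lines line number of each URI (same length as uris)
     * @return URI -> line numbers of every reference, in first-seen order
     */
    static Map<String, List<Integer>> groupByUri(List<String> uris, List<Integer> lines) {
        if (uris.size() != lines.size()) {
            throw new IllegalArgumentException("uris and lines must have the same length");
        }
        // LinkedHashMap keeps the order in which each URI was first seen.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < uris.size(); i++) {
            grouped.computeIfAbsent(uris.get(i), k -> new ArrayList<>()).add(lines.get(i));
        }
        return grouped;
    }
}
```

For example, groupByUri(List.of("foo.gif", "bar.gif", "foo.gif"), List.of(3, 7, 12)) maps foo.gif to [3, 12] and bar.gif to [7].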

Ruadhan

> -----Original Message-----
> From: Sean Owen [mailto:srowen@google.com]
> Sent: 05 July 2007 20:33
> To: Ruadhan O'Donoghue
> Cc: public-mobileok-checker@w3.org
> Subject: Re: ACTION 517 - caching of resources
> 
> It sounds like a fine solution -- I'm trying to think if there is
> anything simpler we can do.
> 
> The purpose is to avoid, say, downloading a small image twenty times
> from a site when it's referenced twenty times, right? Can we merely do
> this by removing duplicates among the links that are extracted? That
> is, if foo.gif is mentioned twenty times, count it as one extracted
> link?
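
Sean's suggestion amounts to de-duplicating the extracted links before anything is downloaded. A minimal Java sketch, assuming the extracted links are available as a list of URI strings (LinkDeduplication and uniqueLinks are hypothetical names): a LinkedHashSet drops repeats while keeping first-seen order, so twenty mentions of foo.gif produce a single download.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of de-duplicating extracted links before download.
final class LinkDeduplication {
    private LinkDeduplication() {}

    /** Returns each distinct URI string once, in the order it was first extracted. */
    static List<String> uniqueLinks(List<String> extracted) {
        return new ArrayList<>(new LinkedHashSet<>(extracted));
    }
}
```

Comparing the strings as written means that foo.gif and ./foo.gif would still count as two resources; resolving each link against the page's base URI first would avoid that. The result also no longer records how many times, or where, each link occurred.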
> 
> I may be missing something, but if that's feasible it's a lot simpler.
> 
> Sean
> 
> On 7/4/07, Ruadhan O'Donoghue <rodonoghue@mtld.mobi> wrote:
> > Hi all,
> >
> > A quick progress update on ACTION 517, caching of resources. Apologies
> > for the delay on this; I've been kept very busy with ready.mobi.
> >
> > My findings are that it is relatively simple to implement basic caching
> > behaviour, but getting useful information into MOKI is a little
> > trickier. I wanted to run something by the group before committing it.
> >
> > Currently in MOKI, images (to take one kind of resource) are recorded
> > as follows:
> >
> >    <images>
> >       <image>
> >          <URI>testimage.gif</URI>
> >          <retrieval>
> >             <retrievedURI>http://dev.mtld.mobi/testimage.gif</retrievedURI>
> >             <HTTPRequest>
> >                <rawHeaders>User-Agent: W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)&#xD;
> > Accept: application/xhtml+xml,text/html;q=0.1,application/vnd.wap.xhtml+xml;q=0.1,text/css,image/jpeg,image/gif&#xD;
> > Accept-Charset: UTF-8&#xD;
> > Host: dev.mtld.mobi&#xD;
> > </rawHeaders>
> >                <method>GET</method>
> >                <URI>http://dev.mtld.mobi/testimage.gif</URI>
> >                <protocol>HTTP/1.1</protocol>
> >                <header name="user-agent"
> >                        value="W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)"/>
> >                <header name="host" value="dev.mtld.mobi"/>
> >                <header name="accept">
> >                   <element name="application/xhtml+xml"/>
> >                   <element name="text/html">
> >                      <parameter name="q" value="0.1"/>
> >                   </element>
> >                   <element name="application/vnd.wap.xhtml+xml">
> >                      <parameter name="q" value="0.1"/>
> >                   </element>
> >                   <element name="text/css"/>
> >                   <element name="image/jpeg"/>
> >                   <element name="image/gif"/>
> >                </header>
> >                <header name="accept-charset">
> >                   <element name="utf-8"/>
> >                </header>
> >             </HTTPRequest>
> >             <HTTPResponse>
> >                <rawHeaders>Transfer-Encoding: chunked&#xD;
> > Date: Wed, 04 Jul 2007 09:24:31 GMT&#xD;
> > Server: Apache/2.2.2 (Unix) DAV/2 mod_ssl/2.2.2 OpenSSL/0.9.8b PHP/5.1.4 mod_apreq2-20051231/2.5.7 mod_perl/2.0.2 Perl/v5.8.7&#xD;
> > Last-Modified: Fri, 15 Jun 2007 17:27:50 GMT&#xD;
> > ETag: "dcd0-35f-2a9ab180"&#xD;
> > Accept-Ranges: bytes&#xD;
> > --------------: ---&#xD;
> > Cache-Control: max-age=604800&#xD;
> > Expires: Wed, 11 Jul 2007 09:24:31 GMT&#xD;
> > Content-Type: image/gif&#xD;
> > </rawHeaders>
> >                <protocol>HTTP/1.1</protocol>
> >                <status code="200" reason="OK"/>
> >                <header name="--------------" value="---"/>
> >                <header name="expires" value="Wed, 11 Jul 2007 09:24:31 GMT"/>
> >                <!-- etc -->
> >             </HTTPResponse>
> >          </retrieval>
> >          <imageInfo>
> >             <validity valid="true"/>
> >             <transparency transparent="false"/>
> >             <actualDimensions height="19" width="19"/>
> >          </imageInfo>
> >       </image>
> >    </images>
> >
> > In a previous email, Abel proposed the following structure:
> >
> > <images> <!-- this includes objects-which-are-images, and also includes
> >               stylesheet background images and list images -->
> >        <image>
> >               <reference>
> >                      <location ref="#pd1" type="line">20</location><!--
> >                          where it is referenced - but in this case we
> >                          probably need an XPATH??? as well??? -->
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >
> >               <retrieval id="img1">
> >                      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
> >                      <HTTPRequest connection="conn1" id="request4">
> >                            <stuff>...</stuff>
> >                      </HTTPRequest>
> >                      <HTTPResponse/>
> >                      <entity size="345" encoding="base64"/>
> >               </retrieval>
> >
> >               <imageInfo>
> >                      <validity valid="true"/>
> >                      <transparency transparent="false"/>
> >                      <actualDimensions width="20" height="20"/>
> >                      <!-- CTIC:need to register these dimensions for
> >                           each equal image-->
> >                      <statedDimensions width="20" height="20"/>
> >                      <!-- actually need to think about cached images and
> >                           also for retrieval -->
> >               </imageInfo>
> >        </image>
> >
> >        <!--CTIC:A cached image... -->
> >        <image>
> >               <reference>
> >                      <location ref="#pd1" type="line">40</location><!--
> >                          where it is referenced - but in this case we
> >                          probably need an XPATH??? as well??? -->
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >               <!--CTIC:If there are two or more occurrences of the same
> >                   image we can reference the retrieval block. This
> >                   assumes caching behaviour -->
> >               <retrieval ref="#img1"/>
> >               <!-- CTIC:Perhaps we can decompose this block?-->
> >               <imageInfo>
> >                      <validity valid="true"/>
> >                      <transparency transparent="false"/>
> >                      <actualDimensions width="20" height="20"/>
> >                      <!-- CTIC:need to register these dimensions for
> >                           each equal image-->
> >                      <statedDimensions width="20" height="20"/>
> >                      <!-- actually need to think about cached images and
> >                           also for retrieval -->
> >               </imageInfo>
> >        </image>
> >
> >        <image/> <!-- etc. -->
> > </images>
> >
> > Getting the location of a resource reference into MOKI is a little
> > trickier, given the way things currently stand in the framework. We can
> > use DOMUtils.getLineNumber() to get the line number when we identify
> > resources (in HTTPXHTMLResource.extractImages()), but when we build up
> > the list of resources, it is returned as a list of URIs with no other
> > information. This URI list is then consumed in the
> > Preprocessor.preprocess method, where HTTPResource objects are created
> > and the downloading happens.
> >
> > The changes I have implemented modify this process so that, instead of
> > a list of URIs, a list of ResourceReference objects is returned. A
> > ResourceReference object contains the URI (as it appears in the source)
> > and the line number of the occurrence.
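
For illustration, the ResourceReference described above could be a small immutable value class. This is a sketch only: the field and accessor names are assumptions, not the checker's actual API.

```java
import java.util.Objects;

// Hypothetical sketch: one occurrence of a resource in the source document.
final class ResourceReference {
    private final String uri;      // the URI exactly as written in the source (e.g. a src attribute)
    private final int lineNumber;  // line on which this occurrence was found

    ResourceReference(String uri, int lineNumber) {
        this.uri = Objects.requireNonNull(uri, "uri");
        this.lineNumber = lineNumber;
    }

    String getUri() { return uri; }
    int getLineNumber() { return lineNumber; }

    // Two references are equal only if both the URI and the line match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ResourceReference)) return false;
        ResourceReference other = (ResourceReference) o;
        return lineNumber == other.lineNumber && uri.equals(other.uri);
    }

    @Override
    public int hashCode() { return Objects.hash(uri, lineNumber); }

    @Override
    public String toString() { return uri + "@line " + lineNumber; }
}
```

On Java 16 and later, `record ResourceReference(String uri, int lineNumber) {}` would give the same equals/hashCode value semantics with less code.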
> >
> > A ResourceCache object is instantiated in the Preprocessor.process()
> > method, and it maintains a list of CachedResource objects. We check
> > each reference against this cache and only create an HTTPResource if the
> > reference points to a resource we have not seen yet, which prevents
> > multiple downloads of the same resource. If the resource is already
> > cached, we append the reference to that CachedResource's list of
> > references.
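
The cache check can be sketched as follows. This is a hypothetical Java outline, not the committed code: ResourceCache, CachedResource and ResourceReference follow the names in the message, but register(), lookup() and getCachedResources() are invented for illustration. It also assumes the cache is keyed on the URI resolved against the page's base URI with java.net.URI.resolve (so image1.jpg and ./image1.jpg count as one resource); the message does not say which form of the URI is used as the key. The caller would create and download an HTTPResource only when register() returns true.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in: the URI as written in the source plus its line number.
final class ResourceReference {
    final String uri;
    final int lineNumber;
    ResourceReference(String uri, int lineNumber) { this.uri = uri; this.lineNumber = lineNumber; }
}

// One unique resource plus every reference to it (hypothetical shape).
final class CachedResource {
    private final URI resolvedUri;
    private final List<ResourceReference> references = new ArrayList<>();
    CachedResource(URI resolvedUri) { this.resolvedUri = resolvedUri; }
    URI getResolvedUri() { return resolvedUri; }
    List<ResourceReference> getReferences() { return Collections.unmodifiableList(references); }
    void addReference(ResourceReference ref) { references.add(ref); }
}

// Hypothetical sketch of the cache consulted before each download.
final class ResourceCache {
    private final URI baseUri;
    private final Map<URI, CachedResource> byUri = new LinkedHashMap<>();

    ResourceCache(URI baseUri) { this.baseUri = baseUri; }

    /**
     * Records a reference. Returns true if it points at a resource not seen
     * before (the caller should create an HTTPResource and download it);
     * returns false if it was appended to an existing CachedResource.
     */
    boolean register(ResourceReference ref) {
        URI resolved = baseUri.resolve(ref.uri).normalize();
        CachedResource cached = byUri.get(resolved);
        boolean isNew = (cached == null);
        if (isNew) {
            cached = new CachedResource(resolved);
            byUri.put(resolved, cached);
        }
        cached.addReference(ref);
        return isNew;
    }

    /** Looks up the cached entry for a resolved URI, or null if absent. */
    CachedResource lookup(URI resolvedUri) { return byUri.get(resolvedUri); }

    Collection<CachedResource> getCachedResources() {
        return Collections.unmodifiableCollection(byUri.values());
    }
}
```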
> >
> > Next, we pass the cache into the PreprocessorResults object via a new
> > method, setCache(). The main purpose of this is to give access to the
> > reference locations while we build the MOKI document: for each
> > HTTPImageResource that we record, we can now look up all of its
> > references.
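
A sketch of the lookup side while writing MOKI: for one cached resource, emit a single <image> element with one <reference> child per occurrence, in the shape of the example below. Assumptions: MokiReferenceWriter, Ref and writeImage are hypothetical; the document id (#pd1 in the examples) is passed in by the caller; plain StringBuilder output with minimal escaping stands in for whatever XML API the checker actually uses; and the <retrieval> and <imageInfo> children are omitted.

```java
import java.util.List;

// Hypothetical sketch: emit the <reference> children of one MOKI <image>.
final class MokiReferenceWriter {
    // Minimal stand-in for a reference: URI as written plus its line number.
    static final class Ref {
        final String uri;
        final int line;
        Ref(String uri, int line) { this.uri = uri; this.line = line; }
    }

    private MokiReferenceWriter() {}

    /** Escapes the characters that are significant in XML text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Writes one <image> element listing every reference to a single cached
     * resource. The <retrieval> and <imageInfo> children are left out here.
     */
    static String writeImage(String documentRef, List<Ref> refs) {
        StringBuilder xml = new StringBuilder("<image>\n");
        for (Ref r : refs) {
            xml.append("  <reference>\n")
               .append("    <location ref=\"").append(escapeXml(documentRef))
               .append("\" type=\"line\">").append(r.line).append("</location>\n")
               .append("    <URI>").append(escapeXml(r.uri)).append("</URI>\n")
               .append("  </reference>\n");
        }
        return xml.append("</image>\n").toString();
    }
}
```

Because all references to a resource live in one cache entry, the writer produces exactly one <image> per unique resource, however many times it is referenced.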
> >
> > So in MOKI, an image resource and all references to it would be
> > recorded something like this:
> >
> > <images>
> >        <image>
> >               <reference>
> >                      <location ref="#pd1" type="line">1</location>
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >
> >               <reference>
> >                      <location ref="#pd1" type="line">2</location>
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >
> >               <reference>
> >                      <location ref="#pd1" type="line">3</location>
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >
> >               <reference>
> >                      <location ref="#pd1" type="line">4</location>
> >                      <URI>image1.jpg</URI> <!-- the URI as quoted in
> >                          src attribute etc -->
> >               </reference>
> >
> >               <retrieval id="img1">
> >                      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
> >                      <HTTPRequest connection="conn1" id="request4">
> >                            <stuff>...</stuff>
> >                      </HTTPRequest>
> >                      <HTTPResponse/>
> >                      <entity size="345" encoding="base64"/>
> >               </retrieval>
> >
> >               <imageInfo>
> >                      <validity valid="true"/>
> >                      <transparency transparent="false"/>
> >                      <actualDimensions width="20" height="20"/>
> >                      <!-- CTIC:need to register these dimensions for
> >                           each equal image-->
> >                      <statedDimensions width="20" height="20"/>
> >                      <!-- actually need to think about cached images and
> >                           also for retrieval -->
> >               </imageInfo>
> >        </image>
> > </images>
> >
> > Thus we have only one image element per unique image resource, and we
> > list all references to that resource under the same image element.
> >
> > Any thoughts before I commit these changes?
> >
> > Ruadhan
Received on Friday, 6 July 2007 08:45:55 UTC