- From: Ruadhan O'Donoghue <rodonoghue@mtld.mobi>
- Date: Fri, 6 Jul 2007 09:44:30 +0100
- To: "Sean Owen" <srowen@google.com>
- Cc: <public-mobileok-checker@w3.org>
That seems like a sensible thing to do, but we run into problems when we
want to record the position information of each reference to a unique
resource in MOKI. If we return just a list of unique resources, then it
means that we have to go back to the original document again at some
point to extract the position information, which is inefficient.

If we are happy not to include the position information of each
reference to a resource, then sure, we can do it this way. But then we
will know all the unique resources in a document, but not the number of
times, or where, they occurred in that document.

Ruadhan

> -----Original Message-----
> From: Sean Owen [mailto:srowen@google.com]
> Sent: 05 July 2007 20:33
> To: Ruadhan O'Donoghue
> Cc: public-mobileok-checker@w3.org
> Subject: Re: ACTION 517 - caching of resources
>
> It sounds like a fine solution -- I'm trying to think if there is
> anything simpler we can do.
>
> The purpose is to avoid, say, downloading a small image twenty times
> from a site when it's referenced twenty times, right? Can we merely do
> this by removing duplicates among the links that are extracted? That
> is, if foo.gif is mentioned twenty times, count it as one extracted
> link?
>
> I may be missing something, but if that's feasible it's a lot simpler.
>
> Sean
>
> On 7/4/07, Ruadhan O'Donoghue <rodonoghue@mtld.mobi> wrote:
> >
> > Hi all,
> >
> > A quick progress update on the Caching of resources ACTION 517.
> > Apologies for the delay on this, I've been kept very busy with
> > ready.mobi.
> >
> > My findings are that it is relatively simple to implement a basic
> > caching behaviour - but to get useful information into MOKI is a
> > little trickier. I wanted to run something by the group before
> > committing it.
> >
> > Currently in MOKI, images (to take one kind of resource) are recorded
> > as follows:
> >
> > <images>
> >   <image>
> >     <URI>testimage.gif</URI>
> >     <retrieval>
> >       <retrievedURI>http://dev.mtld.mobi/testimage.gif</retrievedURI>
> >       <HTTPRequest>
> >         <rawHeaders>User-Agent: W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)
> > Accept: application/xhtml+xml,text/html;q=0.1,application/vnd.wap.xhtml+xml;q=0.1,text/css,image/jpeg,image/gif
> > Accept-Charset: UTF-8
> > Host: dev.mtld.mobi
> >         </rawHeaders>
> >         <method>GET</method>
> >         <URI>http://dev.mtld.mobi/testimage.gif</URI>
> >         <protocol>HTTP/1.1</protocol>
> >         <header name="user-agent"
> >                 value="W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)"/>
> >         <header name="host" value="dev.mtld.mobi"/>
> >         <header name="accept">
> >           <element name="application/xhtml+xml"/>
> >           <element name="text/html">
> >             <parameter name="q" value="0.1"/>
> >           </element>
> >           <element name="application/vnd.wap.xhtml+xml">
> >             <parameter name="q" value="0.1"/>
> >           </element>
> >           <element name="text/css"/>
> >           <element name="image/jpeg"/>
> >           <element name="image/gif"/>
> >         </header>
> >         <header name="accept-charset">
> >           <element name="utf-8"/>
> >         </header>
> >       </HTTPRequest>
> >       <HTTPResponse>
> >         <rawHeaders>Transfer-Encoding: chunked
> > Date: Wed, 04 Jul 2007 09:24:31 GMT
> > Server: Apache/2.2.2 (Unix) DAV/2 mod_ssl/2.2.2 OpenSSL/0.9.8b PHP/5.1.4 mod_apreq2-20051231/2.5.7 mod_perl/2.0.2 Perl/v5.8.7
> > Last-Modified: Fri, 15 Jun 2007 17:27:50 GMT
> > ETag: "dcd0-35f-2a9ab180"
> > Accept-Ranges: bytes
> > --------------: ---
> > Cache-Control: max-age=604800
> > Expires: Wed, 11 Jul 2007 09:24:31 GMT
> > Content-Type: image/gif
> >         </rawHeaders>
> >         <protocol>HTTP/1.1</protocol>
> >         <status code="200" reason="OK"/>
> >         <header name="--------------" value="---"/>
> >         <header name="expires" value="Wed, 11 Jul 2007 09:24:31 GMT"/>
> >         <!-- etc -->
> >       </HTTPResponse>
> >     </retrieval>
> >     <imageInfo>
> >       <validity valid="true"/>
> >       <transparency transparent="false"/>
> >       <actualDimensions height="19" width="19"/>
> >     </imageInfo>
> >   </image>
> > </images>
> >
> > In a previous email, Abel has proposed the following structure:
> >
> > <images> <!-- this includes objects-which-are-images, and also includes
> >               stylesheet background images and list images -->
> >   <image>
> >     <reference>
> >       <location ref="#pd1" type="line">20</location>
> >       <!-- where it is referenced - but in this case we probably need an
> >            XPATH??? as well??? -->
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >
> >     <retrieval id="img1">
> >       <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
> >       <HTTPRequest connection="conn1" id="request4">
> >         <stuff>...</stuff>
> >       </HTTPRequest>
> >       <HTTPResponse/>
> >       <entity size="345" encoding="base64"/>
> >     </retrieval>
> >     <imageInfo>
> >       <validity valid="true"/>
> >       <transparency transparent="false"/>
> >       <actualDimensions width="20" height="20"/>
> >       <!-- CTIC:need to register these dimensions for each equal image-->
> >       <statedDimensions width="20" height="20"/>
> >       <!-- actually need to think about cached images and also for
> >            retrieval -->
> >     </imageInfo>
> >   </image>
> >   <!--CTIC:A cached image... -->
> >   <image>
> >     <reference>
> >       <location ref="#pd1" type="line">40</location>
> >       <!-- where it is referenced - but in this case we probably need an
> >            XPATH??? as well??? -->
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >     <!--CTIC:If there are two or more occurrences of the same image we
> >         can reference the retrieval block. It is supposed a caching
> >         behaviour -->
> >     <retrieval ref="#img1"/>
> >     <!-- CTIC:Perhaps we can decompose this block?-->
> >     <imageInfo>
> >       <validity valid="true"/>
> >       <transparency transparent="false"/>
> >       <actualDimensions width="20" height="20"/>
> >       <!-- CTIC:need to register these dimensions for each equal image-->
> >       <statedDimensions width="20" height="20"/>
> >       <!-- actually need to think about cached images and also for
> >            retrieval -->
> >     </imageInfo>
> >   </image>
> >
> >   <image/> <!-- etc. -->
> > </images>
> >
> > Getting the location of a resource reference into MOKI is a little
> > trickier the way things currently stand in the framework. We can use
> > DOMUtils.getLineNumber() to get the line number when we identify
> > resources (in HTTPXHTMLResource.extractImages()), but when we build up
> > a list of the resources, they are returned as a list of URIs, with no
> > other information. This URI list is then consumed in the
> > Preprocessor.preprocess method, and HTTPResource objects are created
> > and downloading occurs.
> >
> > Changes I have implemented modify this process so that instead of
> > returning a list of URIs, a list of ResourceReference objects is
> > returned. A ResourceReference object contains the URI (as mentioned in
> > source), and also contains the line number of the occurrence.
> >
> > A ResourceCache object is instantiated in the Preprocessor.process()
> > method, and this maintains a list of the CachedResources. We check
> > each reference against this cache, and only create an HTTPResource if
> > it is unique, thus preventing multiple downloads of the same resource.
> > If it is not unique, then we append this reference to the list of
> > references of this CachedResource.
> >
> > Next, we pass the Cache into the PreprocessorResults object with a new
> > method setCache(). The main purpose of this is so that we can access
> > the reference locations as we build the MOKI document. When we are
> > building the MOKI we can now perform a lookup for any references for
> > each HTTPImageResource that we record.
> >
> > So in MOKI an image resource and all references to it would be
> > recorded something like this:
> >
> > <images>
> >   <image>
> >     <reference>
> >       <location ref="#pd1" type="line">1</location>
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >
> >     <reference>
> >       <location ref="#pd1" type="line">2</location>
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >
> >     <reference>
> >       <location ref="#pd1" type="line">3</location>
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >
> >     <reference>
> >       <location ref="#pd1" type="line">4</location>
> >       <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
> >     </reference>
> >
> >     <retrieval id="img1">
> >       <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
> >       <HTTPRequest connection="conn1" id="request4">
> >         <stuff>...</stuff>
> >       </HTTPRequest>
> >       <HTTPResponse/>
> >       <entity size="345" encoding="base64"/>
> >     </retrieval>
> >     <imageInfo>
> >       <validity valid="true"/>
> >       <transparency transparent="false"/>
> >       <actualDimensions width="20" height="20"/>
> >       <!-- CTIC:need to register these dimensions for each equal image-->
> >       <statedDimensions width="20" height="20"/>
> >       <!-- actually need to think about cached images and also for
> >            retrieval -->
> >     </imageInfo>
> >   </image>
> > </images>
> >
> > Thus we only have one image element per unique image resource, and we
> > list all references to this resource under the same image element.
> >
> > Any thoughts before I commit these changes?
> >
> > Ruadhan
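
The caching scheme described in the quoted message could be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code that was committed: ResourceReference, CachedResource and ResourceCache are named in the message, but their fields, the add() and resources() methods, the wrapper class ResourceCacheSketch, and the base URL used in main() are all hypothetical and chosen only for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper class; only the three nested class names come from the message.
class ResourceCacheSketch {

    /** One mention of a resource in the source document. */
    static final class ResourceReference {
        final String uri; // the URI exactly as quoted in the source (e.g. a src attribute)
        final int line;   // line number of the occurrence

        ResourceReference(String uri, int line) {
            this.uri = uri;
            this.line = line;
        }
    }

    /** One unique resource plus every reference that points at it. */
    static final class CachedResource {
        final URI retrievedUri;
        final List<ResourceReference> references = new ArrayList<>();

        CachedResource(URI retrievedUri) {
            this.retrievedUri = retrievedUri;
        }
    }

    /** Groups references by resolved URI so that each resource is fetched once. */
    static final class ResourceCache {
        // LinkedHashMap keeps resources in first-seen order.
        private final Map<URI, CachedResource> byUri = new LinkedHashMap<>();

        /**
         * Records a reference (assumed method, not from the message). Returns true
         * if this is the first reference to the resource, i.e. the caller should
         * download it now.
         */
        boolean add(URI base, ResourceReference ref) {
            URI resolved = base.resolve(ref.uri).normalize();
            CachedResource cached = byUri.get(resolved);
            boolean firstSeen = (cached == null);
            if (firstSeen) {
                cached = new CachedResource(resolved);
                byUri.put(resolved, cached);
            }
            cached.references.add(ref); // position information is kept for every occurrence
            return firstSeen;
        }

        Collection<CachedResource> resources() {
            return byUri.values();
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://w3.org/index.html"); // assumed document URL
        ResourceCache cache = new ResourceCache();
        int downloads = 0;
        for (int line = 1; line <= 4; line++) {
            if (cache.add(base, new ResourceReference("image1.jpg", line))) {
                downloads++; // a real implementation would create the HTTPResource here
            }
        }
        for (CachedResource r : cache.resources()) {
            System.out.println(r.retrievedUri + " referenced " + r.references.size()
                    + " time(s); downloads: " + downloads);
        }
    }
}
```

With the four references from the example MOKI fragment, the sketch keeps a single cached entry holding all four line numbers and triggers only one download. Removing duplicates from a plain URI list would also give one download, but it would drop the line numbers and the reference count that the reply at the top of this message wants to keep.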
Received on Friday, 6 July 2007 08:45:55 UTC