- From: Ruadhan O'Donoghue <rodonoghue@mtld.mobi>
- Date: Wed, 4 Jul 2007 12:31:49 +0100
- To: <public-mobileok-checker@w3.org>
- Message-ID: <C8FFD98530207F40BD8D2CAD608B50B448333E@mtldsvr01.DotMobi.local>
Hi all,

A quick progress update on the caching of resources (ACTION 517). Apologies for the delay on this; I've been kept very busy with ready.mobi.

My findings are that it is relatively simple to implement basic caching behaviour, but getting useful information into MOKI is a little trickier. I wanted to run something by the group before committing it.

Currently in MOKI, images (to take one kind of resource) are recorded as follows:

<images>
  <image>
    <URI>testimage.gif</URI>
    <retrieval>
      <retrievedURI>http://dev.mtld.mobi/testimage.gif</retrievedURI>
      <HTTPRequest>
        <rawHeaders>User-Agent: W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)
 Accept: application/xhtml+xml,text/html;q=0.1,application/vnd.wap.xhtml+xml;q=0.1,text/css,image/jpeg,image/gif
 Accept-Charset: UTF-8
 Host: dev.mtld.mobi
        </rawHeaders>
        <method>GET</method>
        <URI>http://dev.mtld.mobi/testimage.gif</URI>
        <protocol>HTTP/1.1</protocol>
        <header name="user-agent" value="W3C-mobileOK/DDC-1.0 (see http://www.w3.org/2006/07/mobileok-ddc)"/>
        <header name="host" value="dev.mtld.mobi"/>
        <header name="accept">
          <element name="application/xhtml+xml"/>
          <element name="text/html">
            <parameter name="q" value="0.1"/>
          </element>
          <element name="application/vnd.wap.xhtml+xml">
            <parameter name="q" value="0.1"/>
          </element>
          <element name="text/css"/>
          <element name="image/jpeg"/>
          <element name="image/gif"/>
        </header>
        <header name="accept-charset">
          <element name="utf-8"/>
        </header>
      </HTTPRequest>
      <HTTPResponse>
        <rawHeaders>Transfer-Encoding: chunked
 Date: Wed, 04 Jul 2007 09:24:31 GMT
 Server: Apache/2.2.2 (Unix) DAV/2 mod_ssl/2.2.2 OpenSSL/0.9.8b PHP/5.1.4 mod_apreq2-20051231/2.5.7 mod_perl/2.0.2 Perl/v5.8.7
 Last-Modified: Fri, 15 Jun 2007 17:27:50 GMT
 ETag: "dcd0-35f-2a9ab180"
 Accept-Ranges: bytes
 --------------: ---
 Cache-Control: max-age=604800
 Expires: Wed, 11 Jul 2007 09:24:31 GMT
 Content-Type: image/gif
        </rawHeaders>
        <protocol>HTTP/1.1</protocol>
        <status code="200" reason="OK"/>
        <header name="--------------" value="---"/>
        <header name="expires" value="Wed, 11 Jul 2007 09:24:31 GMT"/>
        <!-- etc -->
      </HTTPResponse>
    </retrieval>
    <imageInfo>
      <validity valid="true"/>
      <transparency transparent="false"/>
      <actualDimensions height="19" width="19"/>
    </imageInfo>
  </image>
</images>

In a previous email, Abel proposed the following structure:

<images>
  <!-- this includes objects-which-are-images, and also includes stylesheet background images and list images -->
  <image>
    <reference>
      <location ref="#pd1" type="line">20</location> <!-- where it is referenced - but in this case we probably need an XPATH??? as well??? -->
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <retrieval id="img1">
      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
      <HTTPRequest connection="conn1" id="request4">
        <stuff>...</stuff>
      </HTTPRequest>
      <HTTPResponse/>
      <entity size="345" encoding="base64"/>
    </retrieval>
    <imageInfo>
      <validity valid="true"/>
      <transparency transparent="false"/>
      <actualDimensions width="20" height="20"/> <!-- CTIC:need to register these dimensions for each equal image-->
      <statedDimensions width="20" height="20"/> <!-- actually need to think about cached images and also for retrieval -->
    </imageInfo>
  </image>
  <!--CTIC:A cached image... -->
  <image>
    <reference>
      <location ref="#pd1" type="line">40</location> <!-- where it is referenced - but in this case we probably need an XPATH??? as well??? -->
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <!--CTIC:If there are two or more occurrences of the same image we can reference the retrieval block. This assumes a caching behaviour -->
    <retrieval ref="#img1"/>
    <!-- CTIC:Perhaps we can decompose this block?-->
    <imageInfo>
      <validity valid="true"/>
      <transparency transparent="false"/>
      <actualDimensions width="20" height="20"/> <!-- CTIC:need to register these dimensions for each equal image-->
      <statedDimensions width="20" height="20"/> <!-- actually need to think about cached images and also for retrieval -->
    </imageInfo>
  </image>
  <image/>
  <!-- etc. -->
</images>

Getting the location of a resource reference into MOKI is a little trickier the way things currently stand in the framework. We can use DOMUtils.getLineNumber() to get the line number when we identify resources (in HTTPXHTMLResource.extractImages()), but when we build up the list of resources, they are returned as a list of URIs with no other information. This URI list is then consumed in the Preprocessor.preprocess method, where HTTPResource objects are created and downloading occurs.

The changes I have implemented modify this process so that, instead of a list of URIs, a list of ResourceReference objects is returned. A ResourceReference object contains the URI (as mentioned in the source) and the line number of the occurrence.

A ResourceCache object is instantiated in the Preprocessor.process() method, and it maintains a list of the CachedResources. We check each reference against this cache and only create an HTTPResource if it is unique, thus preventing multiple downloads of the same resource. If it is not unique, we append the reference to the list of references for that CachedResource.

Next, we pass the cache into the PreprocessorResults object with a new method, setCache(). The main purpose of this is so that we can access the reference locations as we build the MOKI document.
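
To make the caching step above concrete, here is a minimal sketch of how the cache could behave. It is not the code I am proposing to commit: only the class names ResourceReference, CachedResource and ResourceCache come from the description above, while the fields, the add(), referencesFor() and size() methods, and the choice of the resolved URI as cache key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A URI as quoted in the source document, plus the line it appeared on. */
final class ResourceReference {
    final String uri;       // e.g. "image1.jpg", exactly as written in the src attribute
    final int lineNumber;   // e.g. the value obtained when the resource was identified

    ResourceReference(String uri, int lineNumber) {
        this.uri = uri;
        this.lineNumber = lineNumber;
    }
}

/** One unique resource and every reference that points at it. */
final class CachedResource {
    final String resolvedUri;
    final List<ResourceReference> references = new ArrayList<>();

    CachedResource(String resolvedUri) {
        this.resolvedUri = resolvedUri;
    }
}

/** Hypothetical cache keyed by resolved URI, kept in first-seen order. */
final class ResourceCache {
    private final Map<String, CachedResource> entries = new LinkedHashMap<>();

    /**
     * Records a reference against its resolved URI.
     * Returns true only the first time a resolved URI is seen, which is the
     * point at which the caller would create the HTTP resource and download it.
     */
    boolean add(String resolvedUri, ResourceReference reference) {
        CachedResource cached = entries.get(resolvedUri);
        boolean isNew = (cached == null);
        if (isNew) {
            cached = new CachedResource(resolvedUri);
            entries.put(resolvedUri, cached);
        }
        cached.references.add(reference);
        return isNew;
    }

    /** All references recorded for a resolved URI (empty if unknown). */
    List<ResourceReference> referencesFor(String resolvedUri) {
        CachedResource cached = entries.get(resolvedUri);
        return cached == null ? new ArrayList<>() : cached.references;
    }

    /** Number of unique resources seen so far. */
    int size() {
        return entries.size();
    }
}
```

With something like this, the preprocessing loop would call add() for each reference and download only when it returns true; the MOKI builder would later call referencesFor() to emit one reference element per occurrence.
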
When we are building the MOKI document, we can now look up the references for each HTTPImageResource that we record. So in MOKI, an image resource and all references to it would be recorded something like this:

<images>
  <image>
    <reference>
      <location ref="#pd1" type="line">1</location>
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <reference>
      <location ref="#pd1" type="line">2</location>
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <reference>
      <location ref="#pd1" type="line">3</location>
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <reference>
      <location ref="#pd1" type="line">4</location>
      <URI>image1.jpg</URI> <!-- the URI as quoted in src attribute etc -->
    </reference>
    <retrieval id="img1">
      <retrievedURI>http://w3.org/image1.jpg</retrievedURI>
      <HTTPRequest connection="conn1" id="request4">
        <stuff>...</stuff>
      </HTTPRequest>
      <HTTPResponse/>
      <entity size="345" encoding="base64"/>
    </retrieval>
    <imageInfo>
      <validity valid="true"/>
      <transparency transparent="false"/>
      <actualDimensions width="20" height="20"/> <!-- CTIC:need to register these dimensions for each equal image-->
      <statedDimensions width="20" height="20"/> <!-- actually need to think about cached images and also for retrieval -->
    </imageInfo>
  </image>
</images>

Thus we have only one image element per unique image resource, and we list all references to this resource under the same image element.

Any thoughts before I commit these changes?

Ruadhan
Received on Wednesday, 4 July 2007 11:33:25 UTC