
[whatwg] Web Archives

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 13 May 2008 09:55:53 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0805130938250.22257@hixie.dreamhostps.com>

On Wed, 11 Apr 2007, Tyler Keating wrote:
> I apologize if I've missed this in the specification or mailing 
> archives, but I have a suggestion related to standardizing web 
> "archives" in HTML5. Currently, I know that Firefox uses Mozilla Archive 
> Format (.maf), Internet Explorer and Opera use MIME HTML (.mht)  and 
> Safari uses its own format (.webarchive) for saving a web page and all 
> of its resources into a single file.  So clearly a standard would be 
> beneficial in ensuring "archive" compatibility between browsers and I 
> think it's suitable for that standard to reside in HTML5.
>
> I don't believe this would be very difficult to standardize and the 
> solution may be nothing more than a collection of random files wrapped 
> into a ZIP compressed archive with a unique extension similar to a JAR 
> or ODF file.  The unique extension would be recognized by browsers, 
> email clients and editors, which could then extract and display the root 
> file directly (ex. index.html). The root file would obviously contain 
> relative URIs to any other HTML, JavaScript, CSS, images and other files 
> in the archive so the internal structure may not be important and the 
> browser would not need any new rules to interpret individual files once 
> it has uncompressed the archive into memory. This would facilitate 
> passing HTML based documents around that could be viewed with any 
> browser, yet appear as a small single file.
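
To make the ZIP idea above concrete, here is a minimal Python sketch of a
single-file archive with a root document. It is only an illustration of the
proposal, not an existing format: the root name "index.html" comes from the
example in the quoted text, and build_archive / load_archive are hypothetical
helper names.

```python
import io
import zipfile

# Assumed convention: the archive's root document is always "index.html",
# following the "(ex. index.html)" example in the proposal.
ROOT_NAME = "index.html"


def build_archive(files: dict[str, bytes]) -> bytes:
    """Pack a page and its resources into one in-memory ZIP archive."""
    if ROOT_NAME not in files:
        raise ValueError(f"archive needs a root file named {ROOT_NAME!r}")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            # Names are stored as relative paths, so relative URIs in the
            # root document resolve against the archive's own layout.
            zf.writestr(name, data)
    return buf.getvalue()


def load_archive(blob: bytes) -> tuple[bytes, dict[str, bytes]]:
    """Uncompress the archive into memory; return (root document, all files)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        files = {name: zf.read(name) for name in zf.namelist()}
    return files[ROOT_NAME], files
```

A viewer would render the returned root document and resolve its relative
URIs against the in-memory file table instead of the network.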

There are some specifications for this kind of thing already, e.g. 
multipart/related (RFC2387), and the derivative MHTML (RFC2557).
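
As a rough illustration of the MHTML approach, the sketch below uses Python's
standard email package to wrap a page and its resources in a
multipart/related message, labelling each part with a Content-Location
header as RFC2557 describes. The helper names build_mhtml and read_mhtml are
hypothetical, and the sketch ignores details such as Content-ID references
and very long URLs.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(root_url: str, html: str,
                resources: dict[str, tuple[str, bytes]]) -> bytes:
    """Wrap a page and its resources in one multipart/related message.

    `resources` maps each resource URL to (MIME type, raw bytes).
    """
    msg = MIMEMultipart("related", type="text/html")
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = root_url
    msg.attach(root)  # the root document goes first
    for url, (mime_type, data) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # keep binary data mail-safe
        part["Content-Location"] = url
        msg.attach(part)
    return msg.as_bytes()


def read_mhtml(blob: bytes) -> dict[str, bytes]:
    """Map each part's Content-Location to its decoded body."""
    msg = message_from_bytes(blob)
    return {
        part["Content-Location"]: part.get_payload(decode=True)
        for part in msg.walk()
        if part["Content-Location"]
    }
```

Unlike the ZIP sketch, every part here is keyed by the URL it was fetched
from, so the page's original references need no rewriting.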

In HTML5, this can be somewhat achieved using the offline application 
cache feature, with a cache manifest. But the right way to address the 
problems with MHTML is, IMHO, to develop a new RFC that fixes them.
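
For reference, a cache manifest is a plain-text file that begins with the
line "CACHE MANIFEST" and then lists the resources to keep offline. The
Python sketch below generates one; make_cache_manifest and the version
comment are illustrative assumptions, not part of any specification.

```python
def make_cache_manifest(resources: list[str], version: str = "1") -> str:
    """Return the text of a cache manifest listing `resources`."""
    lines = [
        "CACHE MANIFEST",         # required first line
        f"# version {version}",   # comment; editing it changes the file
        "CACHE:",                 # explicit section: entries to cache
    ]
    lines.extend(resources)
    return "\n".join(lines) + "\n"


manifest = make_cache_manifest(["index.html", "css/site.css", "img/logo.png"])
```

The page would point at the generated file through the manifest attribute on
its html element, and the server would send it as text/cache-manifest.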

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 13 May 2008 02:55:53 UTC