[whatwg] Web Archives

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Wed, 11 Apr 2007 18:17:55 -0400
Message-ID: <6b9c91b20704111517r5098bdf4o9abf88aad10eaf91@mail.gmail.com>
On 4/11/07, Tyler Keating <tylerkeating at mac.com> wrote:
> Hi,
> I apologize if I've missed this in the specification or mailing
> archives, but I have a suggestion related to standardizing web
> "archives" in HTML5.  Currently, I know that Firefox uses Mozilla
> Archive Format (.maf), Internet Explorer and Opera use MIME HTML
> (.mht)  and Safari uses its own format (.webarchive) for saving a web
> page and all of its resources into a single file.  So clearly a
> standard would be beneficial in ensuring "archive" compatibility
> between browsers and I think it's suitable for that standard to
> reside in HTML5.

There's also the case of creating an .html file where all the
resources are specified as data URIs.

It's a really good way to archive, but IE won't handle data URIs and
most plug-ins don't accept them, so that use case has problems (unless
browsers can help with that in a secure way).
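
As a rough illustration of the data URI approach, here is a minimal
Python sketch. The helper names (to_data_uri, inline_image) are
hypothetical and made up for this example, and the string replacement is
deliberately naive; a real tool would parse the markup properly.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data: URI.

    The MIME type is guessed from the filename; unknown types fall back
    to application/octet-stream (an assumption made for this sketch).
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_image(html: str, src: str, data: bytes) -> str:
    """Replace one src="..." reference with the equivalent data URI.

    Only handles the exact double-quoted form src="<src>".
    """
    return html.replace(f'src="{src}"', f'src="{to_data_uri(data, src)}"')
```

Calling inline_image(page, "logo.png", image_bytes) would return the page
with that one image embedded, so the .html file no longer needs the
separate image file.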

I made a suggestion about this on the Opera forums a while ago when
Opera didn't even support .mht.
<http://my.opera.com/community/forums/topic.dml?id=72718>
(The links to the working examples are broken, but the idea was as follows.)

In short, you have an index.ext along with all the files it needs. You
(or the browser, if you're saving the page) zip them up and rename the
result to file.owp (.owp stood for OperaWebPage archive at the time).

The browser would read the zip file, extract it to a temp directory
(or in memory or to the browser's cache etc.) and load the index file.
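
The zip-and-load idea can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not how any browser
implements it: pack_archive and open_archive are hypothetical names, the
index file is assumed to sit at the archive root, and extraction goes to
a temp directory (one of the options mentioned above).

```python
import tempfile
import zipfile
from pathlib import Path


def pack_archive(src_dir: str, archive_path: str) -> None:
    """Zip every file under src_dir, unmodified, using paths relative to it.

    archive_path should live outside src_dir so the archive does not
    try to include itself.
    """
    root = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(root).as_posix())


def open_archive(archive_path: str) -> Path:
    """Extract the archive to a fresh temp directory and return the index file.

    "index.ext" is read here as "index.<any extension>"; the first match
    in sorted order at the archive root wins (an assumption of this sketch).
    """
    dest = Path(tempfile.mkdtemp(prefix="owp-"))
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    candidates = sorted(p for p in dest.glob("index.*") if p.is_file())
    if not candidates:
        raise FileNotFoundError("no index.* file at the archive root")
    return candidates[0]
```

For example, pack_archive("saved_page", "file.owp") followed by
open_archive("file.owp") gives back the path of the extracted index file,
which the browser would then load. The other files sit next to it with
their original bytes and relative paths.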

The idea is really simple and this way, all the files stay intact
(unlike .mht, which changes the markup).  However, the Mozilla Archive
Format already does this. It just uses index.rdf to specify what page
to load instead of looking for index.ext.

Not sure if HTML5 is the spot for this, but either way, it'd be neat
to have a standard method of putting files in an archive where the
files are kept separate and unmodified. (I might want to create an
HTML-based FAQ archive, with multiple web pages, pics, etc., for
example.)

-- 
Michael
Received on Wednesday, 11 April 2007 15:17:55 UTC
