[whatwg] Web Archives

On 4/12/07, Julian Reschke <julian.reschke at gmx.de> wrote:
> Michael A. Puls II schrieb:
> > ...
> > If every browser supports .mht, I still don't think it's the best
> > format for archiving.
> > ...
>
> What exactly are the problems with .mht (RFC 2557)? Are they fixable?
> How about trying to gather a group of people interested in fixing it?

If the program generating the .mht file:

1. Writes image attachments (and other non-text formats) with
"Content-Transfer-Encoding: binary", so they don't take up more space
than they should. (base64 adds roughly a third to the size.)

2. Writes the HTML attachment without modifying the markup (except
for paths when necessary / requested). (It shouldn't rewrite the
doctype the way IE does.)

3. Uses "Content-Transfer-Encoding: 8bit" for text/html instead of
quoted-printable or base64.

4. Produces the correct metadata via the mail headers (and encodes
the header values properly when they contain Unicode characters).

5. Is able to handle the files it generates.

then .mht isn't so bad. (Opera and IE do in fact display binary
attachments in .mht files, and Opera does not alter your markup when
generating the archive.)
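
To make points 1 to 3 concrete, here is a rough Python sketch of a
writer that behaves that way. It is not how Opera or IE do it;
build_mht and its arguments are made-up names for illustration, and it
assumes the URLs are plain ASCII and that the random boundary never
shows up inside the binary data.

```python
import uuid


def build_mht(page_url, html, resources):
    """Return the bytes of a multipart/related archive (hypothetical helper).

    html is the page markup as UTF-8 bytes and is written out unmodified.
    resources is a list of (url, mime_type, data) tuples.
    """
    boundary = "----=_archive_" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    chunks = [
        b"MIME-Version: 1.0\r\n"
        b'Content-Type: multipart/related; type="text/html"; boundary="'
        + boundary.encode("ascii") + b'"\r\n\r\n'
    ]

    def add_part(mime_type, encoding, location, body):
        # Each part carries its own headers, then the body bytes as-is.
        header = (
            f"Content-Type: {mime_type}\r\n"
            f"Content-Transfer-Encoding: {encoding}\r\n"
            f"Content-Location: {location}\r\n\r\n"
        )
        chunks.append(delimiter + b"\r\n" + header.encode("ascii") + body + b"\r\n")

    # Point 3: the markup goes in as 8bit, not quoted-printable or base64.
    add_part("text/html; charset=utf-8", "8bit", page_url, html)
    # Point 1: images and other non-text resources go in as raw binary.
    for url, mime_type, data in resources:
        add_part(mime_type, "binary", url, data)

    chunks.append(delimiter + b"--\r\n")  # closing delimiter
    return b"".join(chunks)
```

Because nothing is re-encoded, the markup and image bytes sit in the
file exactly as they were fetched, which also covers point 2.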

However, the data still isn't compressed, and converting to and from
the archive is not as easy as it is with zip.
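
For comparison, this is roughly all it takes to go to and from a zip
with Python's standard zipfile module. Just a sketch: pack, unpack and
the sample file names are invented for the example, and a real web
archive would still need some way to record the original URLs.

```python
import io
import zipfile


def pack(files):
    """Zip a {name: bytes} mapping into one compressed archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buf.getvalue()


def unpack(blob):
    """Read every member of a zip archive back into a {name: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Made-up sample page; repetitive markup like this compresses well.
page = {
    "index.html": ("<p>" + "archived page text " * 200 + "</p>").encode("utf-8"),
    "style.css": b"p { margin: 0 }",
}
blob = pack(page)
```

The text resources come back byte for byte from unpack(blob), and the
archive is smaller than the files that went into it.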

I also don't think mail headers are the best format for specifying
metadata. I'd prefer a format that accepts raw UTF-8 sequences.
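
To show what I mean, here is a small sketch using Python's email.header
module. The title is a made-up example. In a mail header it has to be
wrapped in an ASCII-only encoded-word (RFC 2047) and decoded again by
the reader, while a format that takes raw UTF-8 would just store the
bytes.

```python
from email.header import Header, decode_header, make_header

title = "Caf\u00e9 menu \u2615"  # hypothetical page title with non-ASCII characters

# Mail-header route: encode to an ASCII-only encoded-word, then decode it again.
encoded = Header(title, "utf-8").encode()
decoded = str(make_header(decode_header(encoded)))

# Raw route: the UTF-8 bytes are stored and read back directly.
raw = title.encode("utf-8")
```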

-- 
Michael

Received on Friday, 13 April 2007 01:14:17 UTC