RE: Comment on WOFF file format 'origCheckSum' value from Levantovsky, Vladimir on 2010-04-29 (public-webfonts-wg@w3.org from April 2010)

From: Levantovsky, Vladimir <Vladimir.Levantovsky@MonotypeImaging.com>
Date: Thu, 29 Apr 2010 12:23:59 -0400
To: Jonathan Kew <jfkthame@googlemail.com>
CC: "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <7534F85A589E654EB1E44E5CFDC19E3D0203BD4C7C@wob-email-01.agfamonotype.org>
Hi Jonathan,

Thank you for your quick response.

VL> > Currently, the WOFF table directory contains 'origCheckSum' field [1]
VL> > that, as I understand it, is simply a duplication of the
VL> > 'checkSumAdjustment' value from the original SFNT file's 'head' table.
> 
JK> No, this is the *per-table* checksum from the SFNT file's table
JK> directory. This is separate from the overall font checksum that is
JK> stored as checkSumAdjustment in the 'head' table.

Yes, thank you. I should have been more careful scanning the description of the table entries.
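
For readers following the distinction above, here is a minimal sketch of the two values, assuming the usual SFNT convention that data is summed as big-endian 32-bit words modulo 2^32, with zero padding to a 4-byte boundary. The function names are illustrative only and are not taken from any spec or library.

```python
import struct


def table_checksum(data: bytes) -> int:
    """Per-table checksum: sum of big-endian uint32 words, modulo 2**32."""
    # Pad with zero bytes so the length is a multiple of four.
    padded = data + b"\0" * (-len(data) % 4)
    total = 0
    for (word,) in struct.iter_unpack(">I", padded):
        total = (total + word) & 0xFFFFFFFF
    return total


def checksum_adjustment(font_with_zeroed_adjustment: bytes) -> int:
    """Whole-font value stored in 'head'.

    Assumes the caller has already set the checkSumAdjustment field
    inside the font bytes to 0 before calling.
    """
    whole_font_sum = table_checksum(font_with_zeroed_adjustment)
    return (0xB1B0AFBA - whole_font_sum) & 0xFFFFFFFF
```

The first function corresponds to what each table directory entry carries (and what 'origCheckSum' preserves); the second is the single font-wide value in the 'head' table.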

VL> > While I understand the reason why this field was made part of the
VL> > WOFF table directory it, in my opinion, does little to protect original
VL> > font data (the original checksum and the font data integrity can be
VL> > evaluated just by running checksum check after decompressing font
VL> > tables), and it does nothing to protect the WOFF file data itself (it
VL> > doesn't cover data entries that are part of the WOFF Header, Extended
VL> > Metadata, Private Data and Table Directory itself).

JK> It is not really present to provide "protection", merely to ensure that
JK> the UA can reconstruct the original SFNT data (or portions of it),
JK> including a valid table directory, without being required to
JK> recalculate checksums in order to do this. The design of WOFF was
JK> intended to allow simple reconstruction of the original SFNT data
JK> without imposing a requirement that the UA do specific validation,
JK> except checking that the WOFF structure itself is valid. Of course, if
JK> UAs wish to do further validation of the font data itself, they are
JK> free to do so -- this is equally applicable for data that arrived via
JK> WOFF or for raw SFNT data.

Okay, thank you for clarifying this. I think it may be useful to also include a similar clarification in the WOFF spec, to define what UAs are expected to do (or not to do). It would also help us down the road when we develop a test suite.
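
As an illustration of the reconstruction Jonathan describes, the sketch below rebuilds one SFNT table directory record from one WOFF table directory entry by copying the stored checksum rather than recomputing it. It assumes the WOFF entry layout is five big-endian 32-bit fields (tag, offset, compLength, origLength, origCheckSum) and the SFNT record is tag, checkSum, offset, length. The function name and the `sfnt_offset` parameter (where the decoder decides to place the table in the rebuilt file) are hypothetical.

```python
import struct

# Assumed layouts, all fields big-endian.
WOFF_ENTRY = struct.Struct(">4sIIII")  # tag, offset, compLength, origLength, origCheckSum
SFNT_RECORD = struct.Struct(">4sIII")  # tag, checkSum, offset, length


def sfnt_record_from_woff_entry(entry: bytes, sfnt_offset: int) -> bytes:
    """Build an SFNT table record without recalculating the checksum."""
    tag, _woff_offset, _comp_length, orig_length, orig_checksum = WOFF_ENTRY.unpack(entry)
    # The checksum is copied as-is; no table data needs to be read here.
    return SFNT_RECORD.pack(tag, orig_checksum, sfnt_offset, orig_length)
```

Nothing in this sketch validates the table data; a UA that wants to check it would do so as a separate, optional step.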

VL> > I suggest that the scope of this field should be extended (and if we
VL> > agree, the field itself may be moved to a different part of the file,
VL> > e.g. Header). I believe it would be useful to make this field cover
VL> > both the original font data and the WOFF data, so that when a value is
VL> > calculated - the WOFF data is taken into account and, upon running a
VL> > checksum check, the value of 'origCheckSum' would be produced when the
VL> > value encoded in the WOFF file is summed up with the rest of the WOFF
VL> > data. This way, we would allow user agents to verify that both the WOFF
VL> > data and the original SFNT data are intact.

JK> It would be possible to add some kind of checksum for the overall file,
JK> if this is seen as important. However, I would be reluctant to
JK> *require* that this should be checked by the UA. For one thing, that
JK> would make it impossible to retrieve selected portions of the font
JK> (because the entire file would be needed for checksumming). Currently,
JK> the format is designed to allow a client, if desired, to read
JK> individual SFNT tables (or the WOFF metadata) without requiring the
JK> entire file to be downloaded. This could be advantageous in the case of
JK> large font files, where the UA could examine specific tables in order
JK> to decide whether to download and use the rest of the font.

I believe it would be a good idea to add a checksum for the WOFF data, similar to how it is done in OpenType / OFF font files, and we can do it in a way that would not *require* significant effort from the UA. What I proposed earlier would not require having the entire file to calculate the checksum. We can add a woffCheckSum field to the WOFF header to cover the WOFF data, and calculate it so that when the check is run with the woffCheckSum itself included, the result equals the 'checkSumAdjustment' of the original font file. This value would *cover* the original font file but would not require uncompressing everything to run a check. The UA would only need to run a check on the WOFF data and compare the result with the value from the uncompressed 'head' table (which is likely either to be included in its original form or to have to be uncompressed anyway).

To compute the woffCheckSum, we would set it to 0, sum the WOFF data, and store the difference between the checkSumAdjustment of the original font file and the calculated sum. The UA would then do the same (running the check on the WOFF data only) and compare the result with the value from the 'head' table.
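
A minimal sketch of this proposal follows, assuming the same 32-bit big-endian summation used for SFNT checksums. The woffCheckSum field is only proposed here, so its position in the header is unknown; `field_offset` is a hypothetical parameter and is assumed to be 4-byte aligned. All function names are illustrative.

```python
import struct

MASK = 0xFFFFFFFF


def sum_uint32(data: bytes) -> int:
    """Sum data as big-endian uint32 words modulo 2**32, zero-padded."""
    padded = data + b"\0" * (-len(data) % 4)
    return sum(word for (word,) in struct.iter_unpack(">I", padded)) & MASK


def compute_woff_checksum(woff: bytes, field_offset: int, check_sum_adjustment: int) -> int:
    """Value to store in the (hypothetical) woffCheckSum field."""
    # Treat the field as 0 while summing, as described in the proposal.
    zeroed = woff[:field_offset] + b"\0\0\0\0" + woff[field_offset + 4:]
    return (check_sum_adjustment - sum_uint32(zeroed)) & MASK


def verify_woff_checksum(woff: bytes, check_sum_adjustment: int) -> bool:
    """Sum the WOFF data, stored field included, and compare with 'head'."""
    return sum_uint32(woff) == check_sum_adjustment
```

With the field filled in, summing the WOFF data yields the checkSumAdjustment value, so the UA-side check is one pass over the WOFF bytes plus one comparison against the 'head' table.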

JK> Whether to reject font files where checksums don't match is a question
JK> we can discuss. Making this a requirement means that UAs will be
JK> required to do checksum calculation/verification, which imposes a
JK> (small) additional processing burden that is otherwise optional.
JK> 
JK> Note that as far as I am aware, most platform font APIs do NOT
JK> currently validate font table checksums or the overall SFNT checksum,
JK> and do NOT reject fonts with incorrect checksums. At one point, we
JK> included checksum validation in the sanity-checking that Gecko does
JK> with downloaded font data, prior to attempting to activate the font,
JK> but this led to user complaints because certain fonts failed to work in
JK> Firefox, even though they appeared to work fine elsewhere.

I agree this is something we should discuss. While a checksum check isn't likely to stop someone who is trying to exploit font data download as a security hole, it seems reasonable to assume that a valid font file should have proper checksum values, and that a mismatch would indicate corrupted data.

Thank you,
Vladimir
Received on Thursday, 29 April 2010 16:23:35 UTC