Suggested WOFF processing model and improved conformance claims

Hello public-webfonts-wg,

Problem
-------

The WOFF spec currently has the following, hard-to-test assertion:

"the overall font checksum of a font decompressed from a conformant WOFF file should always match the checksum in the original, valid sfnt-based font file, except in the case where the original file included unreferenced data between or after the actual tables"

There is also a large non-normative note with two sections: one on re-ordering and its impact on checksums, and the other starting with

"To ensure that lossless round-trip conversion from sfnt to WOFF and back will be possible" followed by three (non-normative) conditions.

Lastly, there is mention of an 'original' font, which can mean different things to different people. It's used both in the sense of the sfnt font which a woff generator operates on, and in the sense of the font which may in practice be subsetted for deployment; it may also be misunderstood as the font designer's master source.

Proposed solution
-----------------

A little terminology and a simple processing model (perhaps illustrated with a diagram) would clarify which parts of the processing that a real tool might perform fall within the scope of the woff specification.

It also allows the claim about the overall font checksum to be changed to a MUST.

It also allows a claim of bit-for-bit round-trip fidelity to be made (which is testable).

I have no attachment to the specific terminology used, probably better names could be discussed. The important point is to clearly label the stages.

The WOFF processing model starts with an sfnt that meets the three conditions currently given in the non-normative note: font table padding, no "hidden" data, and correct checksums. Let's call such a font the 'input sfnt'.
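As a rough illustration of how those three conditions could be tested mechanically, here is a minimal Python sketch. The names calc_checksum and check_input_sfnt are hypothetical. The checksum rules used (big-endian uint32 sum; the head table's own checksum computed with checkSumAdjustment set to zero; a correct whole-font total of 0xB1B0AFBA) are those of the OpenType spec. It assumes a single font rather than a collection, with a table directory that is at least parseable.

```python
import struct


def calc_checksum(data: bytes) -> int:
    """Sum of big-endian uint32 words (tail zero-padded), modulo 2**32."""
    padded = data + b"\0" * (-len(data) % 4)
    return sum(struct.unpack(f">{len(padded) // 4}I", padded)) & 0xFFFFFFFF


def check_input_sfnt(font: bytes) -> list[str]:
    """Return the problems found; an empty list means the font meets the
    padding, no-hidden-data and checksum conditions."""
    problems = []
    num_tables = struct.unpack_from(">H", font, 4)[0]
    records = [struct.unpack_from(">4sIII", font, 12 + 16 * i)
               for i in range(num_tables)]
    expected = 12 + 16 * num_tables  # first byte after the table directory
    for tag, checksum, offset, length in sorted(records, key=lambda r: r[2]):
        pad = -length % 4
        # No hidden data: each table starts exactly where the previous
        # table (plus its padding) ended.
        if offset != expected:
            problems.append(f"{tag!r}: starts at {offset}, expected {expected}")
        # Font table padding: zero bytes up to the next 4-byte boundary.
        if font[offset + length:offset + length + pad] != b"\0" * pad:
            problems.append(f"{tag!r}: padding missing or non-zero")
        # Checksums are correct; head is summed with checkSumAdjustment = 0.
        table = bytearray(font[offset:offset + length])
        if tag == b"head":
            table[8:12] = bytes(4)
        if calc_checksum(bytes(table)) != checksum:
            problems.append(f"{tag!r}: table checksum mismatch")
        expected = offset + length + pad
    # No trailing data after the last table's padding.
    if expected != len(font):
        problems.append(f"file length {len(font)}, expected {expected}")
    # Overall font checksum: a correct head.checkSumAdjustment makes the
    # whole file sum to 0xB1B0AFBA.
    if any(r[0] == b"head" for r in records) and calc_checksum(font) != 0xB1B0AFBA:
        problems.append("overall font checksum mismatch")
    return problems
```

A font is then an input sfnt, in the sense above, exactly when check_input_sfnt returns an empty list.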

WOFF creation covers conversion of the input sfnt to WOFF. WOFF recovery covers conversion of a WOFF file to the output sfnt. (I'm avoiding the more obvious terms compression and decompression, since the compression part is optional, and since 'woffification' sounds odd.)

    input        creation
    sfnt  ---------------------->  
                                    woff
   output        recovery
    sfnt  <--------------------- 

Since the input sfnt meets the conditions needed for lossless recovery, we can assert that input sfnt == output sfnt and test for this.
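To show why input sfnt == output sfnt is directly testable, the two arrows in the diagram can be sketched in Python. This is a minimal sketch, not a conforming WOFF implementation. The names create_woff and recover_sfnt are hypothetical, and no metadata or private data blocks are written. It assumes the encoder keeps tables in their physical sfnt order and that recovery lays them out in the physical order found in the WOFF file. It also assumes the input's searchRange, entrySelector and rangeShift fields are correct, since recovery recomputes them.

```python
import struct
import zlib


def _pad4(n: int) -> int:
    return (n + 3) & ~3


def create_woff(sfnt: bytes) -> bytes:
    """WOFF creation: input sfnt -> WOFF (no metadata, no private data)."""
    flavor, num_tables = struct.unpack_from(">IH", sfnt, 0)
    records = [struct.unpack_from(">4sIII", sfnt, 12 + 16 * i)
               for i in range(num_tables)]
    pos, placed, blobs = 44 + 20 * num_tables, {}, []
    # Store table data in the sfnt's physical order.
    for tag, _, offset, length in sorted(records, key=lambda r: r[2]):
        raw = sfnt[offset:offset + length]
        packed = zlib.compress(raw)
        if len(packed) >= len(raw):
            packed = raw  # keep uncompressed when zlib does not help
        placed[tag] = (pos, len(packed))
        blob = packed + b"\0" * (-len(packed) % 4)
        blobs.append(blob)
        pos += len(blob)
    # Directory entries keep the input sfnt's directory order.
    directory = b"".join(
        struct.pack(">4sIIII", tag, placed[tag][0], placed[tag][1], length, checksum)
        for tag, checksum, _, length in records)
    total_sfnt_size = 12 + 16 * num_tables + sum(_pad4(r[3]) for r in records)
    # majorVersion/minorVersion (font version info) are simply set to 1.0 here.
    header = struct.pack(">4sIIHHIHHIIIII", b"wOFF", flavor, pos, num_tables, 0,
                         total_sfnt_size, 1, 0, 0, 0, 0, 0, 0)
    return header + directory + b"".join(blobs)


def recover_sfnt(woff: bytes) -> bytes:
    """WOFF recovery: WOFF -> output sfnt, recovered in its entirety."""
    sig, flavor, _, num_tables = struct.unpack_from(">4sIIH", woff, 0)
    if sig != b"wOFF":
        raise ValueError("not a WOFF file")
    entries = [struct.unpack_from(">4sIIII", woff, 44 + 20 * i)
               for i in range(num_tables)]
    pos, placed, body = 12 + 16 * num_tables, {}, []
    # Lay tables out contiguously, in the WOFF file's physical order.
    for tag, offset, comp_len, orig_len, _ in sorted(entries, key=lambda e: e[1]):
        chunk = woff[offset:offset + comp_len]
        table = chunk if comp_len == orig_len else zlib.decompress(chunk)
        if len(table) != orig_len:
            raise ValueError(f"{tag!r}: decompressed length mismatch")
        placed[tag] = pos
        body.append(table + b"\0" * (-orig_len % 4))
        pos += _pad4(orig_len)
    entry_selector = max(num_tables.bit_length() - 1, 0)
    search_range = (1 << entry_selector) * 16
    header = struct.pack(">IHHHH", flavor, num_tables, search_range,
                         entry_selector, num_tables * 16 - search_range)
    directory = b"".join(struct.pack(">4sIII", tag, checksum, placed[tag], orig_len)
                         for tag, _, _, orig_len, checksum in entries)
    return header + directory + b"".join(body)
```

With such functions, the proposed conformance claim for an input sfnt font reduces to checking recover_sfnt(create_woff(font)) == font.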

We can then elaborate this model to add steps which may well happen but are outside the woff spec itself: subsetting, for example, or 'rectification' (correcting padding, removing hidden data, recalculating checksums, and removing any DSIG table if the font has been modified, such as by subsetting) to create the input sfnt.
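A rectification step of that kind, which sits outside the WOFF spec, might look like the following minimal Python sketch. The function rectify and its modified flag are hypothetical. It re-lays out the referenced tables in ascending tag order with zero padding, which discards any unreferenced bytes. It then recomputes each table checksum and head.checkSumAdjustment, and drops DSIG when told the font has been modified. It assumes the table directory itself is readable and that head, if present, is a normal (at least 12-byte) table.

```python
import struct


def checksum(data: bytes) -> int:
    """Sum of big-endian uint32 words (tail zero-padded), modulo 2**32."""
    padded = data + b"\0" * (-len(data) % 4)
    return sum(struct.unpack(f">{len(padded) // 4}I", padded)) & 0xFFFFFFFF


def rectify(font: bytes, modified: bool = False) -> bytes:
    """Rebuild an sfnt with zero padding, no hidden data and correct
    checksums; drop DSIG if the font was modified (e.g. subset)."""
    flavor, num_tables = struct.unpack_from(">IH", font, 0)
    tables = {}
    for i in range(num_tables):
        tag, _, offset, length = struct.unpack_from(">4sIII", font, 12 + 16 * i)
        if modified and tag == b"DSIG":
            continue  # the signature no longer matches a modified font
        tables[tag] = bytearray(font[offset:offset + length])
    if b"head" in tables:
        tables[b"head"][8:12] = bytes(4)  # zero checkSumAdjustment first
    n = len(tables)
    entry_selector = max(n.bit_length() - 1, 0)
    search_range = (1 << entry_selector) * 16
    out = bytearray(struct.pack(">IHHHH", flavor, n, search_range,
                                entry_selector, n * 16 - search_range))
    pos, body, head_offset = 12 + 16 * n, bytearray(), None
    for tag in sorted(tables):  # ascending tag order, tables contiguous
        data = bytes(tables[tag])
        out += struct.pack(">4sIII", tag, checksum(data), pos, len(data))
        if tag == b"head":
            head_offset = pos
        body += data + b"\0" * (-len(data) % 4)
        pos += len(data) + (-len(data) % 4)
    out += body
    if head_offset is not None:
        # Make the whole file sum to 0xB1B0AFBA.
        adjustment = (0xB1B0AFBA - checksum(bytes(out))) & 0xFFFFFFFF
        out[head_offset + 8:head_offset + 12] = struct.pack(">I", adjustment)
    return bytes(out)
```

Running rectify on its own output returns the same bytes, which matches the idea that an input sfnt needs no further correction.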

We can also state that a practical tool may well combine these steps, such that the input sfnt may never exist in memory or on disk as a separate item. But the woff spec itself only deals with an input sfnt that meets the conditions listed.

Similarly for the output sfnt, we can state that this may never exist in memory or on disk as a separate item. For example, a decoder in practice may not bother to decode tables it knows it will not use. But the woff spec itself only deals with an output sfnt that is recovered in its entirety.
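The decoder shortcut just mentioned, which is outside the model, could be as simple as this Python sketch. The name decode_tables is hypothetical. It reads the WOFF 1.0 table directory and decompresses only the requested tags, treating a table as stored uncompressed when its compLength equals its origLength.

```python
import struct
import zlib


def decode_tables(woff: bytes, wanted: set[bytes]) -> dict[bytes, bytes]:
    """Decompress only the tables whose tags are in `wanted`."""
    if woff[:4] != b"wOFF":
        raise ValueError("not a WOFF file")
    num_tables = struct.unpack_from(">H", woff, 12)[0]
    result = {}
    for i in range(num_tables):
        tag, offset, comp_len, orig_len, _ = struct.unpack_from(
            ">4sIIII", woff, 44 + 20 * i)
        if tag not in wanted:
            continue  # unused tables are never decompressed
        chunk = woff[offset:offset + comp_len]
        result[tag] = chunk if comp_len == orig_len else zlib.decompress(chunk)
    return result
```

A renderer needing only, say, cmap and glyf might call decode_tables(woff, {b"cmap", b"glyf"}). Such partial output is not the output sfnt of the model, which is why conformance would be stated against full recovery.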

(Visualise another diagram, with a box around the original diagram showing the part covered by the woff spec, and with the subsetting and rectification processes added outside it.)

This also covers the case where there is no sfnt at all: an authoring tool goes directly from its internal representation to woff. Fine; the format aspects can be tested from such direct output. Assuming the tool can also output an sfnt, its authoring-tool aspects can be checked by exporting woff, exporting sfnt, treating the exported sfnt as the 'input sfnt', recovering an output sfnt from the woff, and comparing the two.

Impact on the spec
------------------

- Define input sfnt and output sfnt. Add the diagram.

- Split out the second half of the large note, starting "To ensure that lossless round-trip conversion from sfnt to WOFF and back will be possible, it is recommended that the original sfnt file should conform to certain norms".

Change that sentence to "To ensure that lossless round-trip conversion from sfnt to WOFF and back will be possible, the input sfnt file MUST have the following properties".

Then continue with the three conditions of the original note, restated as properties rather than corrections (correct padding, no hidden data, correct checksums).

- State that subsetting etc. happens at a stage before the input sfnt, and that the woff spec does not cover this. State that the input and output sfnt may not physically exist in some workflows if stages are combined.

- Add the larger diagram.

- State the equivalence of input sfnt and output sfnt as a testable MUST conformance statement on authoring tools. (Not on user agents, since a UA is required not to expose the recovered output sfnt.)

-- 
 Chris Lilley   Technical Director, Interaction Domain                 
 W3C Graphics Activity Lead, Fonts Activity Lead
 Co-Chair, W3C Hypertext CG
 Member, CSS, WebFonts, SVG Working Groups

Received on Friday, 5 November 2010 07:50:36 UTC