
compressed fonts: a proposed format

From: Jonathan Kew <jonathan@jfkew.plus.com>
Date: Wed, 1 Jul 2009 12:31:59 +0100
Message-Id: <685D3130-23D9-4F32-859D-AEC26F21DBC9@jfkew.plus.com>
To: www-font@w3.org
Here is a suggested description of "ZOT", a compressed font format  
intended for Web-font use. This is designed to be "lightweight" in the  
sense that it will be simple to implement, both at the producer and  
consumer sides, and add minimal extra code to browsers; this is  
achieved primarily by using an existing compression library rather  
than a custom algorithm.

The compression achieved with this format will depend on exactly which
compression library is chosen (this is still to be decided); it may  
not be as good as can be achieved by a font-optimized approach like  
MTX, but I believe this approach has other advantages that may make it  
a better overall choice.

IMO, it should be simple for browser vendors to support linked fonts  
using this format; if they need a "complete" font to hand to an  
existing text API, the original OpenType font can easily be  
reconstructed in memory, and if they want to access font tables  
individually from the compressed file, this is very similar to doing  
so with a standard OpenType font.

I'm happy to discuss details of this format, but please, let's keep  
that discussion separate from the question of whether using a font-specific
compression method (independent of whether we ultimately
choose something like MTX or something more like ZOT) is the way  
forward for interoperable web fonts. If we can agree that the overall  
approach is acceptable, then potential implementers can discuss the  
technical details of formats, separate from the social, legal and  
political aspects.

Jonathan


Compressed OpenType for web use: the ZOT format
===============================================

Jonathan Kew, Mozilla Corp.
July 1, 2009


This is a proposal for a simple compressed format for fonts, designed primarily for use on the web. A feature of this format is that clients can decompress individual tables as needed, and thus can access specific parts of the font data without the need to decompress the entire font. In resource-constrained environments such as mobile devices, this allows the user agent to examine, for example, the OS/2 and cmap tables quite cheaply.

It would even be possible for the UA to use byte-range requests to avoid downloading the entire compressed file: it would first download the header, to determine the number of tables, and then the table directory. At this point, the UA can request the compressed data for any individual font table, and can thus minimize the RAM footprint of pages that may link to numerous fonts.
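
The byte ranges involved follow directly from the layout described below (a 16-byte header, then one 20-byte directory entry per table). A minimal sketch, assuming HTTP/1.1 `Range` requests with inclusive end offsets; the helper names `range_header`, `directory_range` and `table_range` are hypothetical:

```python
import struct

HEADER_SIZE = 16            # ZOTHeader: four UInt32 fields
DIR_ENTRY_SIZE = 20         # TableDirEntry: five UInt32 fields
ZOT_SIGNATURE = 0x7A4F5446  # 'zOTF'

def range_header(start: int, length: int) -> str:
    """Format an HTTP Range header value; the end offset is inclusive."""
    return f"bytes={start}-{start + length - 1}"

def directory_range(header: bytes) -> str:
    """From the 16 header bytes, compute the Range covering the table directory."""
    signature, _flavor, _length, num_tables = struct.unpack(">4I", header[:HEADER_SIZE])
    if signature != ZOT_SIGNATURE:
        raise ValueError("not a ZOT file")
    return range_header(HEADER_SIZE, num_tables * DIR_ENTRY_SIZE)

def table_range(directory: bytes, wanted_tag: bytes) -> str:
    """From the directory bytes, compute the Range covering one table's stored data."""
    for pos in range(0, len(directory), DIR_ENTRY_SIZE):
        tag, offset, comp_length, _orig_length, _checksum = struct.unpack(
            ">4s4I", directory[pos:pos + DIR_ENTRY_SIZE])
        if tag == wanted_tag:
            return range_header(offset, comp_length)
    raise KeyError(wanted_tag)
```

For example, `range_header(0, HEADER_SIZE)` yields `bytes=0-15`, the request for the header alone; three round trips then suffice to fetch any single table.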

This format does not provide for individual decompression of single glyphs; it is expected that sites using linked fonts for CJK languages, or using large-character-inventory fonts like Arial Unicode, Code2000, Lucida Grande, etc. (subject to proper licensing), will subset the fonts appropriately before applying ZOT compression.

The extension ".zot" is suggested for files using this format.


Overall file structure
======================

All values in the ZOT file are stored in big-endian format, like TrueType and OpenType.

The ZOT file consists of a 16-byte header, followed by a variable-size table directory and then a number of tables:

ZOTFile:
  ZOTHeader
  TableDirectory[]
  Tables[]

The main body of the file consists of the same collection of tables as the original uncompressed OpenType font; each table is separately compressed (see below), and the OpenType table directory is replaced by the ZOT table directory.
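
Reconstructing a standard OpenType file in memory, as suggested in the covering message, then amounts to rebuilding the OpenType table directory from the ZOT one. A sketch, assuming the tables have already been decompressed; `rebuild_sfnt` is a hypothetical helper, and the searchRange/entrySelector/rangeShift formulas are those of the OpenType spec. The stored origChecksum values are reused rather than recomputed, because the OpenType spec computes the 'head' table's directory checksum with its checkSumAdjustment field set to zero:

```python
import struct

def rebuild_sfnt(flavor: int, tables: list[tuple[bytes, int, bytes]]) -> bytes:
    """Reassemble an OpenType/TrueType file from (tag, origChecksum, data) triples.

    Requires at least one table. 'head'.checkSumAdjustment is left as stored;
    table offsets in the rebuilt file may differ from the original file.
    """
    tables = sorted(tables, key=lambda t: t[0])    # OT table records are sorted by tag
    num_tables = len(tables)
    max_pow2 = 1 << (num_tables.bit_length() - 1)  # largest power of 2 <= num_tables
    search_range = max_pow2 * 16
    entry_selector = max_pow2.bit_length() - 1     # log2(max_pow2)
    range_shift = num_tables * 16 - search_range

    header = struct.pack(">IHHHH", flavor, num_tables, search_range,
                         entry_selector, range_shift)
    records = []
    body = []
    offset = 12 + 16 * num_tables                  # data starts after the OT directory
    for tag, checksum, data in tables:
        records.append(struct.pack(">4sIII", tag, checksum, offset, len(data)))
        padded = data + b"\0" * (-len(data) % 4)   # keep every table 4-byte aligned
        body.append(padded)
        offset += len(padded)
    return header + b"".join(records) + b"".join(body)
```

The result can be handed unchanged to an existing text API that expects a complete font file.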


Header
======

The header contains an identifying signature, the "flavor" (sfnt version) of the original font, the total file size in bytes, and the number of entries in the following table directory:

ZOTHeader:

  UInt32    signature     0x7A4F5446 ('zOTF')
  UInt32    flavor        The "sfnt version" of the original file
                          (i.e., 0x00010000 or 'OTTO')
  UInt32    length        Total size of the ZOT file
  UInt32    numTables     Number of entries in the table directory
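
Since all four header fields are big-endian UInt32 values, reading or writing the header is a single fixed-layout pack/unpack. A minimal Python sketch; `ZOTHeader`, `pack_header` and `unpack_header` are illustrative names, not part of the proposal:

```python
import struct
from typing import NamedTuple

ZOT_SIGNATURE = 0x7A4F5446  # 'zOTF'
HEADER_FORMAT = ">4I"       # four big-endian UInt32 fields, 16 bytes in total

class ZOTHeader(NamedTuple):
    signature: int
    flavor: int       # sfnt version of the original font, e.g. 0x00010000 or 'OTTO'
    length: int       # total size of the ZOT file in bytes
    num_tables: int   # number of entries in the table directory

def pack_header(flavor: int, length: int, num_tables: int) -> bytes:
    return struct.pack(HEADER_FORMAT, ZOT_SIGNATURE, flavor, length, num_tables)

def unpack_header(data: bytes) -> ZOTHeader:
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("truncated ZOT header")
    header = ZOTHeader(*struct.unpack_from(HEADER_FORMAT, data))
    if header.signature != ZOT_SIGNATURE:
        raise ValueError("bad ZOT signature")
    return header
```

A consumer can therefore reject non-ZOT input simply by checking that the first four bytes are `b"zOTF"`.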


Table directory
===============

Each entry in the table directory is 20 bytes. The entries must be sorted in ascending order by tag, to permit binary search.

TableDirEntry:

  UInt32    tag           4-byte table identifier (see OT spec)
  UInt32    offset        Offset to the data, from beginning of ZOT file
  UInt32    compLength    Length of the compressed data (see note below)
  UInt32    origLength    Length of the uncompressed table (from OT table dir)
  UInt32    origChecksum  Checksum of the uncompressed table (see OT spec)
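
Because each entry is a fixed 20-byte record and the entries are sorted by tag, a reader can parse the directory in place and locate a table by binary search. A sketch under those assumptions, with hypothetical names; note that comparing 4-byte tags as byte strings gives the same order as comparing them as big-endian UInt32 values:

```python
import struct
from typing import NamedTuple, Optional

HEADER_SIZE = 16
DIR_ENTRY_FORMAT = ">4s4I"  # tag as 4 raw bytes, then four big-endian UInt32 fields
DIR_ENTRY_SIZE = struct.calcsize(DIR_ENTRY_FORMAT)  # 20 bytes

class TableDirEntry(NamedTuple):
    tag: bytes
    offset: int         # from the beginning of the ZOT file
    comp_length: int
    orig_length: int
    orig_checksum: int

def read_directory(data: bytes, num_tables: int) -> list[TableDirEntry]:
    """Parse the table directory that follows the 16-byte ZOT header."""
    return [
        TableDirEntry(*struct.unpack_from(DIR_ENTRY_FORMAT, data,
                                          HEADER_SIZE + i * DIR_ENTRY_SIZE))
        for i in range(num_tables)
    ]

def find_table(entries: list[TableDirEntry], tag: bytes) -> Optional[TableDirEntry]:
    """Binary search by tag; valid only because entries are sorted ascending."""
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid].tag < tag:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(entries) and entries[lo].tag == tag:
        return entries[lo]
    return None
```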

NOTE: Some tables, particularly small ones, may not be worth compressing, so it is permissible to store the original table, uncompressed, in the ZOT file. This is indicated by compLength >= origLength in the table directory. Because a reader relies on this comparison alone to tell the two cases apart, a tool creating ZOT fonts MUST check that the compressed data for each table is smaller than the original uncompressed data; if it is not, the tool MUST store the table in uncompressed form. (NB: compLength may be larger than origLength in this case, because the stored table may have up to 3 bytes of padding at the end, which are not included in the origLength count.)
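
The producer-side check and the matching consumer-side test can be sketched as follows, assuming zlib as the codec (the proposal leaves this open) and using the hypothetical names `encode_table` and `decode_table`:

```python
import zlib

def encode_table(data: bytes) -> bytes:
    """Compress one table, falling back to the raw zero-padded table if not smaller."""
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return compressed                    # compLength < origLength: compressed
    return data + b"\0" * (-len(data) % 4)   # compLength >= origLength: stored raw

def decode_table(stored: bytes, orig_length: int) -> bytes:
    """Invert encode_table using only the compLength/origLength comparison."""
    if len(stored) >= orig_length:
        return stored[:orig_length]          # uncompressed; drop any zero padding
    table = zlib.decompress(stored)
    if len(table) != orig_length:
        raise ValueError("decompressed size does not match origLength")
    return table
```

The strict `<` in `encode_table` is what keeps the two cases unambiguous: a compressed table can never be mistaken for a stored one.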


Tables
======

The tables in the ZOT file are exactly the same as the tables in the original OT file, except that each table may have been compressed by a standard data compression algorithm (TO BE DECIDED). Tables may be stored in any order. There is no requirement for any specific alignment of tables or padding between tables, except that when a table is stored uncompressed (see above), it MUST begin on a 4-byte boundary and be padded with zeros if necessary so that its length is a multiple of 4. The "compLength" field in the table directory will record the complete (padded) length of the table data, and may therefore be up to 3 bytes larger than the "origLength".
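
Putting the header, directory and alignment rules together, a writer might look like the following sketch. It assumes zlib as the codec and takes each table's origChecksum from the original font's table directory rather than recomputing it; `build_zot` is a hypothetical name:

```python
import struct
import zlib

ZOT_SIGNATURE = 0x7A4F5446  # 'zOTF'
HEADER_SIZE = 16
DIR_ENTRY_SIZE = 20

def build_zot(flavor: int, tables: list[tuple[bytes, int, bytes]]) -> bytes:
    """Assemble a ZOT file from (tag, origChecksum, table data) triples."""
    tables = sorted(tables, key=lambda t: t[0])   # directory must be sorted by tag
    offset = HEADER_SIZE + DIR_ENTRY_SIZE * len(tables)
    directory = []
    body = bytearray()
    for tag, checksum, data in tables:
        stored = zlib.compress(data, 9)
        if len(stored) >= len(data):
            # Not worth compressing: store raw, starting on a 4-byte boundary
            # and zero-padded to a multiple of 4 bytes.
            pad_before = -offset % 4
            body += b"\0" * pad_before
            offset += pad_before
            stored = data + b"\0" * (-len(data) % 4)
        directory.append(struct.pack(">4s4I", tag, offset, len(stored),
                                     len(data), checksum))
        body += stored
        offset += len(stored)
    # After the loop, offset equals the total file size.
    header = struct.pack(">4I", ZOT_SIGNATURE, flavor, offset, len(tables))
    return header + b"".join(directory) + bytes(body)
```

Compressed tables are packed back to back with no padding; only an uncompressed table that follows one may need up to 3 bytes of alignment in front of it.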


TO DO
=====

This description does not yet address how to compress TTC (TrueType Collection) files. It should be straightforward to extend the ideas to this format as well, but I have not thought through the details yet.

The algorithm to use for compressing each table will be chosen from among those used in standard tools such as gzip, bzip2, or lzma; this can be discussed among potential implementers. Personally, I don't think achieving the absolute smallest file size is the most critical factor here. Any of these would give a highly worthwhile level of compression for font data on the web. My preference would be to specify zlib (or to be more precise, the use of zlib's compression format) on the grounds that it is such a well-established, stable, trusted format with an easy-to-use, free implementation that (AFAIK) is usable in both free and proprietary products.

I know that lzma would compress better, but I don't think it has the same level of maturity, nor am I clear about the ease of use (for implementers) or the licensing (of an easy-to-use wrapper; I know the "lzma sdk" has been placed in the public domain). My guess is also that most, if not all, potential clients already use zlib, so adopting this for the internal font compression would have minimal code-size impact; I don't think this is true of lzma.
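
For anyone wanting to measure the trade-off on real fonts, Python's standard library happens to bind all three candidate formats, so per-table sizes can be compared in a few lines. A rough sketch (the function name is hypothetical); the numbers it reports depend entirely on the input, so no particular winner is implied:

```python
import bz2
import lzma
import zlib

def compare_codecs(table: bytes) -> dict[str, int]:
    """Compressed size, in bytes, of one table under each candidate codec."""
    return {
        "zlib": len(zlib.compress(table, 9)),
        "bzip2": len(bz2.compress(table, 9)),
        "lzma": len(lzma.compress(table)),  # default .xz container, adds some overhead
    }
```

Note that the container overhead of bzip2 and xz weighs more heavily on small tables, which is one reason per-table results can differ from whole-file results.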

Received on Wednesday, 1 July 2009 11:32:44 GMT
