TrueType Collections

Hi WG folks,

   One of the remaining technical questions to consider is whether to add
support for functionality equivalent to TrueType collections to the format.
In reviewing the pros and cons, I think there's a pretty strong case for
_not_ including TTCs, and I think it would be useful to set down my
thoughts.

   First, on the pro side, I wanted to analyze the use cases. The main
engineering question is how much file size saving is possible from serving
multiple fonts with some shared tables. I've heard two use cases that are
compelling (it is of course possible I'm missing others). The first is
multiple styles of a complex script font, with all styles sharing a GSUB
table, and the second is Han unification in CJK.

   A font family has to be carefully designed for all styles to have the
same GSUB. In particular, glyph numbering has to be consistent across the
styles (of course, this means that cmap can be shared as well). I believe
that in general it doesn't make sense for multiple styles to share GPOS,
as, in high-quality designs, mark positioning will be adjusted for each
weight. I looked at a bunch of complex script fonts and found that only in
Noto Sans Devanagari was the relative size of the GSUB table significant
(it is about 34k out of 125k). However, in the existing design, the regular
and bold weights are not glyph-compatible - the font would need to be
reengineered to take advantage of such an optimization. In other Indic
scripts I looked at, the GSUB size is less (Noto Sans Kannada is 5k out of
78k), and in other complex scripts _much_ less (Droid Sans Naskh is 2k out
of 89k, and Thai is 294 bytes out of 21k).
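
   To make the comparison concrete, here's the arithmetic on those figures
(a quick sketch; the byte counts are the approximate values quoted above,
not fresh measurements):

```python
# Approximate GSUB share of total font size, per the figures above.
fonts = {
    "Noto Sans Devanagari": (34_000, 125_000),
    "Noto Sans Kannada": (5_000, 78_000),
    "Droid Sans Naskh": (2_000, 89_000),
    "Thai": (294, 21_000),
}
shares = {name: gsub / total for name, (gsub, total) in fonts.items()}
for name, share in shares.items():
    print(f"{name}: GSUB is {share:.1%} of the font")
```

Only Devanagari gets into the range where sharing the table would matter.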

   The other use case is packaging CJK fonts specialized to different
locales (simplified Chinese, traditional Chinese, and Japanese) in the same
font file. Two observations here: first, in web use, it is unusual to require
multiple CJK appearances for the same font in the same web page. Exceptions
do exist, for example dictionaries. Second, the OpenType variant mechanism
is a more modern approach to the same problem. In addition, using OpenType
variants is much easier for compatibility - if a browser doesn't support
them, you still see completely valid CJK.

   So my conclusion is that there are valid use cases but that they are not
compelling - in practice, you'd only see significant savings for a tiny
fraction of web pages.

   On the "con" side I was concerned about spec complexity and security
implications. A more minor concern was format compatibility (we have
prototype implementations). It would be nice to not break compatibility,
but that said, if there were a real advantage to changing the format, it
would be worthwhile.

   The existing draft basically treats the compressed data as a stream. In
a minimal memory footprint environment, it would allow for decompressing a
font file in a stream-based, incremental fashion, for the most part. The
exception is filling in the checksum values, which would require going back
and modifying the header after all tables are processed. However, for many
applications the checksums can be considered optional.
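
   As a rough illustration of that two-phase shape, here's a minimal
sketch. zlib stands in for whatever codec the format actually specifies,
and the `spans` bookkeeping (mapping a checksum field offset to the table
it covers) is purely hypothetical; the point is that the decompression
loop streams through fixed-size input chunks, and only the checksum
fix-up requires revisiting output that was already written:

```python
import struct
import zlib

def checksum(data: bytes) -> int:
    # OpenType-style checksum: sum of big-endian uint32 words, with the
    # data zero-padded to a 4-byte boundary.
    data += b"\0" * (-len(data) % 4)
    return sum(w for (w,) in struct.iter_unpack(">I", data)) & 0xFFFFFFFF

def stream_reconstruct(compressed: bytes, spans) -> bytes:
    # Streamable part: decompress in fixed-size input chunks, appending
    # output as it becomes available.
    out = bytearray()
    d = zlib.decompressobj()
    for i in range(0, len(compressed), 1024):
        out += d.decompress(compressed[i:i + 1024])
    out += d.flush()
    # The one non-streaming step: go back and fill in checksum fields
    # once every table has been emitted. `spans` maps a checksum field
    # offset to the (start, length) of the table it covers -- purely
    # hypothetical bookkeeping for this sketch.
    for field_off, (start, length) in spans.items():
        struct.pack_into(">I", out, field_off,
                         checksum(bytes(out[start:start + length])))
    return bytes(out)
```

An implementation that skips the checksums can drop the final loop and
stay fully streaming.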

   (One point that I observed while digging into this, not directly
relevant to the TTC question but perhaps interesting, is that to enable
minimal memory footprint streaming, we'd have to enforce that the loca
table follows glyf. This seems reasonable enough to me that I believe I
will add it as a requirement for compressors in the spec.)
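
   The check a compressor would need is trivial; a sketch, where
`table_order` is simply a list of table tags in file order (a
hypothetical helper, not spec language):

```python
def loca_follows_glyf(table_order) -> bool:
    # Ordering constraint for minimal-memory streaming: 'loca' must come
    # after 'glyf' in the file, so loca can be produced once the glyph
    # data has been emitted.
    if "glyf" not in table_order or "loca" not in table_order:
        return True  # nothing to enforce, e.g. for CFF-flavored fonts
    return table_order.index("glyf") < table_order.index("loca")
```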

   A collection file with shared tables, by contrast, wouldn't be
represented by a sequence of
tables, each with a size (as is the present format). Rather, the most
natural representation would be (offset, length) pair references to the
uncompressed block. A straightforward implementation would just decompress
the entire block, then extract tables using these references. Of course,
most actual files would reduce to the streamable case, but having a
separate code path to analyze that and use more efficient processing sounds
(a) more complex, and (b) risky in terms of opening more potential security
problems. Already, OpenType Sanitizer does extensive checking to validate
that tables don't overlap, etc. Such checking is not necessary in the
stream case (though of course sizes still need to be validated).
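
   A sketch of that straightforward path: with the whole block
decompressed, slice each table out via its (offset, length) reference,
after validating bounds and overlaps - the kind of checking a sanitizer
has to do. Here `directory`, a list of (tag, offset, length) entries, is
a stand-in for whatever directory structure the real format would carry:

```python
def extract_tables(block: bytes, directory):
    # Validate each (offset, length) reference against the decompressed
    # block before slicing, and reject overlapping references.
    spans = []
    for tag, off, length in directory:
        if off < 0 or length < 0 or off + length > len(block):
            raise ValueError(f"{tag}: reference outside decompressed block")
        spans.append((off, off + length, tag))
    # Sort by start offset; any overlap then shows up between neighbors.
    spans.sort()
    for (_, end1, tag1), (start2, _, tag2) in zip(spans, spans[1:]):
        if start2 < end1:
            raise ValueError(f"tables {tag1} and {tag2} overlap")
    return {tag: block[start:end] for start, end, tag in spans}
```

None of this validation is needed when tables are simply consecutive in
the stream.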

   Thus, my conclusion is that the costs in terms of complexity and
potential security risk are nontrivial, and I believe we should not try
to standardize a method for font file collections with shared tables as
part of WOFF.

   Very happy to hear discussion, especially if I've missed something.

Raph

Received on Monday, 10 February 2014 18:38:28 UTC