Re: WOFF-ACTION-176: Test hmtx transformation over google fonts corpus (how many lsb == x-min for all glyfs, what savings)

On 21/5/15 11:01, Cosimo Lupo wrote:
> Hello,
>
> regarding Vladimir’s suggestion on hmtx transformation:
>
>> we can easily determine this case by analyzing the table directory
>> entry for 'hmtx' - comparing the actual decompressed table size
>> against the 'original' size
>
> I believe there’s a problem. How could we extract a transformed hmtx
> table data from the decompressed font data stream, if the hmtx entry
> in the table directory does not have a “transformLength” field
> (currently reserved for glyf and loca only), but only an “origLength”
> value?
>
> If the hmtx was not transformed, then the length of the original
> table would also be the length of the decompressed table. But if the
> hmtx was indeed transformed, then origLength would be greater than
> the decompressed size of the transformed table data, and we would
> risk reading off data from another table that follows hmtx in the
> decompressed stream...
>
> On the encoder’s side, how can we store a transformed hmtx in the
> WOFF2 font data stream, but only store the original length in the
> table directory, when there are no offsets to individual tables, nor
> any flag that tells the decoder to look for an optional
> transformLength?
>
> I can’t see how to make this work without modifying the
> specification, to explicitly signal that hmtx was indeed transformed
> — thus also breaking compatibility with the current WOFF2
> implementations.

I think you're right that Vlad's suggestion doesn't work as written, as 
we don't have any way to determine the "actual decompressed table size" 
to compare with the "original" size recorded in the directory.

There's a possible way to overcome this, I think, but it's sufficiently 
convoluted that I am reluctant to even suggest it... but here goes. For 
the 'hmtx' table, the 'origLength' field in the directory would actually 
store the length of the table data *after* the transformation (if 
applicable), i.e. its size in the decompressed stream, so that we can 
correctly read it from the data stream. So 'hmtx' would in effect store 
transformLength *instead of* origLength, rather than in addition to it.

We could -- in theory -- do this because the true origLength of 'hmtx' 
can be derived from other data in the font: it will be exactly

   trueHmtxLength =
     hhea.numOfHMetrics * sizeof(longHorMetric) +
       (maxp.numGlyphs - hhea.numOfHMetrics) * sizeof(SHORT)

So the two possible values for the 'hmtx' origLength field in the WOFF2 
table directory would be either this trueHmtxLength value, in which case 
the lsb-removal transform was not applied, or

   transformedHmtxLength = hhea.numOfHMetrics * sizeof(USHORT)

in which case lsb-removal was used, and the decoder needs to reconstruct 
lsb values from the xMin of each glyph's bbox record.
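
To make the decoder-side check concrete, here is a rough Python sketch of 
the length comparison described above. The function and constant names 
are made up for illustration and are not taken from any WOFF2 
implementation; the only assumptions are the standard OpenType sizes 
(longHorMetric = 4 bytes, SHORT and USHORT = 2 bytes each).

```python
# Illustrative sketch only: names here are hypothetical, not from any real decoder.
LONG_HOR_METRIC_SIZE = 4  # USHORT advanceWidth + SHORT lsb
SHORT_SIZE = 2
USHORT_SIZE = 2


def true_hmtx_length(num_h_metrics, num_glyphs):
    """Size of an untransformed 'hmtx' table, derived from hhea and maxp."""
    return (num_h_metrics * LONG_HOR_METRIC_SIZE
            + (num_glyphs - num_h_metrics) * SHORT_SIZE)


def transformed_hmtx_length(num_h_metrics):
    """Size of 'hmtx' once every lsb has been removed (advance widths only)."""
    return num_h_metrics * USHORT_SIZE


def classify_hmtx(directory_length, num_h_metrics, num_glyphs):
    """Decide from the directory length whether lsb-removal was applied."""
    if num_glyphs < 1 or not 0 <= num_h_metrics <= num_glyphs:
        raise ValueError("inconsistent hhea/maxp values")
    if directory_length == true_hmtx_length(num_h_metrics, num_glyphs):
        return "untransformed"
    if directory_length == transformed_hmtx_length(num_h_metrics):
        return "lsb-removed"
    raise ValueError("'hmtx' length matches neither expected value")
```

With at least one glyph the two candidate lengths can never coincide 
(the true length exceeds the transformed one by 2 * numGlyphs bytes), 
so the comparison is unambiguous. Note that the decoder must already 
have parsed 'hhea' and 'maxp' before it can interpret the 'hmtx' entry, 
which is exactly the interdependency in question.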

All existing WOFF2 fonts would continue to work after decoders are 
updated to recognize this convention, because they have the 
trueHmtxLength stored in the origLength field, meaning the transform has 
not been applied.

However, my preference would be to avoid this sort of complex 
interdependency, and instead bring back an 'isTransformed' flag in the 
TableDirectoryEntry flags byte. This is so much simpler and cleaner -- 
and more extensible, if we come up with a transform for any other table 
whose "true original size" may not be predictable from other data in the 
font.

Ideally, we'd define an isTransformed flag, and require it to be set for 
all tables that are transformed (including 'glyf' and 'loca'); this 
would imply that font producers would have the option of NOT 
transforming 'glyf' and 'loca' if desired. But existing fonts have these 
tables transformed, and do not have the flag set; to avoid breaking 
compatibility with them, perhaps we need to specify that those two 
tables are ALWAYS transformed, and the flag is ignored.

For tables other than 'hmtx' (and 'vmtx', which can be handled 
similarly), setting the isTransformed flag would currently be an error, 
as no transform is defined for them and the decoder wouldn't know what 
to do.
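
As a rough sketch of how a decoder might apply these rules (Python, 
with hypothetical names; the flag is passed in as a plain boolean 
because no bit position in the flags byte has been assigned to it):

```python
# Illustrative sketch only: the table sets and function name are hypothetical.
ALWAYS_TRANSFORMED = {"glyf", "loca"}      # existing fonts never set the flag
OPTIONALLY_TRANSFORMED = {"hmtx", "vmtx"}  # transform signalled by the flag


def table_is_transformed(tag, is_transformed_flag):
    """Return True if the decoder must undo a transform for this table."""
    if tag in ALWAYS_TRANSFORMED:
        # Flag is ignored so that already-deployed WOFF2 fonts keep working.
        return True
    if tag in OPTIONALLY_TRANSFORMED:
        return bool(is_transformed_flag)
    if is_transformed_flag:
        # No transform is defined for this table, so the font is invalid.
        raise ValueError("isTransformed set on %r, which has no transform" % tag)
    return False
```

The decision here depends only on the table tag and the flag, so the 
directory can be interpreted without first reading 'hhea' or 'maxp'.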

JK

Received on Thursday, 21 May 2015 10:34:01 UTC