- From: Brian Stell <bstell@google.com>
- Date: Thu, 16 Nov 2017 16:50:34 -0800
- To: "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
- Cc: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
- Message-ID: <CAGD0vg_QmQutnzYs+CjxWVHChbvrwD=HnsAiRWSApx=SWw1cXQ@mail.gmail.com>
Hi Vlad,

Here are some thoughts (from memory) from my experience streaming CJK fonts; YMMV.

*Preprocessing*
My preprocessing pulls each font apart, making a font base (AKA skeleton) that glyph data can be added back to. To simplify OpenType support, I leave the OpenType tables in the base. This does not use much space for Noto Sans CJK but might for other fonts.

*Download size*
The Chrome network panel reports that the font bases for 7 weights of Noto Sans Simplified Chinese (IIRC about 3.5MB per weight as a WOFF2, ~14MB total) take a little under 80KB compressed in total. I do not have any ideas on how to make this much smaller. My JavaScript library takes a little under 80KB compressed; my rough studies say this could shrink to 30-40KB if I spent the time to remove the Closure library. The glyph data for a given page depends on the number of unique character/weight pairs, but 80KB for a cold start is not an unreasonable size. The font base and glyph data are stored together as the supersetted font in IndexedDB for reuse. On subsequent (warm) page loads, the new character data needed is often as little as 5-20KB.

*Memory*
Keeping the font data in JavaScript memory is very expensive, even in a Uint8Array (IIRC, 20MB of font data took 50MB of JavaScript memory). Because of this, I do not keep an in-memory copy. For every injection I fetch the latest version of the font from IndexedDB, inject the new glyph data, fix up the tables, store the supersetted data back, and then pass that data to the C++ code. To support multiple pages/tabs doing 'concurrent' updates, I lock the DB during the fetch/inject/fix-tables/store operation.

*Character vs glyph*
JavaScript can only see the character changes in the DOM. For many fonts, most of the characters map 1-to-1 to glyphs. However, some fonts have more complex mappings. My current system precalculates the glyph closures, and when a character is requested the server returns all related glyphs (including accents, variants, etc).
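A minimal sketch of such a closure precalculation, written in TypeScript to match the JavaScript side of the system. It assumes a hypothetical `related` table recording which glyphs each glyph can pull in (for example composite components such as accents, or GSUB variants); `glyphClosure`, `cmap`, and `related` are illustrative names, not part of any real font library. The walk repeats until no new glyph IDs turn up, so chains of variants and components are followed to the end.

```typescript
// Hypothetical inputs (illustration only):
//   cmap:    code point -> default glyph ID
//   related: glyph ID -> glyphs it can pull in (components, variants, ...)
type GlyphId = number;

function glyphClosure(
  codePoints: Iterable<number>,
  cmap: Map<number, GlyphId>,
  related: Map<GlyphId, GlyphId[]>,
): Set<GlyphId> {
  const result = new Set<GlyphId>();
  const stack: GlyphId[] = [];

  // Seed with the glyphs the requested characters map to directly.
  for (const cp of codePoints) {
    const gid = cmap.get(cp);
    if (gid !== undefined && !result.has(gid)) {
      result.add(gid);
      stack.push(gid);
    }
  }

  // Depth-first walk; the `result.has` check stops cycles from looping forever.
  while (stack.length > 0) {
    const gid = stack.pop()!;
    for (const next of related.get(gid) ?? []) {
      if (!result.has(next)) {
        result.add(next);
        stack.push(next);
      }
    }
  }
  return result;
}
```

For example, if U+00E9 maps to glyph 5, glyph 5 pulls in glyphs 2 and 9, and glyph 9 pulls in glyph 10, then requesting U+00E9 yields glyphs {2, 5, 9, 10}.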
For Noto Sans CJK the character-to-glyph mapping is not an issue, but some fonts have multiple glyphs per character. If the browser reported the needed glyphs, the download size could be much smaller for some fonts, e.g. Arabic, Malayalam (IIRC), etc.

*Processing*
The code to watch the DOM for character changes is fairly small and does not use much time/CPU. Injecting new glyph (and other) data into the existing font data using an array of DataViews is fast. Passing (transferring) the font data (an array of DataViews) from JavaScript to the C++ code takes significant time (dozens to hundreds of milliseconds). A hundred or two milliseconds may not seem like a long time by itself, but with 7 weights it really adds up. Because of this, I aggressively compact the client copy of the font to shorten the transfer time. Keeping the text on the page from blinking while the transfer happens takes a bit of 'trickery'.

*Storage*
Because I use IndexedDB, the data is stored per origin. For example, https://www.google.com is different from http://www.google.com, which is different from https://mail.google.com, etc. This is a theoretical problem. In practice, the size efficiency is so good that there can be many streamed duplicates before reaching any of the subset sizes I have seen.

*Random thoughts on what my system could benefit from*
Obviously, doing everything on the C++ side would be the optimal solution. Just keeping the font data on the C++ side would solve the transfer time. Perhaps the data could sit behind an opaque JavaScript handle that acts like an ArrayBuffer/DataView. There would need to be some storage mechanism/API. Keeping the data on the C++ side would (hopefully) solve the transfer-time and memory-size problems, and I would not have to compact the supersetted font. The uncompacted font would have all the space needed for the missing glyph data, which would make injecting glyph data simply an overwrite operation (the cmap is slightly trickier but doable).
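A minimal sketch of that overwrite, assuming a TrueType-flavored font (`glyf`/`loca`) whose uncompacted base already reserves each glyph's full byte range, with `loca` already decoded to byte offsets. This layout is an assumption for illustration: CFF-based fonts such as Noto Sans CJK keep their outlines in a CharStrings INDEX instead, and the `injectGlyph` helper is hypothetical. Table checksums and the cmap update are not shown.

```typescript
// Hypothetical layout (illustration only):
//   loca: numGlyphs + 1 byte offsets into glyf, already final in the base
//   glyf: table bytes where each glyph's slot is pre-sized but still empty
function injectGlyph(
  glyf: Uint8Array,
  loca: Uint32Array,
  glyphId: number,
  data: Uint8Array,
): void {
  // Glyph i occupies [loca[i], loca[i + 1]), so i + 1 must be a valid index.
  if (glyphId < 0 || glyphId + 1 >= loca.length) {
    throw new RangeError(`glyph ${glyphId} is out of range`);
  }
  const start = loca[glyphId];
  const end = loca[glyphId + 1];
  if (data.length !== end - start) {
    throw new Error(`glyph ${glyphId}: expected ${end - start} bytes, got ${data.length}`);
  }
  // The slot was reserved up front, so injection is a plain byte copy:
  // no tables move and no offsets change.
  glyf.set(data, start);
}
```

Data of the wrong length is rejected rather than spilling into a neighbouring glyph's slot.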
To allow multiple pages/tabs to 'concurrently' update the data, it would be critical to have a lock to serialize the updates. Perhaps there could be a copy operation that 'locks' the font until it is updated. Regardless of where the data lives, a routine to validate the data would be really helpful for detecting problems. I think a reference implementation of the JavaScript would help the industry adopt and validate this.

A different approach would be for the browser to store the font internally in a more regular format (like indexed tables). The JavaScript could supply the core info, and when the browser needed more data it could call back to the JavaScript. The JavaScript could then take responsibility for supplying the requested data without requiring a new HTTP protocol.

That's all I can think of off the top of my head at the moment.

Brian
Received on Friday, 17 November 2017 00:51:04 UTC