[csswg-drafts] [css-font-loading-3] Expose unicode shaping and glyph vector data functions existing in FontFace (#11538) from 地図の神様 via GitHub on 2025-01-20 (public-css-archive@w3.org from January 2025)

From: 地図の神様 via GitHub <sysbot+gh@w3.org>
Date: Mon, 20 Jan 2025 08:34:18 +0000
To: public-css-archive@w3.org
Message-ID: <issues.opened-2798570253-1737362056-sysbot+gh@w3.org>
CraigglesO has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-font-loading-3] Expose unicode shaping and glyph vector data functions existing in FontFace ==
### What problem are you trying to solve?

Expand [FontFace](https://developer.mozilla.org/en-US/docs/Web/API/FontFace/FontFace) ([spec here](https://drafts.csswg.org/css-font-loading/#fontface-interface)) to expose the glyph vector data and Unicode shaping.

I have been exploring this issue for years, and I think the best solution is to let the browser solve the problem for the programmer. The problem is interesting because it is complex enough that settling on a final API will probably be hard, yet I believe exposing the already existing code is simple (a low-LOC PR).

The reason I think this is more valuable than ever is that the [Loop & Blinn patent](https://patents.google.com/patent/US7564459B2/en), which covers rendering raw glyph vector data error-free at any resolution, expires in roughly a year. If this feature is added, rendering every single language in WebGL(2) and WebGPU can be reduced from a still unsolved problem to a simple task for the average programmer.

For reference, here are examples of how complex Unicode shaping can be:

བོད་རང་སྐྱོང་ལྗོངས།
ព្រះរាជាណាចក្រកម្ពុជា
ދިވެހިރާއްޖެ
କମ୍ବୋଡ଼ିଆ
រាជធានីភ្នំពេញ
ශ්‍රී
င်္က္က

The last one is a single character block with 25 glyphs combined.
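
To make the gap between what a reader sees and what a string contains concrete, here is a minimal sketch that lists the code points of the Sinhala sample above. It assumes the sample uses the usual encoding with a ZERO WIDTH JOINER, so the string is written with explicit escapes; the variable names are illustrative only.

```typescript
// The Sinhala sample "ශ්‍රී", spelled out as escapes so the invisible
// ZERO WIDTH JOINER (U+200D) is visible in the source.
// Assumption: this is the encoding used in the sample list above.
const sri = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3";

// Array.from iterates by code point rather than by UTF-16 code unit.
const codePoints: string[] = Array.from(sri, (ch) =>
  "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
);

console.log(codePoints.length);    // 5
console.log(codePoints.join(" ")); // U+0DC1 U+0DCA U+200D U+0DBB U+0DD3
```

Five code points go in, but a font typically draws them as one ligated form. Which glyphs come out is decided by the font's shaping tables, not by the string.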

### What solutions exist today?

There are three core concepts that are each solved individually: `Font Parsing`, `Glyph Rendering`, and `Unicode Shaping`.

### Font Parsing

To parse the font, it's probably best to use [opentype.js](https://github.com/opentypejs/opentype.js). While limited in how many font types it can parse, it's fairly accurate and useful. It comes at a very large size cost, but is a good last resort. Alternatively, you can convert a font to, say, SDF blocks. There is a size cost where you only download in batches, but it reduces complexity.

### Glyph Rendering

As for rendering there are various solutions with tradeoffs. For instance, [tiny-sdf](https://github.com/mapbox/tiny-sdf) creates basic SDF glyphs but doesn't work well with shaping, kerning, etc. One solution to combat the size of SDFs needed to maintain sharp edges and error rates was [MSDFs](https://github.com/Chlumsky/msdfgen), again with the same tradeoffs. Lastly, an older technique is bitmap fonts, but they don't scale well. There is a great resource discussing the main solutions [here](https://css-tricks.com/techniques-for-rendering-text-with-webgl/).

### Unicode Shaping

In terms of Unicode shaping there are two major codebases I know of: [International Components for Unicode (ICU)](http://site.icu-project.org/) and [harfbuzz](https://github.com/harfbuzz/harfbuzz).
Mapbox created a module from ICU, [rtl-text](https://github.com/mapbox/mapbox-gl-rtl-text), which fixes bidirectional text as well as Arabic shaping. This is where it ends. The module size is [148 kB](https://bundlejs.com/?q=%40mapbox%2Fmapbox-gl-rtl-text&treeshake=%5B*+as+rtl%5D) with so little to offer.

Painstakingly, [I've gotten a lot further](https://github.com/Open-S2/unicode-shaper-rust?tab=readme-ov-file) for less cost ([30.2 kB](https://bundlejs.com/?q=unicode-shaper-rust&treeshake=%5B*+as+WASM%5D)), and yet it's still miles away from offering the same value as harfbuzz.

There are insanely complex shaping rules that I'll probably never be able to copy over completely at a reasonable WASM build size. The latest [rustybuzz](https://github.com/harfbuzz/rustybuzz/tree/main) was written as `no_std` so that it can have a WASM build, but the result is still greater than 1MB in size including font parsing.

### Full Project Examples

Most projects that render text just don't support much outside simpler western languages. Examples where language support is non-existent are:
- Figma
- ThreeJS (recommends using the DOM or bitmap fonts).

[mapbox-gl-js](https://github.com/mapbox/mapbox-gl-js) & [maplibre-gl-js](https://github.com/maplibre/maplibre-gl-js) are both mapping engines that have limited language parsing. They both use a blend of tiny-sdf and SDF blocks. Outside Arabic shaping (as an external addon) there is no other shaping support.

PDF.js has [extremely limited language shaping](https://github.com/mozilla/pdf.js/blob/master/src/core/bidi.js).

### My Takeaway

All these tools already exist in the browser and are arguably "feature complete". I also doubt I'll see any good pure JS or even WASM solutions for variable size fonts in the foreseeable future. Shaping will probably always be an all-or-nothing issue where you either pay the huge size cost of, say, `rustybuzz` or add it as an extension. I think the best way to move forward is to take advantage of the tools already installed in every browser.

### How would you solve it?

```ts
interface FontFace {
    // ...pre-existing constructor, constants, and functions
    shapeText(input: string): number[] | undefined;
    getGlyph(id: number, size: number): number[] | undefined;
}
```

**shapeText** will parse the string and return the glyph IDs you would need. It's important to specify that it's the font glyph ID and not the Unicode code point. The reason is that some glyphs are combinations of other glyphs and thus **do not have their own Unicode value**. Return undefined if glyphs are missing or shaping failed.
**getGlyph** takes the ID and font size and returns the vector data recomputed to the font size rather than its internal extent. A renderer might usually want to rescale the vector data to a 0-1 x-y range. Return undefined if the glyph ID does not exist. It would probably be better to return some kind of Glyph object here? This is just a base case.
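
To illustrate how a caller might combine the two proposed methods, here is a rough sketch against a stand-in interface. Nothing in it is an existing browser API: `ShapingFontFace`, `collectOutlines`, `normalizeOutline`, and `mockFace` are hypothetical names, and the flat `[x0, y0, x1, y1, ...]` outline layout is an assumption, since the proposal does not pin down what `number[]` contains.

```typescript
// Hypothetical stand-in for the proposed FontFace additions (not a real API).
interface ShapingFontFace {
  shapeText(input: string): number[] | undefined;
  getGlyph(id: number, size: number): number[] | undefined;
}

// Shape `text`, then fetch each distinct glyph outline once.
// Returns undefined if shaping fails or any glyph is missing.
function collectOutlines(
  face: ShapingFontFace,
  text: string,
  size: number,
): Map<number, number[]> | undefined {
  const ids = face.shapeText(text);
  if (ids === undefined) return undefined;
  const outlines = new Map<number, number[]>();
  for (const id of ids) {
    if (outlines.has(id)) continue; // glyph already fetched
    const outline = face.getGlyph(id, size);
    if (outline === undefined) return undefined;
    outlines.set(id, outline);
  }
  return outlines;
}

// Assumption: an outline is a flat [x0, y0, x1, y1, ...] list.
// Rescale it into the unit square, preserving the aspect ratio.
function normalizeOutline(outline: number[]): number[] {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (let i = 0; i + 1 < outline.length; i += 2) {
    minX = Math.min(minX, outline[i]);
    maxX = Math.max(maxX, outline[i]);
    minY = Math.min(minY, outline[i + 1]);
    maxY = Math.max(maxY, outline[i + 1]);
  }
  const extent = Math.max(maxX - minX, maxY - minY);
  if (!(extent > 0)) return outline.map(() => 0); // empty or degenerate outline
  return outline.map((v, i) => (v - (i % 2 === 0 ? minX : minY)) / extent);
}

// Mock implementation so the sketch runs without the proposed API.
const mockFace: ShapingFontFace = {
  shapeText: (input) => (input.length === 0 ? undefined : [7, 7, 9]),
  getGlyph: (id, size) =>
    id === 7 || id === 9 ? [0, 0, size, 0, size, 2 * size] : undefined,
};

const outlines = collectOutlines(mockFace, "abc", 16);
console.log(outlines?.size); // 2 distinct glyphs
```

Keying the cache by glyph ID rather than by character matters here, because the same shaped glyph can come from different code point sequences.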

If you look at Chrome's codebase for [FontFace](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/css/font_face.cc;bpv=0;bpt=1), it wraps around `CSSFontFace`, which already contains methods to handle shaping and glyph access. I assume Firefox and Safari work very similarly.

With the Loop and Blinn patent ending next year, this raw data can be used almost immediately (just set up a vertex buffer with the glyph data). There is a great explanation of how to take advantage of the vector data in a Medium article entitled [Easy Scalable Text Rendering on the GPU](https://medium.com/@evanwallace/easy-scalable-text-rendering-on-the-gpu-c3f4d782c5ac).
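
As a rough, CPU-side sketch of the two ingredients that approach relies on: a triangle fan built from a contour, and the implicit inside test for a quadratic curve segment. The names `fanTriangles` and `insideQuadratic` are hypothetical, and the flat `[x0, y0, x1, y1, ...]` contour layout is an assumption, not something the proposal defines. In a real renderer the curve test would run per fragment in a shader.

```typescript
// Assumption: a contour is a closed polygon given as flat [x0, y0, x1, y1, ...].
// Emit a triangle fan pivoting on the first point, ready to upload as a
// vertex buffer (three x,y pairs per triangle).
function fanTriangles(contour: number[]): Float32Array {
  const n = Math.floor(contour.length / 2);
  const out: number[] = [];
  for (let i = 1; i + 1 < n; i++) {
    out.push(
      contour[0], contour[1],                 // pivot
      contour[2 * i], contour[2 * i + 1],     // current point
      contour[2 * i + 2], contour[2 * i + 3], // next point
    );
  }
  return new Float32Array(out);
}

// Implicit test for one quadratic Bezier segment. The three control points are
// given the coordinates (0,0), (0.5,0), (1,1); after interpolation a sample
// lies on the curve when u*u - v == 0 and between curve and chord when negative.
// Which sign counts as "filled" depends on whether the segment is convex or concave.
function insideQuadratic(u: number, v: number): boolean {
  return u * u - v < 0;
}

const square = [0, 0, 1, 0, 1, 1, 0, 1];
console.log(fanTriangles(square).length / 6); // 2 triangles
console.log(insideQuadratic(0.5, 0.5));       // true: midpoint of the chord
console.log(insideQuadratic(0.5, 0));         // false: the off-curve control point
```

For concave contours the fan triangles overlap, so the renderer has to accumulate coverage in a way that lets overlapping triangles cancel, as the linked article describes; the sketch only produces the geometry.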


### Anything else?

If there are ways to help clarify or come up with more information please let me know.

Resources:

- https://astiopin.github.io/2019/01/06/sdf-on-gpu.html
- https://patents.google.com/patent/US7564459B2/en
- https://medium.com/@evanwallace/easy-scalable-text-rendering-on-the-gpu-c3f4d782c5ac
- https://github.com/Chlumsky/msdfgen
- https://github.com/mapbox/tiny-sdf
- https://css-tricks.com/techniques-for-rendering-text-with-webgl/

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/11538 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 20 January 2025 08:34:19 UTC