[csswg-drafts] [css-fonts] Exploring better ways to balance privacy, i18n, design tradeoffs for local fonts (#11571)

LeaVerou has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-fonts] Exploring better ways to balance privacy, i18n, design tradeoffs for local fonts ==
## Background

Apple has restricted access to non-system local fonts in WebKit (this covers both font names in `font-family` and the `local()` function in `@font-face`) and does not wish to go back on this. While this is effective for curbing font-based fingerprinting (see #4055), it has raised several concerns:
- i18n: Users of certain languages are _dependent_ on local fonts to access website content in their language. These fonts are too large to be included as web fonts.
- This unfairly privileges certain scripts with few characters and effectively restricts web design in certain locales to system fonts, as web fonts are not a tractable solution for many languages, including several East Asian languages. Double-keyed caching has made that even less of a viable option.
- In many cases, web fonts are not a valid solution due to font licensing (if I have a legal copy of Adobe Caslon Pro, I should be able to view content using it without the website author or me needing a license that permits web font use).
- Even for cases where web fonts are a viable solution, the carbon footprint of all these pointless repeated downloads is not negligible, and in many developing countries bandwidth is very expensive, so this also privileges Western locales.

To sum up, the current solution goes against the following Ethical Web Principles:
- ["The Web is for all people"](https://www.w3.org/TR/ethical-web-principles/#allpeople)
- ["The Web is an environmentally sustainable platform"](https://www.w3.org/TR/ethical-web-principles/#sustainable)

I’m not suggesting we should just accept font-based fingerprinting, but I'm hoping we can find a better balance of tradeoffs than throwing out the baby with the bathwater. 

## Proposal

What if, instead of entirely cutting off access to local fonts, we only allow N fonts per origin (with a reasonably small N, e.g. 8)? It seems that for most use cases this would be sufficient, and it would still minimize or even eliminate font-based fingerprinting.

**Font access would only "register" when the font is actually used**, e.g. if I specify a font stack like `Adobe Caslon Pro, Adobe Garamond, Hoefler Text, serif` this doesn't use up 3 fonts from that origin, but only one (the one actually applied). This is important: without it, the utility of this proposal is very limited.

It could be argued that we then have more bits of entropy, because if we can detect that it's Hoefler Text that is being applied we _also_ know that Adobe Caslon Pro and Adobe Garamond are _not_ installed. However, I _think_ it would be pretty hard to detect this in the general case without actually accessing the first two fonts (i.e. applying them without a fallback).

System fonts would not count toward this limit, since they do not add bits of entropy anyway. Also, this would count *families*, not *faces*, i.e. using ten weights from a font does not use up 10 fonts from the limit. This is important for websites to display properly, and only uses up a small bit of additional entropy.
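To make the counting rules above concrete, here is a rough sketch of how a UA might resolve a font stack under this proposal. It is illustrative only: `resolve_stack`, `GENERIC_FAMILIES`, and all parameter names are hypothetical, and what happens once the limit is reached (the font is treated as if it were not installed) is my assumption, not something the proposal pins down.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical, non-exhaustive set of generic families; these never count.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_stack(
    stack: Iterable[str],
    installed: Set[str],
    system_fonts: Set[str],
    granted: Set[str],
    limit: int = 8,
) -> Tuple[Optional[str], Set[str]]:
    """Return (applied family, updated set of families charged to the origin).

    Names are compared as plain strings for brevity; real family-name
    matching is case-insensitive.
    """
    granted = set(granted)  # copy so the caller's set is not mutated
    for family in stack:
        # System fonts and generic families are free.
        if family in GENERIC_FAMILIES or family in system_fonts:
            return family, granted
        # A family that is not installed is skipped and costs nothing.
        if family not in installed:
            continue
        # Already charged earlier: reuse without charging again.
        # Keying on the family name means extra weights/faces are free.
        if family in granted:
            return family, granted
        # First use of this family: charge one slot if any remain.
        if len(granted) < limit:
            granted.add(family)
            return family, granted
        # Assumption: with the budget exhausted, the family is treated
        # as unavailable and resolution falls through to the next entry.
    return None, granted


applied, charged = resolve_stack(
    ["Adobe Caslon Pro", "Adobe Garamond", "Hoefler Text", "serif"],
    installed={"Hoefler Text"},
    system_fonts={"Times New Roman"},
    granted=set(),
)
print(applied, charged)  # Hoefler Text {'Hoefler Text'}
```

Only the family that is actually applied is charged; the two earlier entries in the stack cost nothing because they never resolve.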

**The UA would manage which local fonts have been accessed** for each origin, and this would expire after a certain period (a month? a year? more research needed about what's the shortest period that still curbs fingerprinting) so that websites don't have to be locked in to their original font choices until the end of time. TBD: Is this also cleared when the user clears local data? I would vote "yes".
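As a rough sketch of the bookkeeping described above, the UA could keep a per-origin ledger of accessed families with timestamps. Everything here is hypothetical: the class name `OriginFontLedger`, the 30-day default, and the choice to measure expiry from first access (rather than last use) are assumptions for illustration, since the proposal leaves all of these open.

```python
from typing import Dict


class OriginFontLedger:
    """Hypothetical per-origin record of local font families accessed."""

    def __init__(self, limit: int = 8, ttl_seconds: float = 30 * 24 * 3600):
        self.limit = limit              # N in the proposal
        self.ttl_seconds = ttl_seconds  # expiry period (assumed: 30 days)
        # origin -> {family name -> time of first access}
        self._entries: Dict[str, Dict[str, float]] = {}

    def _live(self, origin: str, now: float) -> Dict[str, float]:
        """Drop expired families for this origin and return what is left."""
        entries = self._entries.setdefault(origin, {})
        expired = [f for f, t in entries.items() if now - t >= self.ttl_seconds]
        for family in expired:
            del entries[family]
        return entries

    def request(self, origin: str, family: str, now: float) -> bool:
        """Return True if the origin may use this (non-system) family."""
        entries = self._live(origin, now)
        if family in entries:
            return True   # already counted; more weights cost nothing
        if len(entries) >= self.limit:
            return False  # budget exhausted until something expires
        entries[family] = now
        return True

    def clear_site_data(self, origin: str) -> None:
        """Reset the budget when the user clears local data (the "yes" vote)."""
        self._entries.pop(origin, None)


ledger = OriginFontLedger(limit=2, ttl_seconds=100)
print(ledger.request("https://a.example", "Hoefler Text", now=0))    # True
print(ledger.request("https://a.example", "Adobe Garamond", now=1))  # True
print(ledger.request("https://a.example", "Futura", now=2))          # False
print(ledger.request("https://a.example", "Futura", now=150))        # True
```

The last call succeeds because both earlier entries have expired by then, which is what lets a site change its font choices over time.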

I'm new to this debate, so it's entirely possible I might be missing something huge that cancels the entire idea, but I figured it doesn't hurt to put this idea down into an issue. Playing devil's advocate, here are some counterarguments I can think of:

- It could be argued that for certain fonts (the ones necessary for the i18n cases) simply the presence of a single font can narrow down the user quite a lot, but it seems to me that in that case, the user's locale would narrow things down just as much.
- It could be argued that the fingerprinting detection could then be split across multiple origins. There could even be a fingerprinting-as-a-service offering: a service that sets up hundreds of origins, each detecting 8 additional fonts, and combines the results to identify users. Websites would then just include a page from this company in an iframe, and that would in turn iframe all the other origins. To prevent this scenario, there should probably be limits on how this behaves in cross-origin iframes. I could mull it over a bit more if that's the only sticking point (no point spending time on this if the entire idea is not viable).

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/11571 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Saturday, 25 January 2025 23:24:28 UTC