Re: Streamable fonts and Privacy

> On Jul 24, 2019, at 1:04 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
> 
> 
> 
>> On Jul 24, 2019, at 12:17 PM, Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:
>> 
>> Hi Myles, all,
>> 
>> Thank you for raising the privacy issue; this is usually something that can easily be overlooked and only comes to attention at the very end of the REC process. 
>> However, I am wondering about the severity of your concern, and I am trying to gauge how much effort we, as a WG, need to dedicate to this. 
>> You wrote:
>> "A naive solution to the streamable fonts problem would have the browser request exactly the characters/glyphs that are present on the page. However, this is unfortunate because it makes it fairly easy for the server to reverse-engineer the contents of webpages, thereby creating a privacy violation."
>> 
>> I am not sure how easy it really is to reverse-engineer text content from a glyph subset. For the sake of discussion, let's consider this dialog and the incremental font subsets (in parentheses) that would be produced to render each additional entry:
>> - Where is the Main street? ( ?MWaehinrst)
>> - Turn right at the next light (Tglux)
>> - Thank you (koy)
>> 
>> It doesn't seem obvious (or fairly easy) to reverse-engineer the content, and it gets progressively more difficult for larger text chunks and with each incremental update, as every new piece of content requires fewer and fewer glyph updates.
> 
> In Chinese, each character is used much less frequently than characters are used in English.

Sorry, this probably wasn’t very clear.

In English, if you deduplicate every character in an entire news article, you wouldn’t have much more information than “the article is written in the Latin script.” However, if you did the same exercise in Chinese, you would have a pretty good understanding of the topic of the article, and possibly even the position the author takes on it. Leaking the topics and substance of articles users read on the Web is a privacy violation.
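To make the deduplication concrete, here is a minimal sketch of what a naive streaming client would ask for: for each new piece of text, only the characters not already covered by earlier requests. It assumes one request per message and treats characters and glyphs as one-to-one, which ignores shaping; the function name `incremental_subsets` is hypothetical. Run on the dialog from Vlad's example, it reproduces his subsets exactly. For Latin text these sets saturate after a few dozen letters; for Chinese, each set contains many distinct, topic-bearing characters.

```python
def incremental_subsets(messages):
    """For each message, return the characters not seen in any earlier message,
    i.e. the new glyphs a naive streaming client would request (sorted by code point)."""
    seen = set()
    result = []
    for text in messages:
        new = set(text) - seen  # characters this message adds
        seen |= new
        result.append("".join(sorted(new)))
    return result


dialog = [
    "Where is the Main street?",
    "Turn right at the next light",
    "Thank you",
]
print(incremental_subsets(dialog))  # [' ?MWaehinrst', 'Tglux', 'koy']
```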

> 
>> 
>> Do you believe the severity of this problem is potentially high, or is it something we should consider as a lower priority issue?
> 
> It is absolutely high.
> 
>> 
>> Thank you,
>> Vlad
>> 
>> -----Original Message-----
>> From: mmaxfield@apple.com <mmaxfield@apple.com> 
>> Sent: Tuesday, July 23, 2019 5:41 PM
>> To: w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
>> Cc: Tess O'Connor <hober@apple.com>
>> Subject: Streamable fonts and Privacy
>> 
>> Hi!
>> 
>> We’ve already gotten started with plenty of discussions around streamable fonts, but I think there’s a topic which hasn’t been discussed yet.
>> 
>> In the current state of the art, the browser downloads a font file if any character on the page falls within the font's unicode-range. In general, this makes it difficult for a server to reverse-engineer the contents of a page. Of course, it is possible to construct a malicious set of @font-face rules that maps each character to a different font, but that isn’t the common case on the Web today, and it has a significant negative user impact, such as flashing a random set of letters while the page is loading and disabling all shaping. The distinction is relevant when the service hosting the font files is different from the service hosting the CSS.
>> 
>> A naive solution to the streamable fonts problem would have the browser request exactly the characters/glyphs that are present on the page. However, this is unfortunate because it makes it fairly easy for the server to reverse-engineer the contents of webpages, thereby creating a privacy violation. This is even worse for dynamic content; not only would the server know exactly what the user was typing, but also the speed and timing of each keystroke.
>> 
>> Instead, any solution to the streamable fonts problem should require the browser to request more than it needs in order to mask the content of the page. A solution in which the server interprets such a request and sends even more data than the browser asked for could waste a significant amount of data in the response. We should model this in our evaluation of the various approaches to solving this problem.
>> 
>> Thanks,
>> Myles
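
One way to start modeling the cost of "requesting more than it needs" is sketched below. It assumes a hypothetical padding scheme in which the code space is split into fixed-size buckets and the browser requests every bucket that contains at least one needed character, so the server learns only which buckets were touched. `BUCKET_SIZE` and all function names are illustrative assumptions, not part of any proposal, and overhead is counted in code points rather than bytes.

```python
BUCKET_SIZE = 64  # hypothetical padding granularity, in code points


def naive_request(text):
    """Exact set of code points on the page: what the naive scheme reveals."""
    return sorted({ord(c) for c in text})


def padded_request(text, bucket_size=BUCKET_SIZE):
    """Indices of the fixed-size buckets touched by the page; the server sees
    only these, not the individual characters."""
    return sorted({ord(c) // bucket_size for c in text})


def overhead(text, bucket_size=BUCKET_SIZE):
    """Code points transferred but not needed under bucket padding."""
    needed = len(naive_request(text))
    sent = len(padded_request(text, bucket_size)) * bucket_size
    return sent - needed


print(padded_request("Thank you"), padded_request("Turn right"))  # [0, 1] [0, 1]
print(overhead("Thank you"))  # 119
```

Two different English phrases produce the same padded request here, which is the masking effect; the price is the unused code points in every bucket. Larger buckets hide more and waste more, so the bucket size is exactly the kind of trade-off the evaluation would need to measure, especially for CJK text, which touches many more buckets.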

Received on Wednesday, 24 July 2019 20:09:34 UTC