- From: Jeffrey Yasskin <jyasskin@google.com>
- Date: Thu, 7 Nov 2019 10:19:08 -0800
- To: public-tt@w3.org
- Cc: public-privacy@w3.org
- Message-ID: <CANh-dXmQZDMpeqcF1ABk-_ThfhbfSAoSwUz4jhJk=J4wU_DiVw@mail.gmail.com>
Hi timed text folks,

The PING asked me to look over a couple of the TTML drafts from a privacy perspective. I knew very little about TTML before reading the specs, so I wanted to send you my thoughts over email first, so you can help filter out anything silly before I file GitHub issues. I think all of these are just things that should be added to the Privacy Considerations sections of the specs, or as notes around the relevant features, rather than changes to functionality.

Timed Text Markup Language 2 (TTML2) (2nd Edition)
https://w3c.github.io/ttml2/index.html, reviewing "Working Draft 06 November 2019"

- It appears that TTML isn't implemented in browsers (except for a small profile in IE and old Edge), so nothing is exposed through it that isn't already exposed to the website in other ways, like JavaScript or CSS. The comments below apply to platforms that do implement TTML natively.

Fingerprinting:
- A user's preference for how fast they consume media (e.g. 2x vs. 2.5x) is probably a fingerprinting vector.
- The request for timed text indicates the user's language and that the user wants captions or subtitles.
- If a TTML Profile Document is published from a UA, that's probably a fingerprinting vector, although it's probably also equivalent to the UA string. I'm not sure this would actually happen.
- There's a tts:fontFamily attribute, which could expose the system's fonts and should use the same restrictions as CSS.
- The <audio> and <image> elements probably allow the server to detect the value of any condition expression, presumably because whether a resource gets fetched depends on how its condition evaluates. Many of the condition functions seem to be already exposed by CSS media queries. Condition expressions can inspect lots of features, which seem likely to be redundant with the UA string, but I might have missed one that exposes a user preference or device attribute.
- Many initial style values are defined by the specification, so they wouldn't reveal anything. However, tts:color and some others are described as implementation-dependent.
- More generally, anything that's "implementation-dependent" might be a fingerprinting vector.
- ttp:clockMode="local" probably reveals the local time zone, if only through the timing of embedded resource requests. In the same way, ttp:timeBase="clock" reveals clock skew.
- I don't see a way to extract the display frame rate, but I may have missed it.

Other information about a user:
- Asking for captions or subtitles *doesn't* imply that the user has a permanent disability, as abled folks often want captions in noisy environments or when they don't want to disturb their neighbors.

TTML Profiles for Internet Media Subtitles and Captions 1.2
https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html

- This defines two profiles of TTML2, so the privacy properties of TTML2 apply. I didn't look in detail at whether either profile removes enough of TTML2 that any of the possible information exposure goes away.

TTML Live explainer
https://w3c.github.io/tt-module-live/tt-live-1/guide/tt-live-guide.html
I looked over the NOTE rather than the full spec.

- This explainer looks entirely internal to a stream author's process, so only the authors might have privacy concerns.
- The system can send an "ebuttm:documentCreationDate", which would expose clock skew, which can be a fingerprinting mechanism. That's unlikely to be a privacy problem in this context.
- A subtitler might be a human who doesn't want their identity leaked into the final output. To avoid that, their sequenceIdentifier and authorsGroupIdentifier probably shouldn't include any PII. This isn't called out in https://w3c.github.io/tt-module-live/tt-live-1/spec/tt-live.html#ebuttp-sequenceIdentifier-attr .

TTML Live Carriage over WebSocket
https://w3c.github.io/tt-module-live/tt-live-1/spec/carriage/WebSocket/tt-live-carriage-WebSocket.html

- The specification should probably use wss:// to guarantee that the WebSocket connection is encrypted. (Unless there's something I don't know about WebSockets.)
- Embedding the sequence identifier in the wss:// URL's authority exposes it to the DNS system, which ISPs often inspect. The other examples of URL construction only expose the organization, which is less likely to contain anything private. Sequence identifiers may also just be UUIDs, in which case they wouldn't reveal anything private anyway.

Jeffrey
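The DNS point in the last bullet can be illustrated with a small Python sketch. It compares two hypothetical URL layouts, neither taken from the carriage spec: one puts a sequence identifier into the authority as a DNS label, the other keeps only the organization's domain in the authority and moves the identifier into the path. The helper names, the `example.org` domain, the `/sequences/` path, and the sample identifier are all assumptions for illustration. Only the hostname is resolved through DNS; the path is sent inside the encrypted wss:// connection.

```python
from urllib.parse import quote, urlsplit


def url_with_id_in_authority(sequence_id: str, org_domain: str) -> str:
    """Hypothetical layout: the sequence identifier becomes a DNS label."""
    return f"wss://{sequence_id}.{org_domain}/"


def url_with_id_in_path(sequence_id: str, org_domain: str) -> str:
    """Hypothetical layout: only the organization appears in the authority."""
    return f"wss://{org_domain}/sequences/{quote(sequence_id, safe='')}"


def exposed_to_dns(url: str) -> str:
    """The hostname is what a resolver (and anyone watching DNS) sees."""
    return urlsplit(url).hostname or ""


def carried_over_tls(url: str) -> str:
    """Path and query travel inside the encrypted wss:// connection."""
    parts = urlsplit(url)
    return parts.path + (f"?{parts.query}" if parts.query else "")


if __name__ == "__main__":
    seq = "alice-evening-news"  # hypothetical identifier that names a person
    for url in (url_with_id_in_authority(seq, "example.org"),
                url_with_id_in_path(seq, "example.org")):
        print(url)
        print("  DNS sees:", exposed_to_dns(url))
        print("  TLS protects:", carried_over_tls(url))
```

With the first layout the resolver sees `alice-evening-news.example.org`; with the second it only sees `example.org`, and the identifier stays in the encrypted request path.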
Received on Thursday, 7 November 2019 18:19:29 UTC