Privacy reviews of a couple of TTML specs

Hi timed text folks,

The PING asked me to look over a couple of the TTML drafts from a privacy
perspective. I knew very little about TTML before reading the specs, so I
wanted to send you my thoughts over email so you could help filter out
anything silly before I file GitHub issues. I think all of these are just
things that should be added to Privacy Considerations sections in the specs,
or notes around the relevant features, rather than changes to
functionality.


Timed Text Markup Language 2 (TTML2) (2nd Edition)
https://w3c.github.io/ttml2/index.html, reviewing "Working Draft 06
November 2019"


   - It appears that TTML isn't implemented in browsers (except a small
   profile in IE and old Edge), so nothing is exposed through it that isn't
   exposed to the website in other ways, like JavaScript or CSS. The comments
   below apply to platforms that do implement TTML natively.

Fingerprinting:

   - A user's preference for how fast they consume media (e.g. 2x vs 2.5x)
   is probably a fingerprinting vector.
   - The request for timed text indicates the user's language and that the
   user wants captions or subtitles.
   - If a TTML Profile Document is published from a UA, that's probably a
   fingerprinting vector, although also probably equivalent to the UA string.
   I'm not sure this would actually happen.
   - There's a tts:fontFamily attribute, which could expose the system's
   fonts and should use the same restrictions as CSS.
   - The <audio> and <image> elements probably allow the server to detect
   the value of any <condition> expression, by observing which embedded
   resources get requested. Many of the condition-functions seem to be
   already exposed by CSS media queries. There are lots of features that
   <condition> expressions can inspect, which seem like they'll be redundant
   with the UA string, but I might have missed one that exposes a user
   preference or device attribute.
   - Many initial style values are defined by the specification, so they
   wouldn't reveal anything. However, tts:color and some others are
   described as implementation-dependent.
   - More generally, anything that's "implementation dependent" might be a
   fingerprinting vector.
   - ttp:clockMode==local probably reveals the local time zone, if only by
   the timing of embedded resource requests. Also ttp:timeBase==clock reveals
   clock skew in the same way.
   - I don't see a way to pull out a display frame rate, but I probably
   missed it.
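
To make the ttp:clockMode point concrete, here is a minimal sketch (in
Python, which is my choice and not anything from the spec) of how a server
might estimate a viewer's UTC offset. It assumes, without my having checked
any implementation, that a player fetches an embedded resource at about the
local wall-clock time its element becomes active. The function name and the
sample times are hypothetical.

```python
from datetime import datetime, timedelta, timezone

QUARTER_HOUR = timedelta(minutes=15)
DAY = timedelta(hours=24)


def estimate_utc_offset(scheduled_local: str, request_seen_utc: datetime) -> timedelta:
    """Guess a viewer's UTC offset from one embedded-resource request.

    scheduled_local: the "HH:MM:SS" local wall-clock time at which the
        document activates the element (hypothetical input).
    request_seen_utc: when the server logged the request, in UTC.
    """
    hours, minutes, seconds = (int(part) for part in scheduled_local.split(":"))
    scheduled = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    seen = timedelta(
        hours=request_seen_utc.hour,
        minutes=request_seen_utc.minute,
        seconds=request_seen_utc.second,
    )
    # Current UTC offsets are multiples of 15 minutes, so rounding absorbs
    # network latency and small clock skew.
    offset = round((scheduled - seen) / QUARTER_HOUR) * QUARTER_HOUR
    # Fold the result into the range of offsets in use (UTC-12 to UTC+14).
    # Offsets near the extremes stay ambiguous without knowing the date.
    while offset > timedelta(hours=14):
        offset -= DAY
    while offset < timedelta(hours=-12):
        offset += DAY
    return offset


# Element scheduled for 09:00:00 local; request logged at 17:00:03 UTC.
seen = datetime(2019, 11, 7, 17, 0, 3, tzinfo=timezone.utc)
print(estimate_utc_offset("09:00:00", seen).total_seconds() / 3600)  # -8.0
```

Whatever is left over after the rounding (3 seconds in the example) is a
mix of network latency and clock skew, which is the ttp:timeBase==clock
concern.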

Other information about a user:

   - Asking for captions/subtitles *doesn't* guarantee that the user has a
   permanent disability, as abled folks often want captions in noisy
   environments or when they don't want to disturb their neighbors.



TTML Profiles for Internet Media Subtitles and Captions 1.2
https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html

   - This defines two profiles of TTML2, so the privacy properties of TTML2
   apply. I didn't look in detail at whether either profile removes enough of
   TTML2 that any possible information exposure goes away.

TTML Live explainer
https://w3c.github.io/tt-module-live/tt-live-1/guide/tt-live-guide.html

I looked over the NOTE rather than the full spec.

   - This explainer looks entirely internal to a stream author's process,
   so only the authors might have privacy concerns.
   - The system can send an "ebuttm:documentCreationDate", which would expose
   clock skew, and clock skew can be a fingerprinting mechanism. It's
   unlikely to be a privacy problem in this context.
   - A subtitler might be a human who doesn't want their identity to be
   leaked into the final output. To avoid that, their sequenceIdentifier and
   authorsGroupIdentifier probably shouldn't include any PII. This isn't
   called out in
   https://w3c.github.io/tt-module-live/tt-live-1/spec/tt-live.html#ebuttp-sequenceIdentifier-attr
   .
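
One way to keep PII out of these identifiers is sketched below. It assumes
the identifier may be any string the author chooses, which I haven't
confirmed against the spec, and the function name and "seq-" prefix are
hypothetical.

```python
import uuid


def opaque_sequence_identifier() -> str:
    """Return a random identifier that says nothing about the subtitler."""
    # uuid4() is generated from random bits, so the value carries no name,
    # host, or timestamp information.
    return f"seq-{uuid.uuid4()}"


# The mapping from person to identifier stays in the author's own system
# and is never written into the document stream.
private_mapping = {"subtitler on the evening shift": opaque_sequence_identifier()}
print(private_mapping)
```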

TTML Live Carriage over WebSocket
https://w3c.github.io/tt-module-live/tt-live-1/spec/carriage/WebSocket/tt-live-carriage-WebSocket.html

   - The specification should probably use wss:// to guarantee that the
   WebSocket connection is encrypted. (Unless there's something I don't know
   about WebSockets.)
   - Embedding the sequence identifier in the wss:// URL's authority
   exposes it to DNS, which is often inspected by ISPs. The other
   examples of constructing URLs only expose the organization, which is less
   likely to contain anything private. Sequence identifiers may also just be
   UUIDs, in which case they might not be private either.
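
A small sketch of the difference between the two placements, using
made-up host names and identifiers (none of these URLs come from the
spec):

```python
from urllib.parse import urlsplit


def name_sent_to_dns(url: str) -> str:
    """Return the host name a client must resolve before connecting."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no authority in {url!r}")
    return host


# Hypothetical URLs: identifier in the authority vs. in the path.
in_authority = "wss://alice-evening-news.captions.example.org/publish"
in_path = "wss://captions.example.org/alice-evening-news/publish"

print(name_sent_to_dns(in_authority))  # alice-evening-news.captions.example.org
print(name_sent_to_dns(in_path))       # captions.example.org
```

With wss://, the path travels inside the TLS connection, while the host
name has to be resolved first and so is visible to whoever can see the DNS
lookup.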


Jeffrey

Received on Thursday, 7 November 2019 18:19:29 UTC