Re: Privacy reviews of a couple TTML specs

Any thoughts from the TTML side? Should I just file all of these as GitHub
issues?

Thanks,
Jeffrey

On Thu, Nov 7, 2019 at 10:19 AM Jeffrey Yasskin <jyasskin@google.com> wrote:

> Hi timed text folks,
>
> The PING asked me to look over a couple of the TTML drafts from a privacy
> perspective. I knew very little about TTML before reading the specs, so I
> wanted to send you my thoughts over email so you could help filter out
> anything silly before I file GitHub issues. I think all of these are just
> things that should be added to Privacy Consideration sections in the specs,
> or notes around the relevant features, rather than changes to
> functionality.
>
>
> Timed Text Markup Language 2 (TTML2) (2nd Edition)
> https://w3c.github.io/ttml2/index.html, reviewing "Working Draft 06
> November 2019"
>
>
>    - It appears that TTML isn't implemented in browsers (except a small
>    profile in IE and old Edge), so nothing is exposed through it that isn't
>    exposed to the website in other ways, like JavaScript or CSS. The
>    comments below apply to platforms that do implement TTML natively.
>
> Fingerprinting:
>
>    - A user's preference for how fast they consume media (e.g. 2x vs
>    2.5x) is probably a fingerprinting vector.
>    - The request for timed text indicates the user's language and that
>    the user wants captions or subtitles.
>    - If a TTML Profile Document is published from a UA, that's probably a
>    fingerprinting vector, although also probably equivalent to the UA string.
>    I'm not sure this would actually happen.
>    - There's a tts:fontFamily attribute, which could expose the system's
>    fonts and should use the same restrictions as CSS.
>    - The <audio> and <image> elements probably allow the server to detect
>    the value of any <condition> expression. Many of the condition-functions
>    seem to be already exposed by CSS media queries. There are lots of features
>    that <condition> expressions can inspect, which seem like they'll be redundant with
>    the UA string, but I might have missed one that exposes a user preference
>    or device attribute.
>    - Many initial style values are defined by the specification, so they
>    wouldn't reveal anything. However, tts:color and some others are
>    described as implementation-dependent.
>    - More generally, anything that's "implementation dependent" might be
>    a fingerprinting vector.
>    - ttp:clockMode==local probably reveals the local time zone, if only
>    by the timing of embedded resource requests. Also ttp:timeBase==clock
>    reveals clock skew in the same way.
>    - I don't see a way to pull out a display frame rate, but I probably
>    missed it.
>
> Other information about a user:
>
>    - Asking for captions/subtitles *doesn't* guarantee that the user has
>    a permanent disability, as abled folks often want captions in noisy
>    environments or when they don't want to disturb their neighbors.
>
>
>
> TTML Profiles for Internet Media Subtitles and Captions 1.2
> https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html
>
>    - This defines two profiles of TTML2, so the privacy properties of
>    TTML2 apply. I didn't check in detail whether either profile removed enough
>    of TTML2 that any possible information exposure goes away.
>
> TTML Live explainer
> https://w3c.github.io/tt-module-live/tt-live-1/guide/tt-live-guide.html
>
> I looked over the NOTE rather than the full spec.
>
>    - This explainer looks entirely internal to a stream author's process,
>    so only the authors might have privacy concerns.
>    - The system can send an "ebuttm:documentCreationDate" which would
>    expose clock skew, which can be a fingerprinting mechanism. It's unlikely
>    to be a privacy problem in this context.
>    - A subtitler might be a human who doesn't want their identity to be
>    leaked into the final output. To avoid that, their sequenceIdentifier and
>    authorsGroupIdentifier probably shouldn't include any PII. This isn't
>    called out in
>    https://w3c.github.io/tt-module-live/tt-live-1/spec/tt-live.html#ebuttp-sequenceIdentifier-attr
>    .
>
> TTML Live Carriage over WebSocket
>
> https://w3c.github.io/tt-module-live/tt-live-1/spec/carriage/WebSocket/tt-live-carriage-WebSocket.html
>
>    - The specification should probably use wss:// to guarantee that the
>    WebSocket connection is encrypted. (Unless there's something I don't know
>    about WebSockets.)
>    - Embedding the sequence identifier in the wss:// URL's authority
>    exposes it to DNS, which is often inspected by ISPs. The other
>    examples of constructing URLs only expose the organization, which is less
>    likely to contain anything private. Sequence identifiers may also just be
>    UUIDs, in which case they might not be private either.
>
>
> Jeffrey
>
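
To illustrate the authority-versus-path point in the WebSocket carriage
comments quoted above, here is a minimal sketch in Python. The URLs, the
"seq-1234" identifier, and the helper name dns_visible_host are hypothetical
and are not taken from the carriage specification; the sketch only shows
which part of a URL a client has to resolve through DNS.

```python
from urllib.parse import urlsplit


def dns_visible_host(url: str) -> str:
    """Return the host name a client must resolve via DNS to connect to url."""
    return urlsplit(url).hostname or ""


# Hypothetical URL shapes, for illustration only.
identifier_in_authority = "wss://seq-1234.example.org/subscribe"
identifier_in_path = "wss://example.org/seq-1234/subscribe"

for url in (identifier_in_authority, identifier_in_path):
    host = dns_visible_host(url)
    leaked = "seq-1234" in host
    print(f"{url}: DNS lookup for {host!r}, identifier in lookup: {leaked}")
```

In the first form the identifier is part of the name being looked up; in the
second it travels only in the request path, which wss:// encrypts.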

Received on Monday, 11 November 2019 19:31:02 UTC