- From: Jeffrey Yasskin <jyasskin@google.com>
- Date: Mon, 11 Nov 2019 11:30:46 -0800
- To: public-tt@w3.org
- Cc: public-privacy@w3.org
- Message-ID: <CANh-dXm2X1ce_FthTti+mpqtvr=PQVz18TP2Jvom1SBteV=sJA@mail.gmail.com>
Any thoughts from the TTML side? Should I just file all of these as github
issues?

Thanks,
Jeffrey

On Thu, Nov 7, 2019 at 10:19 AM Jeffrey Yasskin <jyasskin@google.com> wrote:

> Hi timed text folks,
>
> The PING asked me to look over a couple of the TTML drafts from a privacy
> perspective. I knew very little about TTML before reading the specs, so I
> wanted to send you my thoughts over email so you could help filter out
> anything silly before I file github issues. I think all of these are just
> things that should be added to Privacy Consideration sections in the specs,
> or notes around the relevant features, rather than changes to
> functionality.
>
>
> Timed Text Markup Language 2 (TTML2) (2nd Edition)
> https://w3c.github.io/ttml2/index.html, reviewing "Working Draft 06
> November 2019"
>
>
>    - It appears that TTML isn't implemented in browsers (except a small
>    profile in IE and old Edge), so nothing is exposed through it that isn't
>    exposed to the website in other ways, like Javascript or CSS. The below
>    comments apply to platforms that do implement TTML natively.
>
> Fingerprinting:
>
>    - A user's preference for how fast they consume media (e.g. 2x vs
>    2.5x) is probably a fingerprinting vector.
>    - The request for timed text indicates the user's language and that
>    the user wants captions or subtitles.
>    - If a TTML Profile Document is published from a UA, that's probably a
>    fingerprinting vector, although also probably equivalent to the UA string.
>    I'm not sure this would actually happen.
>    - There's a tts:fontFamily attribute, which could expose the system's
>    fonts and should use the same restrictions as CSS.
>    - The <audio> and <image> elements probably allow the server to detect
>    the value of any <condition> expression. Many of the condition-functions
>    seem to be already exposed by CSS media queries. There are lots of features
>    that <conditions> can inspect, which seem like they'll be redundant with
>    the UA string, but I might have missed one that exposes a user preference
>    or device attribute.
>    - Many initial style values are defined by the specification, so they
>    wouldn't reveal anything. However, <tts:color> and some others are
>    described as implementation-dependent.
>    - More generally, anything that's "implementation dependent" might be
>    a fingerprinting vector.
>    - ttp:clockMode==local probably reveals the local time zone, if only
>    by the timing of embedded resource requests. Also ttp:timeBase==clock
>    reveals clock skew in the same way.
>    - I don't see a way to pull out a display frame rate, but I probably
>    missed it.
>
> Other information about a user:
>
>    - Asking for captions/subtitles *doesn't* guarantee that the user has
>    a permanent disability, as abled folks often want captions in noisy
>    environments or when they don't want to disturb their neighbors.
>
>
>
> TTML Profiles for Internet Media Subtitles and Captions 1.2
> https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html
>
>    - This defines two profiles of TTML2, so the privacy properties of
>    TTML2 apply. I didn't look in detail whether either profile removed enough
>    of TTML2 that any possible information exposure goes away.
>
> TTML Live explainer
> https://w3c.github.io/tt-module-live/tt-live-1/guide/tt-live-guide.html
>
> I looked over the NOTE rather than the full spec.
>
>    - This explainer looks entirely internal to a stream author's process,
>    so only the authors might have privacy concerns.
>    - The system can send a "ebuttm:documentCreationDate" which would
>    expose clock skew, which can be a fingerprinting mechanism. It's unlikely
>    to be a privacy problem in this context.
>    - A subtitler might be a human who doesn't want their identity to be
>    leaked into the final output. To avoid that, their sequenceIdentifier and
>    authorsGroupIdentifier probably shouldn't include any PII. This isn't
>    called out in
>    https://w3c.github.io/tt-module-live/tt-live-1/spec/tt-live.html#ebuttp-sequenceIdentifier-attr
>    .
>
> TTML Live Carriage over WebSocket
>
> https://w3c.github.io/tt-module-live/tt-live-1/spec/carriage/WebSocket/tt-live-carriage-WebSocket.html
>
>    - The specification should probably use wss:// to guarantee that the
>    web socket connection is encrypted. (Unless there's something I don't know
>    about web sockets.)
>    - Embedding the sequence identifier in the wss:// URL's authority
>    exposes it to the DNS system, which is often inspected by ISPs. The other
>    examples of constructing URLs only expose the organization, which is less
>    likely to contain anything private. Sequence identifiers may also just be
>    UUIDs, in which case they might not be private either.
>
>
> Jeffrey
>
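
The DNS concern in the last WebSocket bullet can be illustrated with a minimal Python sketch. The host captions.example.org, the sequence identifier, and both URL layouts are hypothetical and are not taken from the carriage specification; the sketch only shows which part of a URL a client has to hand to a DNS resolver before it can connect.

```python
from urllib.parse import urlsplit


def dns_visible_name(url: str) -> str:
    """Return the hostname a client must resolve via DNS to open this URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def is_encrypted(url: str) -> bool:
    """True only for wss://, the TLS-protected WebSocket scheme."""
    return urlsplit(url).scheme == "wss"


# Hypothetical sequence identifier that happens to name a person.
SEQUENCE_ID = "studio7-jsmith-live"

# Hypothetical layout 1: identifier embedded in the URL authority.
in_authority = f"wss://{SEQUENCE_ID}.captions.example.org/subscribe"

# Hypothetical layout 2: identifier carried in the path instead.
in_path = f"wss://captions.example.org/{SEQUENCE_ID}/subscribe"

for url in (in_authority, in_path):
    print(url, "-> DNS lookup for", dns_visible_name(url))
```

In the first layout the identifier is part of the name sent to the resolver; in the second, only the organization's host is looked up, and with wss:// the path travels inside the encrypted connection.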
Received on Monday, 11 November 2019 19:31:02 UTC