follow-up re privacy review comments on Trace Context Level 1

Hi Distributed Tracing WG and PING (Privacy Interest Group),

Following up on a privacy review and discussion that we had in November 2018: Trace Context is now a W3C Recommendation.
https://www.w3.org/TR/trace-context-1/

My apologies that this follow-up comes after the PR comment/vote period; when I started this update it was still a Proposed Recommendation. I believe PING should take away a lesson here: we should track follow-ups for specifications where we have conducted privacy reviews, so that we can confirm that issues are properly addressed and re-check specifications before late-stage transitions and their associated deadlines. (This was also noted on last week’s PING teleconference.)

The feedback below should still be useful for recording Errata and for future revisions.

Cheers,
Nick

During our previous discussion (minutes are available here: https://www.w3.org/2018/11/29-privacy-minutes.html#item00), PING folks had the following questions:

1. who is the intended implementer and does this need to be a W3C web standard?

This spec is intended for vendors who operate distributed tracing systems (or who operate distributed systems and want to use or connect to distributed tracing systems). The protocol doesn’t rely on any changes from web browsers, although it’s possible that client-side JavaScript could use these headers as part of a distributed tracing system. The motivation is to encourage interop among several vendors in this space that are using HTTP and Web technologies, although it’s also noted that this could apply beyond HTTP.

Reviewing specs of this kind is a little less common for PING, since we’re most often looking at APIs or protocols that involve web site and web browser behaviors. The WG notes in their README that their intention is to ask for exceptions to CORS limitations on these headers, which would rely on web browsers treating these trace headers differently from other headers when making cross-origin requests, but that isn’t mentioned in this specification.

2. what information may be revealed in these standardized identifier headers and who will have access to that information?

Risks of tracking across origins/systems and of information disclosure are noted in both the privacy and security considerations sections, although in some cases the risks are downplayed and mitigations are left unspecified or discouraged.

3. can the intelligence-free nature of these identifiers be confirmed or audited by external parties?

Mitigations are mentioned for consumers of these fields to try to audit the randomness of identifiers, and the spec provides normative requirements with justification for why identifiers should be chosen as globally random.


Privacy Considerations and Security Considerations sections are both present in the spec. Reviewing their updated states raised several questions and concerns for me.

> Note that these privacy concerns of the traceparent field are theoretical rather than practical.

This sentence seems to be incorrect and unhelpful. The documented privacy risks in the previous paragraphs seem entirely feasible, with particular normative requirements to mitigate them; “theoretical rather than practical” implies that they would not happen in any anticipated use, which does not seem to be the case. While it’s not uncommon for a privacy considerations section to consist of arguments for why described privacy risks are not serious (and discouraging use of known mitigations), it’s not clear that these arguments are helpful to implementers or end users.

Similarly, these normative requirements are potentially in conflict and appear to discourage mitigation of identified privacy risks:

> Vendors extremely sensitive to personal information exposure MAY implement selective removal of values corresponding to the unknown keys. Vendors SHOULD NOT mutate the tracestate field, as it defeats the purpose of allowing multiple tracing systems to collaborate.

Is removing values from the tracestate field allowed or prohibited?

And it’s not clear how a potential implementer should determine whether it’s “extremely sensitive to personal information”: is that different from being compliant with binding legal requirements in multiple jurisdictions? Maybe a better, less editorialized phrasing would just be to say that “Vendors MAY remove values in order to limit disclosure of personal information.”
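
To make the question concrete, here is a minimal Python sketch of what “selective removal of values corresponding to the unknown keys” might look like for an intermediary. It assumes only that tracestate is a comma-separated list of key=value members, as in the spec’s examples; the function name filter_tracestate and the known_keys allow-list are hypothetical names of mine, not anything the spec defines, and the sketch does not validate key or value syntax.

```python
def filter_tracestate(header_value, known_keys):
    """Drop tracestate list members whose key is not in known_keys.

    Hypothetical helper for illustration: the order of the remaining
    members is preserved and their values are passed through untouched.
    """
    kept = []
    for member in header_value.split(","):
        member = member.strip()
        if not member:
            continue  # tolerate empty list members
        key, sep, _value = member.partition("=")
        if sep and key in known_keys:
            kept.append(member)
    return ",".join(kept)


# Only the "rojo" entry is recognized by this (hypothetical) vendor.
print(filter_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE", {"rojo"}))
```

Whether forwarding the shortened header counts as the permitted “selective removal” or as the discouraged “mutation” is exactly what the current text leaves unclear.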

> Vendors should ensure that they include only these response headers when responding to systems that participated in the trace.

I think “only” is misplaced and the intended sentence is:
> [potential correction:] Vendors should ensure that they include these response headers only when responding to systems that participated in the trace.

Is this “should” intended as a normative SHOULD? Throughout the Privacy Considerations section, I’m often uncertain as to when these terms are used for normative requirements and when not.

The Security Considerations section also has many statements that use RFC 2119 terms, but none are in all caps and it’s not clear if these are intended as normative requirements or not.

In Section 7.1, “requeest” should be “request”.

A few sections of the document are marked as non-normative, but then their subsections include many direct normative requirements. The Problem Statement and Solution sections (which read to me as non-normative overviews) are not marked as non-normative. I don’t really have any strong feelings about use of non-normative section headers; I’m just passing on that the distinctions in the current spec are confusing.

Received on Thursday, 13 February 2020 22:36:32 UTC