Re: [private-measurement] Questions about Collusion - What can be done about the collusion risk in these systems? (#12)

This is a threat model question that depends somewhat on the nature of the solution we choose.  It is irrelevant if we decide that the information we produce can be entrusted to sites without any intermediation or conditions.  To be clear, my view is that such an outcome is unequivocally unacceptable, but it is one that I have heard expressed, so it is worth stating.

The starting point for any threat model is the goal of the system and the motives and incentives of the various actors we might consider adversarial to those goals.  The goal of this system is to protect information about the browsing activity of web users, while providing sufficient information that advertisers, publishers, and those they work with are able to understand how advertising performs.  What "sufficient" means here will occupy a good deal of this group's time, but anything short of revealing all data requires us to consider the question of collusion.

Adversaries in this case are those that seek to obtain more information than the system is designed to reveal.  I hope that it isn't necessary to articulate precisely why an adversary might find this information worth seeking.  Adversaries might seek to collude with other actors in the system to gain access to this information.  If we design a system that depends on certain actors gaining access to more information than what is sufficient, then those actors are a potential target for collusion.

When we say collusion, we might also consider the risk that an actor might be subject to compromise or attack.  Software, hardware, and systems are invariably imperfect.  An attack that leads to access to information - whether covert or not - looks little different to collusion.  The strategies we might employ to manage collusion risk also need to consider vulnerabilities.

For instance, we might decide to build a centralized service that receives the browsing history of all users and sticks it in a big database.  This service is responsible for providing whatever reduced portion of this information we decide is acceptable.  An adversary that convinces the operator of the service to collude might then be able to access all of this data.  The collusion risk here is extreme, which alone makes this sort of design unlikely to be acceptable, even setting aside the other reasons it would be unacceptable.

### User Agents

Any system we design has a collusion risk for endpoints: that is, user agents and the systems on which they run.  It is unavoidable that user agents are exposed to every detail of browsing activity.

We generally don't consider user agents as a high risk for several reasons; see also [RFC 8890](https://www.rfc-editor.org/rfc/rfc8890.html#section-4.2).

Collusion with user agents on an instance-by-instance basis is unlikely to provide returns that are commensurate with the costs.  The same applies to targeted compromise of end systems, though the potential for mass exploitation of a shared vulnerability is real.  Here though, leaking browsing history is probably of less value to an attacker than other information that might be gained (passwords and credit card numbers spring to mind).

The incentive structure is such that systemic collusion on the part of user agent vendors or operating system vendors is challenging, though it is not something that can be totally ignored.  The "[fiduciary responsibilities](https://www.w3.org/TR/privacy-principles/#user-agents)" of user agents aren't generally subject to legal enforcement, so collusion on the part of user agents is largely constrained by market forces: that is, people choosing an alternative browser.

With most browsers being open source, it is tempting to suggest that any sort of systemic collusion would be detectable.  This would make the reputational risk of collusion high.  However, there are supply chain challenges in browser deployment that are vulnerable to attack.  The same challenges might be exploited by a malicious browser vendor to target some or all of their users.

The final thing to say about user agents is that even if browsers act as honest agents for users, we cannot assume that all browsers are run by genuine and honest users.  Browsers are freely available.  It is trivial to obtain and run a browser - or something that might appear to be a browser.  Any threat model needs to consider this.  If an attack only depends on running multiple browser instances - even at relatively large scale - we have to consider that to be feasible.

### Web Sites

In general, we might consider web sites to be adversarial.  Sites are the primary recipients of the information we intend to provide and are broadly incentivized to obtain more information than the system is designed to reveal.

New sites are trivial to create with costs that are little different to new browser instances.  Any attack that depends on creating sites - including having users visit those sites - has to be considered as feasible.

### MPC Servers

Any multi-party computation (MPC) system consists of multiple servers that are trusted, to some extent, not to collude.  The specific MPC design determines the extent to which servers need to be trusted and what is gained from collusion.

The approximate reference frame we seem to have adopted thus far is one based on experience with Prio deployments.  That is, absent collusion, servers learn very limited information about what they are processing.  A malicious server is able to spoil the outputs of the system, but it cannot learn additional information by doing so.
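
To make the Prio model concrete, here is a minimal sketch of the additive secret sharing that produces this property.  This is illustrative Python with made-up names; real Prio also has clients attach zero-knowledge proofs that each contribution is well-formed, which the sketch omits.

```python
import secrets

P = 2**61 - 1  # a prime modulus; real deployments choose field parameters carefully

def split(value: int) -> tuple[int, int]:
    """Split a value into two additive shares. Each share on its own is
    uniformly random and reveals nothing about the value."""
    share_a = secrets.randbelow(P)
    share_b = (value - share_a) % P
    return share_a, share_b

# Each client splits its input; server A sees only a-shares, server B only b-shares.
inputs = [1, 0, 1, 1, 0]
shares = [split(v) for v in inputs]

# Each server sums the shares it holds, learning only a share of the total.
sum_a = sum(a for a, _ in shares) % P
sum_b = sum(b for _, b in shares) % P

# Combining the two aggregate shares reveals the aggregate, and nothing more.
assert (sum_a + sum_b) % P == sum(inputs)
```

Each server's view - its own shares and its own aggregate share - is uniform random noise in isolation; only the combination of the two aggregate shares reveals anything, and then only the total.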

It is hard to avoid servers learning some information, but this can be limited to things that are not related to what is being protected.  For instance, servers learn when submissions are made, but we can ensure that the choice of whether to make a submission carries no information that we consider private; failing that, additional steps can be taken to disassociate the submission from any action that might be private.
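
To illustrate what those additional steps might look like - this is a sketch only, with made-up parameters, not a concrete proposal - a client could randomize when it submits and pad the submission stream with dummy reports, so that neither the timing nor the existence of a submission says anything private:

```python
import random

MAX_DELAY = 24 * 60 * 60  # seconds: submit at a random point within a day

def submission_time(event_time: float) -> float:
    """A random delay breaks the link between a private action and the
    time at which the servers observe the resulting submission."""
    return event_time + random.uniform(0, MAX_DELAY)

def dummy_report() -> bytes:
    """A zero-valued contribution: with additive shares it does not
    perturb the aggregate, so clients can submit dummies on a schedule
    that is independent of any real activity."""
    return bytes(32)  # placeholder for an encrypted share of zero
```

If every client submits on its own schedule - a real report when one is pending, a dummy otherwise - an observer of submission traffic learns nothing about which actions actually occurred.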

In some cases, the information that an MPC server is exposed to, while less than complete, is greater than the information that the system is designed to produce.  For example, in the Poplar MPC, where an incremental DPF is used, servers are exposed to intermediate values so that they can exclude portions of the search space.  Though these intermediate values might have differential privacy noise applied, the effect of exposure to intermediate values over time might be a reduction in the overall effectiveness of the protection offered by differential privacy.  When there is some amount of information leakage like this, collusion of just one MPC server with an adversary is all that is necessary.
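
To see why repeated exposure matters, it helps to simulate the shape of those intermediate values.  The following is not a DPF - Poplar servers would hold only shares of these counts - just a plain-text simulation, with illustrative parameters, of one noisy release per level of the prefix search:

```python
import random
from collections import Counter

def noisy_prefix_counts(values: list[str], level: int, epsilon: float) -> dict[str, float]:
    """Counts of each `level`-bit prefix with Laplace noise applied. In a
    Poplar-style search, servers see values like these at every level
    while pruning the candidate space."""
    counts = Counter(v[:level] for v in values)
    scale = 1.0 / epsilon  # Laplace scale for counts with sensitivity 1
    # A Laplace variate is the difference of two i.i.d. exponentials.
    return {p: c + random.expovariate(1 / scale) - random.expovariate(1 / scale)
            for p, c in counts.items()}

# Eight-bit strings searched level by level means eight separate noisy
# releases, so the epsilons add up across levels rather than being paid once.
data = [format(random.getrandbits(8), "08b") for _ in range(1000)]
releases = [noisy_prefix_counts(data, level, epsilon=0.5) for level in range(1, 9)]
```

The point is composition: a server (or a colluding partner) that records every level's release accumulates more information than any single release implies.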

Collusion between MPC servers in this model carries a much higher cost.  Most of the systems proposed thus far completely lose any privacy guarantees if two servers collude.  Two colluding servers are - in most cases - able to recover all data that is input to the system as a whole.  This requires care in how the system is constructed, but different structural features can minimize this risk.  For instance, some MPC designs provide covert security, where defection on the part of one actor can be detected by other honest participants. Providing covert security could be in tension with other goals, like cost, so it might not be something that can be provided.
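
The failure mode is easy to state precisely.  In the style of the earlier sketch, a single exchange of shares between colluding servers recovers a client's raw input exactly:

```python
import secrets

P = 2**61 - 1

# One client's contribution, additively shared between two servers.
value = 1
share_a = secrets.randbelow(P)
share_b = (value - share_a) % P

# Each share alone is uniformly random; pooled, they are the plaintext.
assert (share_a + share_b) % P == value
```

Nothing cryptographic distinguishes this exchange from honest operation, which is why the mitigations that follow are structural and organizational rather than mathematical.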

We can mitigate the effect of a total failure by limiting the information that is input to the system.  Any information that does not contribute toward the goals of the system - either its outputs or its safe operation - should not be provided.  To give a trivial example, if the system only operates on the domain name of sites, it should accept domain names and not full URLs.
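
A sketch of what that minimization might look like at the point of submission (illustrative only; a real implementation would map the hostname to a registrable domain using the Public Suffix List, which this sketch omits):

```python
from urllib.parse import urlsplit

def minimized_input(url: str) -> str:
    """Reduce a URL to its hostname before submission, so that path,
    query, and fragment never enter the measurement system at all."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("not a usable URL")
    return host

assert minimized_input("https://shop.example.com/cart?item=42#step2") == "shop.example.com"
```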

Reducing any incentive to collude is probably the most effective means of managing collusion risk.  Entities that operate servers will be chosen based on the perception that they are trustworthy.  Part of determining suitability will be based on an assessment of what they stand to gain from collusion - or lose from being found to collude.  If there is no direct incentive for server operators to violate that trust - in particular, if they stand to gain little from obtaining the information they have been entrusted with - that strengthens the assessment.

A lot of this comes down to a determination of how likely it is that collusion will be discovered. As there are no artifacts produced that might provide public accountability in the event of collusion, a lot depends on the server operator and their practices. Auditing of MPC server software, hardware, and operations is potentially a useful tool in ensuring that compliance with security practices is maintained.  This doesn't prevent collusion, but it might ensure that practices are structured in a way that reduces the likelihood that an attempt at collusion can be concealed.

As noted, a lot of this comes down to security practices as much as it comes down to trust.  For the secure operation of a service, we have some understanding of the risks and challenges involved.  Software and hardware supply chain issues are important considerations that need policies and procedures.  Operational practices need to be implemented to protect against accidents, infiltration, or malicious action.  These cover change management, such as code review and approval, but extend to how access to keying material or physical systems is controlled, logged, and overseen.

Diversity in the supply chain can be important.  For immediate suppliers - those that provide the software or hardware used in operating an MPC service - additional constraints might be justified.  For instance, diversity in implementations of the specific MPC algorithms used might be necessary to mitigate the risk that a single supplier is compromised.  The same diversity requirement might apply to cloud services, if those are used.

Further up the supply chain, attacks have a greater chance of being discovered because the product is used by more diverse consumers.  Reusing existing components increases the reliability of the system and reduces its cost.  For reused components, it might be reasonable to say that the reputational risk for a supplier is sufficient that additional controls are less necessary.  Unless an attempt to attack the MPC system could be precisely targeted, all other consumers of that component are potentially able to discover the attack.  Note also that a request for compliance made of an upstream supplier (such as a vendor of CPUs or operating systems, to give examples) might not carry sufficient weight to warrant special treatment, so this has a pragmatic aspect.

For operational practices, the best tool we have for ensuring compliance is auditing.  Audits aren't magical, but they expose practices to an independent, experienced, and adversarial mindset.  Security auditing is a well-established business and, given the extent to which collusion is indistinguishable from attack (especially insider attack), audits are likely to be a useful tool in managing collusion risk.

### Trusted Execution Environments

The operation of a TEE is not fundamentally dissimilar to an MPC.  In a loose fashion, the operation of a TEE can be considered a multi-party system where one party defines the particulars of how the system operates by providing code and a second party (the silicon vendor) guarantees that only that code can be executed.  Due to the nature of the TEE threat model, where an attacker is assumed to be unable to physically access hardware, the operator of the hardware (likely a cloud provider) is a third entity that is involved.

In [some proposals](https://github.com/WICG/conversion-measurement-api/blob/main/AGGREGATION_SERVICE_TEE.md), yet another entity exists that only releases the keying material necessary to access private information after validating that an accepted TEE in an accepted cloud service is executing accepted software.  Though a single entity in this role would have access to keying material, this function can be constructed with threshold keys or N-of-M secret sharing so that secrets are never available to a single entity.  All the considerations for an MPC might apply to such a function.
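
As an illustration of the N-of-M construction - a toy Shamir secret sharing over a prime field, not the scheme from the linked proposal, with illustrative parameters throughout:

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a toy 16-byte secret

def make_shares(secret: int, threshold: int, count: int) -> list[tuple[int, int]]:
    """Split `secret` so that any `threshold` shares reconstruct it and fewer
    reveal nothing: evaluate a random polynomial of degree threshold-1 whose
    constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, count + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x=0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

key = secrets.randbelow(PRIME)
shares = make_shares(key, threshold=2, count=3)
assert reconstruct(shares[:2]) == key  # any two custodians can release the key
assert reconstruct(shares[1:]) == key  # ...any two, not a specific pair
```

Each of three key custodians holds one share; any two can cooperate to release the key to an attested service, but no single custodian - colluding or compromised - can do so alone.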

Unlike MPC, where the system provides some privacy guarantees if a single actor colludes with an attacker, all three active entities in a TEE-based system need to be trusted, as any one of these actors can access private information.  However, as the silicon vendor and cloud operator are higher up the supply chain, we might rely on established practices and safeguards regarding things like the operation of TEEs in general or physical access to hardware.  Again, a break here affects more systems than ours, so the stakes involved in a failure are much higher.

Worth noting here is the incentive structure.  As it stands, all major cloud service operators also have a non-trivial stake in the business of advertising.  Whether this interest in browsing history extends to a willingness to risk trust in a major and fundamental product is doubtful - keeping their cloud business probably weighs more - but it's a point that depends somewhat on the risk of getting caught.

That leaves the trusted code.  All of the same supply chain concerns apply here, though without any similar recourse to diversity of implementation or redundancy.  Failures in this code are final, whereas with an MPC it takes multiple failures to compromise the system.

This makes oversight of the practices used to produce this code more important.  TEEs provide some assurances that you are able to know what code was executed, but the code that executes might not match the software as written.  Ensuring this is truly the case requires additional controls over the practices used to write and manage source code, any dependencies, plus the process that is used to construct the final software binary/image (see also binary transparency, reproducible builds, and [Ken Thompson](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)).
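
A sketch of how those controls might compose.  Everything here is hypothetical: real attestation measurements are vendor-specific formats (SGX's MRENCLAVE, for example), and the measurement function below is a stand-in.

```python
import hashlib

def measurement(image: bytes) -> str:
    """Stand-in for the measurement a TEE attests for a loaded image."""
    return hashlib.sha384(image).hexdigest()

def should_release_keys(image: bytes, reproduced: set[str]) -> bool:
    """Release keys only when the attested measurement matches a digest that
    independent parties obtained by rebuilding the published source with a
    reproducible toolchain."""
    return measurement(image) in reproduced

# Illustrative: the deployed image matches what independent builders produced
# from source. A build that only the operator can reproduce should not be
# trusted, which is the heart of Thompson's argument.
image = b"compiled aggregation service image"
assert should_release_keys(image, {measurement(image)})
```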

The same diversity arguments apply to the implementation of the TEE.  Though different cloud services might choose different silicon vendors, any given run will likely execute using silicon from just one vendor on one cloud.  If an attacker is able to choose where code is executed (as proposed) then they can choose a colluding entity.

### Role of Governance

This entire discussion ties in with our discussion of governance.  How we might - collectively, as a browser vendor, or as an individual user - decide that an operator is trustworthy might rest on the governance structures that are created around this.  It is possible that different rules could emerge from diverse user communities, but that presents challenges to the coherent functioning of the overall system.

A unified governance structure carries a lot more weight and is more able to compel or encourage compliance.  More importantly, a unified governance structure properly places those who build web sites ahead of those who build web browsers.  Though it is sometimes unpleasant to agree to a single standard for how the web operates, failing to do so shifts the burden of compliance to sites, who find that they have to make hard decisions about which variations to support.  This inevitably leads to problems for users who choose less popular browsers.

So though I might not agree with the entirety of the [garuda proposal](https://darobin.github.io/garuda/), the idea that we might need such a system is something we might want to start building out.  Such things take time.



-- 
GitHub Notification of comment by martinthomson
Please view or discuss this issue at https://github.com/patcg/private-measurement/issues/12#issuecomment-1128508205 using your GitHub account

