Re: [AGENDA] W3C Credentials Community Call - 18 July 2017 12pm ET

On Fri, Jul 14, 2017 at 4:31 PM, Kim Hamilton Duffy <kim@learningmachine.com> wrote:

> 4. Data Minimization and Selective Disclosure (20 minutes)
> - Christopher and Jan to facilitate discussion
> - Discuss topics, Q&A
>

Jan has some slides, which I’ve posted at
https://drive.google.com/open?id=0B8UHtBOakwo8cDg1M3JjRDBqUmM

On my side, here are some thoughts on data minimization that we won’t have
time to report out on the call.

Here is some of my research so far on the requirements for the most
basic of the items, “data minimization”.

What are its best practices? Best tactics? No easy answers.

NIST and GDPR say we have to do it, but they don’t give many requirements.

=====

NIST SP 800-63 Digital Identity Guidelines
https://pages.nist.gov/800-63-3/

https://pages.nist.gov/800-63-3/sp800-63a/sec8_privacy.html
### 8.1 Collection and Data Minimization
Section 4.2 requirement 2 permits the collection of only the PII necessary
to validate the existence of the claimed identity and associate the claimed
identity to the applicant, based on best available practices for
appropriate identity resolution, validation, and verification. Collecting
unnecessary PII can create confusion regarding why information not being
used for the identity proofing service is being collected. This leads to
invasiveness or overreach concerns, which can lead to loss of applicant
trust. Furthermore, PII retention can become vulnerable to unauthorized
access or use. Data minimization reduces the amount of PII vulnerable to
unauthorized access or use, and encourages trust in the identity proofing
process.

http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63c.pdf

9.3 Data Minimization
Federation enables the data exposed to an RP to be minimized — resultantly,
the subscriber’s privacy is enhanced. Although an IdP may collect
additional attributes beyond what the RP requires for its use case, only
those attributes that were explicitly requested by the RP are to be
transmitted by the IdP. In some instances, an RP does not require a full
value of an attribute. For example, an RP may need to know whether the
subscriber is over 13 years old, but has no need for the full date of
birth. To minimize collection of potentially sensitive PII, the RP may
request an attribute reference (e.g., Question: Is the subscriber over 13
years old? Response: Y/N or Pass/Fail). This minimizes the RP’s collection
of potentially sensitive and unnecessary PII. Accordingly, Section 7.3
requires the RP to, where feasible, request attribute references rather
than full attribute values. To support this RP requirement IdPs are, in
turn, required to support attribute references.
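
To make the attribute-reference idea in 9.3 concrete, here is a minimal
sketch in Python. It is not from NIST: the record layout, the
`age_over_13` reference name, the `respond` function, and the reading of
"over 13" as "at least 13 whole years" are all my assumptions for
illustration.

```python
from datetime import date

# Hypothetical subscriber record held by the IdP (field names are illustrative).
SUBSCRIBER = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "date_of_birth": date(2001, 3, 15),
}


def age_on(birth: date, today: date) -> int:
    """Whole years elapsed between birth and today."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet this year
    return years


def respond(record: dict, requested: list, today: date) -> dict:
    """Return only what the RP explicitly requested.

    "age_over_13" is an attribute reference: the RP gets a yes/no answer
    and never sees the date of birth. Treating "over 13" as age >= 13 is
    an assumption of this sketch.
    """
    response = {}
    for name in requested:
        if name == "age_over_13":
            response[name] = age_on(record["date_of_birth"], today) >= 13
        elif name in record:
            response[name] = record[name]
    return response


if __name__ == "__main__":
    print(respond(SUBSCRIBER, ["age_over_13"], date(2017, 7, 18)))
```

The RP that asks only for `age_over_13` receives a single boolean; the
name, email, and full date of birth stay with the IdP even though the
IdP holds them.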

----

http://www.lewik.org/term/13593/data-minimisation-principle-gdpr/

Data minimisation principle (GDPR)
Definition
Personal data shall be:

(c) Adequate, relevant and limited to what is necessary in relation to the
purposes for which they are processed
(‘data minimisation’);
Source law
General Data Protection Regulation
Chapter II, Article 5, paragraph 1

---

https://www.privacy-regulation.eu/en/5.htm

1. Personal data shall be:
(a) processed lawfully, fairly and in a transparent manner in relation to
the data subject ('lawfulness, fairness and transparency');
=> Article: 6, 9
(b) collected for specified, explicit and legitimate purposes and not
further processed in a manner that is incompatible with those purposes;
further processing for archiving purposes in the public interest,
scientific or historical research purposes or statistical purposes shall,
in accordance with Article 89(1), not be considered to be incompatible with
the initial purposes ('purpose limitation');
=> Article: 26
(c) adequate, relevant and limited to what is necessary in relation to the
purposes for which they are processed ('data minimisation');
(d) accurate and, where necessary, kept up to date; every reasonable step
must be taken to ensure that personal data that are inaccurate, having
regard to the purposes for which they are processed, are erased or
rectified without delay ('accuracy');
=> Article: 16
(e) kept in a form which permits identification of data subjects for no
longer than is necessary for the purposes for which the personal data are
processed; personal data may be stored for longer periods insofar as the
personal data will be processed solely for archiving purposes in the public
interest, scientific or historical research purposes or statistical
purposes in accordance with Article 89(1) subject to implementation of the
appropriate technical and organisational measures required by this
Regulation in order to safeguard the rights and freedoms of the data
subject ('storage limitation');
(f) processed in a manner that ensures appropriate security of the personal
data, including protection against unauthorised or unlawful processing and
against accidental loss, destruction or damage, using appropriate technical
or organisational measures ('integrity and confidentiality').
=> Article: 24, 32
2. The controller shall be responsible for, and be able to demonstrate
compliance with, paragraph 1 ('accountability').
=> Article: 77, 82, 83


==========

The best source I’ve found so far (sent by Jan) has some definitions of
what is desirable or undesirable in data minimization:

=========

Pfitzmann, A & Hansen M — "A terminology for talking about privacy by data
minimization: Anonymity, Unlinkability, Undetectability, Unobservability,
Pseudonymity, and Identity Management"
http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.34.pdf

**Anonymity** of a subject from an attacker’s perspective means that the
attacker cannot sufficiently identify the subject within a set of subjects,
the anonymity set.

VS. **Identifiability** of a subject from an attacker’s perspective means
that the attacker can sufficiently identify the subject within a set of
subjects, the identifiability set.
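
One rough way to picture the anonymity set is as the subjects who remain
consistent with what the attacker has observed. The sketch below is my
own toy illustration, not something from the paper: the population, the
attribute names, and the `anonymity_set` helper are all hypothetical,
and exact attribute matching is a simplification of "cannot
sufficiently identify".

```python
# Hypothetical population known to the attacker (all values are made up).
POPULATION = [
    {"id": "s1", "age_over_13": True, "birth_year": 1990, "postcode": "94110"},
    {"id": "s2", "age_over_13": True, "birth_year": 1990, "postcode": "10001"},
    {"id": "s3", "age_over_13": True, "birth_year": 1985, "postcode": "94110"},
    {"id": "s4", "age_over_13": False, "birth_year": 2010, "postcode": "94110"},
]


def anonymity_set(population: list, observed: dict) -> list:
    """Subjects whose attributes match everything the attacker observed."""
    return [
        subject
        for subject in population
        if all(subject.get(key) == value for key, value in observed.items())
    ]


if __name__ == "__main__":
    minimal = anonymity_set(POPULATION, {"age_over_13": True})
    detailed = anonymity_set(POPULATION, {"birth_year": 1990, "postcode": "94110"})
    print(len(minimal), len(detailed))
```

Disclosing only the yes/no attribute leaves three candidate subjects,
while disclosing birth year plus postcode narrows the set to one, which
is the intuition for why minimizing disclosed data helps anonymity.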

---
**Unlinkability** of two or more items of interest (IOIs, e.g., subjects,
messages, actions, ...) from an attacker’s perspective means that within
the system (comprising these and possibly other items), the attacker cannot
sufficiently distinguish whether these IOIs are related or not.

VS. **Linkability** of two or more items of interest (IOIs, e.g., subjects,
messages, actions, ...) from an attacker’s perspective means that within
the system (comprising these and possibly other items), the attacker can
sufficiently distinguish whether these IOIs are related or not.
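
As a small illustration of linkability versus unlinkability, compare
giving every relying party the same identifier for a subject with giving
each one a different, derived identifier. This is a sketch under my own
assumptions: the `pairwise_id` function, the HMAC-based derivation, and
the example names are hypothetical and are not taken from the paper.

```python
import hashlib
import hmac


def pairwise_id(secret: bytes, subject: str, relying_party: str) -> str:
    """Derive a per-relying-party pseudonym for a subject.

    The secret is assumed to be known only to the issuer, so two relying
    parties comparing identifiers cannot tell whether they belong to the
    same subject.
    """
    message = f"{subject}|{relying_party}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    secret = b"issuer-only-secret"  # illustrative value, not a real key
    id_at_shop = pairwise_id(secret, "alice", "shop.example")
    id_at_bank = pairwise_id(secret, "alice", "bank.example")
    # Same subject, different relying parties: identifiers differ.
    print(id_at_shop != id_at_bank)
    # Same subject, same relying party: identifier is stable across visits.
    print(id_at_shop == pairwise_id(secret, "alice", "shop.example"))
```

With a single global identifier, two relying parties can trivially link
their records about the subject. With derived identifiers, linking
would require other correlating data, which is where minimizing the
remaining attributes matters.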

---

**Undetectability** of an item of interest (IOI) from an attacker’s
perspective means that the attacker cannot sufficiently distinguish whether
it exists or not.

VS. **Detectability** of an item of interest (IOI) from an attacker’s
perspective means that the attacker can sufficiently distinguish whether it
exists or not.

---

**Unobservability** of an item of interest (IOI) means
* undetectability of the IOI against all subjects uninvolved in it and
* anonymity of the subject(s) involved in the IOI even against the other
subject(s) involved in that IOI.

VS. **Observability** of an item of interest (IOI) means:
<many possibilities to define the semantics>.


========


— Christopher Allen

Received on Tuesday, 18 July 2017 15:44:50 UTC