Proposed changes to DPV to include Non-Personal Data

Hi.

This email sets out the impact of changing core concepts and relations 
within the DPV, currently being discussed in the context of integrating 
the Data Governance Act (DGA). Below are four options for how DPV can 
represent non-personal data, along with their impacts and adoption 
considerations. Please indicate your preference or objections by 
replying to this email or at https://github.com/w3c/dpv/issues/99. 
Decisions will only be taken in the meeting calls.

---

# Summary of Discussion

1) Current structure of DPV - The 'core' concepts in DPV all relate to 
personal data. For example, Purpose is defined as the purpose for 
processing of personal data, Processing is defined as the operation on 
personal data, and PersonalDataHandling is defined as the 'handling' or 
process regarding processing of personal data. Through this, DPV can 
express information about how personal data is being used within a 
use-case.

2) Limitations - Since all concepts are regarding personal data, they 
cannot be used where other non-personal data is involved. For example, 
Technical Measures such as encryption are applicable for non-personal 
data, but the DPV concept is defined for its use over personal data. 
Similarly, Processing and Purpose are also generic terms that apply to 
both personal and non-personal data, but in DPV we have defined them as 
being only about personal data.

3) DGA's scope involves both Personal and Non-Personal Data - Put 
simply, the Act sets up portals where datasets can be found and reused. 
If the data is personal, then GDPR applies and mechanisms such as 
consent and pseudonymisation can be involved. If the data is 
non-personal, licenses and copyright can be involved. Common to both is 
the need to describe the purposes of processing that data, e.g. what the 
consent or license permits or limits, or how the data must be 
processed, e.g. storage conditions such as location or temporal 
limitation, or technical measures such as access control and encryption.

4) Required changes in DPV for DGA - The personal data related concepts 
are well established within DPV and not much needs to change other than 
considering some new types of entities and measures. The non-personal 
data concepts are completely absent. To be able to model the DGA (and 
other initiatives like it), DPV would need concepts that can address 
both personal and non-personal data. This represents a significant 
expansion of scope for DPVCG.

---

Question 1: Should the scope of DPV be made broader to encompass 
personal as well as non-personal data, with the focus remaining on 
responsible use of personal data?
- This decision must be made by the group. So far, the only advantage 
we have discussed for including non-personal data relates to DGA.
- Some people have expressed interest in doing this work, and so far I 
have not registered any objections.

---

Question 2: Assuming the answer to Q1 is Yes, what options are available 
to add non-personal data concepts, and what are their implications?

Option 1: we change the core properties of DPV to represent both 
personal and non-personal data i.e. Purpose becomes "purpose for 
processing of 'data'" and Processing becomes "operations on 'data'" 
rather than 'personal data'. Personal Data will have parent 'Data' and 
sibling 'NonPersonalData' concepts. Legal Basis will be distinguished as 
'Legal Basis for Personal Data' and 'Legal Basis for Non-Personal Data'. 
The relations, e.g. hasPurpose, will also change accordingly.

- the implication is that anything using the changed concepts will be 
impacted, e.g. existing adopters and use-cases will see their work 
being changed
- to enable choice and control over such major changes, the version 
number should be increased to 2 and a separate namespace/URI used, e.g. 
w3id.org/dpv/v2
- this is the best choice in terms of simplicity of information 
modelling as it keeps the total concepts lower by reusing the same 
concept (e.g. Purpose) for personal and non-personal data.
- Where necessary, existing concepts will be split into variants for 
Personal and Non-Personal Data. E.g. Legal Basis as above, Technical and 
Organisational Measures where relevant - e.g. encryption is applicable 
to both but anonymisation only applies to personal data
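
As a rough illustration of Option 1 (a sketch only, not an agreed 
design), the shared concepts could be modelled as below. The concept 
names follow the wording in this email rather than published DPV terms, 
and the APPLIES_TO table and both helper functions are hypothetical.

```python
# Illustrative sketch of Option 1; names mirror this email's wording
# and are assumptions, not published DPV terms.

# Data hierarchy: 'Data' is the parent of the two sibling concepts.
PARENT = {
    "Data": None,
    "PersonalData": "Data",
    "NonPersonalData": "Data",
}

# Broadest data category each concept applies to (hypothetical table).
APPLIES_TO = {
    "Purpose": "Data",
    "Processing": "Data",
    "Encryption": "Data",
    "Anonymisation": "PersonalData",
    "LegalBasisForPersonalData": "PersonalData",
    "LegalBasisForNonPersonalData": "NonPersonalData",
}


def is_a(category, ancestor):
    """Return True if category equals ancestor or descends from it."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT[category]
    return False


def usable_with(concept, category):
    """Return True if concept may describe data of the given category."""
    return is_a(category, APPLIES_TO[concept])
```

Here a single Purpose or Encryption concept serves both kinds of data, 
while usable_with("Anonymisation", "NonPersonalData") is False because 
that measure is split off for personal data only.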

Option 2: we do not change anything in the current set of concepts, and 
instead create a separate set of concepts for non-personal data - 
similar to an extension. E.g. non-pd:Purpose would be the purpose for 
processing non-personal data, non-pd:LegalBasis would be the legal basis 
for non-personal data, and so on.

- this option does not impact any existing adoption or use-case for DPV 
as no concepts in DPV are being changed, and hence there is no change in 
namespace/URI
- this is not a good design choice in terms of information modelling as 
it duplicates the concepts for each of personal and non-personal data - 
however this can be justified by the above reason of not impacting 
existing users, as well as by there being significant enough 
differences in the concepts to have them defined separately
- this is not 'attractive' to use because the concepts are separated in 
two sets, which means the users cannot just say 'Purpose' but will have 
to specify whether it is from the 'Personal' or 'Non-Personal' vocabularies.
- this also means each concept may need to be duplicated across personal 
and non-personal variants e.g. encryption will have to be defined twice.
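
To make the duplication in Option 2 concrete, here is a minimal sketch. 
The 'non-pd' prefix is the one used above; the 'dpv' prefix, the small 
term list, and the qualify helper are assumptions for illustration only.

```python
# Illustrative sketch of Option 2: the existing terms stay untouched and
# every concept is repeated in a separate non-personal-data vocabulary.

# A few existing concepts (sample list chosen for illustration).
DPV_TERMS = {"Purpose", "Processing", "LegalBasis", "Encryption"}

# Unchanged vocabulary plus a duplicated set under the 'non-pd' prefix.
VOCAB = {f"dpv:{term}" for term in DPV_TERMS} | {
    f"non-pd:{term}" for term in DPV_TERMS
}


def qualify(term, personal):
    """Return the prefixed name a user must pick for the given data kind."""
    prefix = "dpv" if personal else "non-pd"
    name = f"{prefix}:{term}"
    if name not in VOCAB:
        raise KeyError(name)
    return name
```

Users can no longer just say 'Purpose': they must choose between 
qualify("Purpose", True) and qualify("Purpose", False), and the 
vocabulary holds two entries for every concept, including Encryption.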

Option 3: we do not change anything, and discard the proposal

Option 4: redefine DPV as a "Generic Data Processing Vocabulary" which 
is about any data, so that there is no continuity in terms of concepts. 
This means we redesign DPV from scratch and make any changes as 
necessary - which is effectively Option 1 without the implied changes 
for existing users. A new namespace/URI is required e.g. w3id.org/gdpv. 
The drawback is that 'DPV' will no longer be maintained and all users 
will need to move to the new vocabulary.

---

My thoughts: My answer to Q1 regarding whether we include non-personal 
data is - yes, but we only do Option 1 for the top-concepts and do not 
create a comprehensive vocabulary for non-personal data. This is because 
I am in the EU and am interested in DGA, but not interested in 
non-personal data aspects such as contracts and licensing. However, I 
see value in allowing DPV to be expanded so that others can use it and 
build on it for this, while keeping the scope of DPVCG limited to 
personal data.

Separately, I am also thinking about changing how the vocabulary is 
currently structured and named in the GitHub repo, e.g. instead of 
folders named /dpv-gdpr, /dpv-dga, etc. we have sensible structuring 
as: /loc/eu/gdpr, /loc/eu/dga, /loc/eu/ie and so on. 
Similarly, dpv-pd becomes just pd, dpv-legal and dpv-tech become legal 
and tech, and so on. This is not connected to the above, but since we 
are discussing changes to DPV, I am mentioning this in the same context.

Regards,
-- 
---
Harshvardhan J. Pandit, Ph.D
Assistant Professor
ADAPT Centre, Dublin City University
https://harshp.com/

Received on Wednesday, 5 July 2023 07:35:18 UTC