- From: Harshvardhan J. Pandit <me@harshp.com>
- Date: Fri, 17 Jun 2022 09:59:21 +0100
- To: David Lewis <delewis@tcd.ie>, public-dpvcg@w3.org
Hi Dave. Thanks for the comments. My replies are inline.

On 17/06/2022 08:59, David Lewis wrote:

> A quick comment on the technology subject definition:
>
> "TechnologySubject and hasSubject is the subject of the technology i.e. whom the technology is used on or subjected to. This may be directly (e.g. person within a CCTV camera's vision) or indirectly (e.g. person whose details were used as training data)"
>
> Breaking this down, I think we need a more precise definition of being 'subjected to' technology.
>
> If you parse this as the option "TechnologySubject and hasSubject is the subject of the technology i.e. whom the technology [is used on or] subjected to.", that's essentially a circular definition, so doesn't tell you much.

I don't think the definition is necessarily circular, but I agree it can always be further refined/clarified. What is considered "subject" or "subjected to" can vary based on legal or other interpretations, so there is no good existing complete definition AFAIK. The concept of "subject" here mirrors that of "data subject" - i.e. who is the individual the data is about; similarly, who is the individual on whom (or on whose data) the technology is applied. I think there will be a few iterations to smooth out the definitions.

> If you parse it as: "TechnologySubject and hasSubject is the subject of the technology i.e. whom the technology is used on [or subjected to.]", that's a bit better but might leave the reader still wondering what classifies as 'used on'.

In the introduction, there are some hints: "This may be directly (e.g. person within a CCTV camera's vision) or indirectly (e.g. person whose details were used as training data)."

> Further, neither of these to my mind necessarily implies the case where your data is used for training. In this case it is more that the technology is built with your data (which is perhaps sufficiently captured by the GDPR definition of data subject) but doesn't necessarily imply that the technology is used 'on' you, or even that you are ever 'subjected to' the technology.

Not all training data automatically denotes someone to become a (data or technology) subject IMO. If all that was collected was my height with no further individual identifiers, I am not automatically a data subject in some use of that data because there needs to be some identifiability. So the notion of subject can be quite complex - it can be technology used on you, with you, about you, etc.

> So I might suggest
>
> i) define 'subjected to' instead as 'affected or potentially affected by the technology' (note this wording is in part inspired by the general definition of 'stakeholder' in ISO)

This is NOT the "subject" but rather an "affected individual" or "stakeholder (ISO)" - different concepts. So it would not be correct to use that definition because it goes beyond the use of technology and into the realm of figuring out impacts or effects - which is not the intention when modelling actors in technologies. Semantically, the notion of "affected or impacted" is separate from that of "subject" IMHO, because someone can be affected without ever being a subject - such as due to secondary effects or unrealised impacts, a strong example being the use of shared genetics. Also, not all those who are affected need to be subjects (depending on the definition of subject). Just as under GDPR, the affected individuals can be other than data subjects.
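To make that distinction concrete, here is a rough Turtle sketch - the prefixes/IRIs are placeholders, the dpv-tech: names are just the ones discussed in this thread (not finalised terms), and the main-DPV terms are from memory rather than the published spec:

  @prefix dpv:      <https://w3id.org/dpv#> .
  @prefix dpv-tech: <https://w3id.org/dpv/dpv-tech#> .  # placeholder prefix for the tech extension
  @prefix ex:       <https://example.com/ns#> .

  # Direct technology subject: the person within the CCTV camera's vision
  ex:CCTVSystem a dpv:Technology ;
      dpv-tech:hasSubject ex:PersonInCameraView .

  # Someone merely affected (e.g. through secondary effects) is not a subject;
  # they would instead be described using the main DPV risk/impact concepts
  ex:CCTVDeployment a dpv:PersonalDataHandling ;
      dpv:hasImpact ex:ChillingEffectOnNeighbours .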
The final nail would be defining "affected" - and I think this could mean the organisation developing or providing the technology, and the people that work there, could also potentially end up being "subjects" - which would be wrong too. So you see how this takes the notion of "subject" closer to "stakeholder" - and I argue that they are separate. All the impacts, risks, etc. notions are in main DPV and should be used from there. This is just for describing the technologies used. I wouldn't like to duplicate the impacts/risks etc. again in an extension. So for indicating who is affected, I think the data subject and risk/impact sections in DPV should be used.

> ii) consider treating actors whose data is used for training separately somehow, e.g. by just relying on the existing 'data subject' definition.

Yes, the introduction text does state this possibility as: "This may be directly (e.g. person within a CCTV camera's vision) or indirectly (e.g. person whose details were used as training data). What is considered a subject may be contextually dependant on the nature and scope of the technology as well as its application. In the future, we may separate this concept for further distinction between direct and indirect subjects (or use alternate terms) - if such categorisation is deemed beneficial in the description of individuals subjected to technologies."

> I acknowledge that the implication of these suggestions is that if a person's data is fully anonymised and used for ML training and the resulting technology does not affect or potentially affect that person, then they would fall out of the definition of 'technology subject'. But I think that's probably OK.

Yes, this is what is intended. Note that the concept "technology subject" is complementary to "data subject" - so that person would be the data subject in the training phase. At some point in the future, providing these concepts (i.e. training data, training phase) would be something to consider. But for now my intention is to prioritise DPV v1 in terms of data protection / privacy.

Regards,
--
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/
Received on Friday, 17 June 2022 08:59:40 UTC