Re: Mixing classes and instances from Piero Bonatti on 2022-01-18 (public-dpvcg@w3.org from January 2022)

From: Piero Bonatti <pieroandrea.bonatti@unina.it>
Date: Tue, 18 Jan 2022 17:50:57 +0100
To: public-dpvcg@w3.org
Message-ID: <80ab4810-9785-08e4-14bb-9d446a9529c4@unina.it>
Hallo Rinke,

A few answers from my side:

On 18/01/22 16:12, Hoekstra, Rinke (ELS-AMS) wrote:
> Secondly, I do not understand the choice to model all of the categories 
> as classes. What are the intended instances of these classes?

A few examples may clarify:
The instances of class "Location" are specific points on the Earth's
surface, e.g. expressed as coordinates.

The instances of "MAC address" are the concrete MAC addresses of
specific devices.

The instances of "email address" are the specific email addresses of the
data subjects.

And so on.

The intuition is the following: when a privacy policy says that the data
being processed is "location", it says that the actual data crunched
by the application may involve any coordinates where the
data subject happens to be. When a privacy policy says that the data
being processed is "MAC address", it says that the actual data
crunched by the application may involve the MAC address of
any device that the data subject happens to use.  This is indeed the
kind of statement that a privacy policy is expected to contain. It
would certainly not say "I'm going to use only the location you are
at now" or "just the MAC address of the old notebook you use today and
are about to replace soon"...

> 
> I can see a discussion related to this topic took place at the Nov 2020 
> meeting [2], but the outcome seemed to be more around removing 
> domain/range restrictions so that the solution around the issue above, 
> as proposed by Victor (:wave:) in e.g. [3] gets hidden under the carpet 
> (Victor suggested that the range of e.g. dpv:hasProcessing is a blank 
> node that is an instance of dpv:Collect). Yes, that’s ugly [4], and I 
> agree with Rob’s suggestion here to use SKOS or instances and enumerated 
> classes. I think Harsh also supports this in his emails [5].
> 
> The arguments against this appear to be around inferencing, but I don’t 
> see what inferencing task is served by modeling these categories as classes.

The inferencing task is *compliance checking* (of a privacy policy with 
respect to the consent of a data subject, or with the GDPR).

With the class-based approach, each policy is simply the class of
operations that it authorizes.  Policy P complies with policy Q if and 
only if the class P is contained in the class Q (i.e. every operation 
authorized by P is also authorized by Q).  You can use standard 
reasoners for compliance checking, and get correctness (no false 
positives) and completeness (no false negatives) for free, because the 
semantics of policies is exactly OWL2's direct semantics of classes.

Note that "policy" here means any of: the privacy policy of the 
controller (or its record of processing); the consent of the user; (a 
formalization of) the objective fragment of the GDPR.  So, with the 
above method, you can check compliance of the privacy policy with both 
consent and the GDPR and get strong "mathematical" guarantees on the 
reliability of the method.
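
To make the containment idea concrete, here is a minimal sketch in Python.
It is an illustration only: it enumerates a tiny finite set of operations,
whereas an OWL2 reasoner decides containment on class expressions without
enumerating anything. The vocabulary ("email", "marketing", ...), the
helper names policy() and complies(), and the "regulation" stand-in for a
formalized GDPR fragment are all assumptions made up for this example.

```python
from itertools import product

def policy(data_categories, purposes):
    """Extension of a policy: every (data, purpose) operation it authorizes."""
    return frozenset(product(data_categories, purposes))

def complies(p, q):
    """P complies with Q iff every operation authorized by P is authorized by Q."""
    return p <= q

# Hypothetical toy policies over a made-up vocabulary.
privacy_policy = policy({"email"}, {"marketing"})
consent = policy({"email", "location"}, {"marketing"})
regulation = policy({"email", "location"}, {"marketing", "billing"})

print(complies(privacy_policy, consent))     # the policy stays within consent
print(complies(privacy_policy, regulation))  # and within the toy regulation
print(complies(consent, privacy_policy))     # the converse does not hold
```

The same complies() check serves for both consent and the regulation,
which mirrors the point that all three are "policies" in the same sense.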

By contrast, with the instance-based approaches (including those using
blank nodes), any two distinct graphs are logically unrelated to
each other. There is no correspondence between compliance checking and
logical inference over the RDF graphs, even if the graphs are logical
theories in disguise. You have to define and justify an ad hoc algorithm
for compliance checking, and argue (how?) that it does the right thing
and returns no wrong answers.

Example: Suppose a consent statement authorizes some processing of the
data subject's "account identifier"s, while the privacy policy says that
the data being processed are "financial account number"s. You would
expect the privacy policy to comply with that consent because "financial
account number" is a special case of "account identifier", so the consent
"covers" the privacy policy.  This is what you actually get if "account
identifier" and "financial account number" are classes. However, if
"account identifier" and "financial account number" are instances and
the policies are two RDF graphs (i.e. instances themselves), then the
two RDF graphs/policies have no logical relationship with each other,
and you can't use RDF semantics to tell whether the privacy policy is
compliant. You have to reinvent, justify, and validate a compliance
checking method from scratch, without any link to RDF's semantics
(and without any support from it).
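
The account identifier example can be sketched as follows. This is a toy
model, not a reasoner: the SUBCLASS_OF table, the class names, and the
is_subclass() helper are hypothetical and only stand in for subclass
axioms and subsumption checking. The "instance-based" line treats the two
terms as opaque identifiers compared by equality, which is one naive
reading of what is left when no class hierarchy is available.

```python
# Toy subclass axioms, child -> parent (illustrative names only).
SUBCLASS_OF = {
    "FinancialAccountNumber": "AccountIdentifier",
    "AccountIdentifier": "Identifier",
}

def is_subclass(sub, sup):
    """True if sub equals sup or reaches it by following subclass axioms."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False

consent_data = "AccountIdentifier"        # what the consent authorizes
policy_data = "FinancialAccountNumber"    # what the privacy policy processes

# Class-based reading: the policy's data class is contained in the consent's.
class_based = is_subclass(policy_data, consent_data)

# Instance-based reading: two unrelated individuals, nothing to infer.
instance_based = policy_data == consent_data

print(class_based, instance_based)
```

With classes, compliance falls out of the hierarchy; with opaque
individuals, some extra algorithm would have to supply the missing link.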

> 
> For instance, if I look at the Primer [1] (don’t know how up-to-date 
> this is), there is an example about AcmeMarketing:
> 
> ex:AcmeMarketing a dpv:PersonalDataHandling ;
> 
>              dpv:hasPersonalDataCategory dpv:EmailAddress ;
> 
>              dpv:hasProcessing dpv:Collect, dpv:Use ;
> 
>              dpv:hasPurpose dpv:Marketing ;
> 
>              dpv:hasDataController ex:Acme .

The above is a natural example of a policy that needs both classes and 
instances.  Unfortunately, RDF can't clearly say which is a class and 
which is an instance.

With the class-based approach, instance-valued properties can be 
expressed with singleton classes, when needed, i.e. one can say that the 
data controller belongs to the class that contains only Acme (in OWL2 
this is expressed with ObjectOneOf(Acme)).  This is equivalent to
saying that the data controller is precisely Acme.  In this way you get 
full expressiveness, i.e. the advantages of both classes and instances.

At the same time - by using larger classes - you can model joint data
controllers, or you can give the same consent to a class of related
controllers (these are just two examples of the possible uses of general
classes as data controller specifications).
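
A small sketch of the singleton idea, again in Python and again only an
illustration: a controller specification is modeled as a finite set of
individuals, so a one-element set plays the role of ObjectOneOf(Acme) and
a larger set plays the role of a class of related controllers. The names
"AcmeLabs" and "AcmeCloud" and the controller_complies() helper are
invented for this example.

```python
# A singleton class: the controller is precisely Acme.
acme_only = frozenset({"Acme"})

# A larger class: a hypothetical group of related or joint controllers.
acme_group = frozenset({"Acme", "AcmeLabs", "AcmeCloud"})

def controller_complies(policy_controllers, consent_controllers):
    """The policy's controllers must all be among those the consent allows."""
    return policy_controllers <= consent_controllers

# A policy naming only Acme fits a consent given to the whole group.
print(controller_complies(acme_only, acme_group))

# A policy covering the whole group does not fit a consent given to Acme alone.
print(controller_complies(acme_group, acme_only))
```

The instance-valued case needs no separate machinery here: it is the same
containment check applied to a class with exactly one member.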

If you are interested in a simple and natural JSON syntax that supports 
both classes and instances, please see:

Piero A. Bonatti, Luigi Sauro, Jonathan Langens:
Representing Consent and Policies for Compliance. EuroS&P Workshops 
2021: 283-291

This JSON dialect is just a handy "external" representation for OWL2
classes, and gives you all the expressiveness you need, in a
developer-friendly way.

Regards,

Piero
Received on Tuesday, 18 January 2022 16:51:47 UTC