Re: DPV semantics: how to specify values?

Dear all,

let me comment on the available options.

According to SPECIAL's approach the cleanest representation should be 
based on classes. That is, each consent and each personal data request 
should be a *class*. The following is an example of consent (where I use 
Manchester syntax for OWL2 and omit namespaces for brevity)

(hasDataSubject some {patient1})
  and (hasPurpose some AcademicResearch)
  and (hasProcessing some Collect)
  and ...

This is the class of *all* objects whose properties satisfy the 
following restrictions:
hasDataSubject = patient1, and
hasPurpose is an instance of AcademicResearch, and
hasProcessing is an instance of Collect, and...

In this way, patient1 consents to *any* data collection processing whose 
purpose is *any* AcademicResearch purpose, etc., which is exactly what 
is meant by "consenting to data collection for academic research".

By contrast, the RDF-based encoding below, proposed in [1], gives 
consent only for *some unspecified instance* of AcademicResearch and 
*some unspecified instance* of Collect, etc., where the unspecified 
instances are represented by blank nodes:

ex:consentPatient1 a dpv:Consent ;
dpv:hasDataSubject ex:patient1 ;
dpv:hasPurpose     [a dpv:AcademicResearch];
dpv:hasProcessing  [a dpv:Collect];
dcterms:title      "Consent for Health data analysis in a clinical study ..." ;
dpv:hasDataController  ex:hospital1;
dpv:hasRecipient   ex:physiotherapist1;
dpv:hasPersonalDataCategory [a dpv:PhysicalHealth].


The consequence of consenting to unspecified instances only is that 
given any request for personal data, semantically speaking it is not 
possible to tell whether it complies with consent, because nothing in 
the above RDF graph says that the blank nodes occurring in the above 
consent properties are indeed the blank nodes occurring in the request. 
Let's clarify using the example data request from [1]:

ex:dataRequest a dpv:PersonalDataHandling ;
  dpv:hasDataSubject     ex:patient1 ;
  dpv:hasPurpose         [a dpv:AcademicResearch] ;
  dpv:hasProcessing      [a dpv:Collect];
  dpv:hasLegalBasis   [a dpv:Consent];
  dpv:hasDataController  ex:hospital1;
  dpv:hasRecipient    ex:physician3;
  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth];
  dcterms:title          "Personal Data Collection for clinical study ..." .


The nodes [a dpv:AcademicResearch] in the data request and in the 
consent *in general refer to different individuals* (the same holds for 
[a dpv:Collect]); therefore, under RDF's semantics, compliance can never 
be proved, because there is no logical relationship between 
ex:dataRequest and ex:consentPatient1.
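
To make the point concrete, here is a small sketch in plain Python. It 
is only a toy model of how each [a ...] expression introduces its own 
blank node; it does not implement RDF entailment, and the helper names 
(bnode, describe, values) are mine, not part of any library.

```python
from itertools import count

_fresh = count()

def bnode():
    """Return a fresh blank-node label, one per [ ... ] expression."""
    return f"_:b{next(_fresh)}"

def describe(subject, named, anonymous):
    """Build a set of triples; every anonymous value gets its own blank node."""
    triples = {(subject, prop, obj) for prop, obj in named.items()}
    for prop, cls in anonymous.items():
        node = bnode()
        triples.add((subject, prop, node))
        triples.add((node, "rdf:type", cls))
    return triples

def values(triples, subject, prop):
    """Collect the objects of all (subject, prop, ?) triples."""
    return {o for s, p, o in triples if s == subject and p == prop}

consent = describe(
    "ex:consentPatient1",
    {"dpv:hasDataSubject": "ex:patient1"},
    {"dpv:hasPurpose": "dpv:AcademicResearch", "dpv:hasProcessing": "dpv:Collect"},
)
request = describe(
    "ex:dataRequest",
    {"dpv:hasDataSubject": "ex:patient1"},
    {"dpv:hasPurpose": "dpv:AcademicResearch", "dpv:hasProcessing": "dpv:Collect"},
)

consent_purpose = values(consent, "ex:consentPatient1", "dpv:hasPurpose")
request_purpose = values(request, "ex:dataRequest", "dpv:hasPurpose")

# The named data subject is shared, but the purposes are two unrelated nodes.
shared = consent_purpose & request_purpose
print(shared)
```

The two descriptions agree on the named individual ex:patient1, but the 
purpose nodes have nothing in common except their type.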

Of course one might write one's own matching algorithm that assumes 
the blank nodes in the consent to be *the* instance(s) occurring in 
the data request. However, in my opinion this is an abuse of semantic 
languages: it is like using the correct syntax but with a different, 
nonstandard and ad-hoc semantics.
If we use classes, instead, compliance is just the good old SubClassOf 
relation.
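
As a rough illustration of compliance as subsumption, here is a 
sketch in Python. It is not SPECIAL's reasoner: it only covers 
conjunctions of "property some Class" restrictions with at most one 
restriction per property, and the class hierarchy and the Marketing 
request are made up for the example. A real deployment would delegate 
this check to an OWL2 reasoner.

```python
# Hypothetical fragment of a class hierarchy: child -> parent.
PARENT = {
    "AcademicResearch": "ResearchAndDevelopment",
    "ResearchAndDevelopment": "Purpose",
    "Marketing": "Purpose",
    "Collect": "Processing",
}

def is_subclass(c, d):
    """True if named class c equals d or is a descendant of d."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False

def complies(request, consent):
    """Check request SubClassOf consent for conjunctions of
    'property some Class' restrictions, given as dicts property -> class.
    Every restriction in the consent must be implied by the request's
    restriction on the same property."""
    return all(
        prop in request and is_subclass(request[prop], filler)
        for prop, filler in consent.items()
    )

# The nominal {patient1} is treated here as just another class name.
consent = {
    "hasDataSubject": "{patient1}",
    "hasPurpose": "AcademicResearch",
    "hasProcessing": "Collect",
}
good_request = {
    "hasDataSubject": "{patient1}",
    "hasPurpose": "AcademicResearch",
    "hasProcessing": "Collect",
    "hasDataController": "{hospital1}",
}
bad_request = dict(good_request, hasPurpose="Marketing")

print(complies(good_request, consent))
print(complies(bad_request, consent))
```

Note that the request may carry extra restrictions (here the data 
controller): a more specific request is still a subclass of the consent.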

As a consequence, the RDF-based approach prevents the use of any 
semantic tools that do reasoning and query answering (because they would 
be based on a different semantics).

The difficulties in encoding consent and requests appropriately are a 
symptom of the fact that RDFS is instance-oriented, while the best way 
of formalizing consent and requests is by means of classes.

Classes can be encoded with any of the alternative syntaxes for 
OWL2 (I recommend Manchester syntax, which is particularly lightweight).
In order to simplify syntax further, one could define an equivalent of 
JSON-LD for encoding classes, with a clear, unambiguous translation into 
OWL2. Some examples based on a preliminary (and currently incomplete) 
proposal can be found in

Piero A. Bonatti, Sabrina Kirrane: Big Data and Analytics in the Age of 
the GDPR. BigData Congress 2019.

Here is an example of a JSON-like class, using the old vocabularies 
developed by SPECIAL (it can easily be reformulated with the new 
vocabularies of DPVCG):

{
has_purpose: SocialNetworking,
has_data: LocationData,
has_processing: Transfer,
has_recipient: DataSubjFriends,
has_storage: {
   has_location: EU,
   has_duration: [1year,5year]
   }
}

It represents the class (in Manchester syntax):

(has_purpose some SocialNetworking)
  and (has_data some LocationData)
  and (has_processing some Transfer)
  and (has_recipient some DataSubjFriends)
  and (has_storage some (
   (has_location some EU) and
   (has_duration integer [>=1year, <=5year])
   ))
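
The translation from the JSON-like form to a class expression is 
mechanical. The following Python sketch shows one possible way to do 
it; it is not the translation defined in the paper. In particular, the 
function name to_manchester is mine, and I assume that an interval is 
given as a two-element list of integers (years) and rendered as an 
integer facet restriction.

```python
def to_manchester(obj):
    """Translate a nested dict into a Manchester-syntax class expression."""
    parts = []
    for prop, value in obj.items():
        if isinstance(value, dict):
            # Nested object: recurse and wrap the conjunction in parentheses.
            filler = "(" + to_manchester(value) + ")"
        elif isinstance(value, list):
            # Assumption: [low, high] is a closed integer interval.
            low, high = value
            filler = f"integer[>= {low}, <= {high}]"
        else:
            # Plain class name.
            filler = value
        parts.append(f"({prop} some {filler})")
    return " and ".join(parts)

policy = {
    "has_purpose": "SocialNetworking",
    "has_data": "LocationData",
    "has_processing": "Transfer",
    "has_recipient": "DataSubjFriends",
    "has_storage": {
        "has_location": "EU",
        "has_duration": [1, 5],
    },
}

expression = to_manchester(policy)
print(expression)
```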


Perhaps the development of a JSON-like encoding of OWL2 classes for 
consent and data requests could be an interesting topic for DPVCG.

Best regards, and apologies for such a long message

Piero

> 
> In providing examples, how should we advocate use of the vocabulary?
> 1) blank nodes -> dpv:hasProcessing [a dpv:Collect];
> I assume this arises from the property's range value which is taken 
> to require an instance of dpv:Processing, and therefore the creation of 
> a blank node.
> I do not think this is a good design pattern simply because it leaves 
> blank nodes with no purpose other than to satisfy the range is an 
> instance of a class semantics. I presume this is also not how people 
> would think about processing - one is likely to go processing is "Collect".
> 
> 2) specify classes -> dpv:hasProcessing dpv:Collect;
> I like that this is much cleaner and what someone would actually want to 
> indicate, but it does not seem to satisfy the condition that the range is 
> an instance of dpv:Processing (note: it doesn't violate it either).
> 
> This question has also been raised to me at various points, especially 
> by those who are not well versed in semantic web (including me!).
> And in working on the Primer, it would be good to have this clarified in 
> the examples.
> 
> [1] Personal Data Privacy Semantics in Multi-Agent Systems Interactions
> Davide Calvaresi, Michael Schumacher, and Jean-Paul Calbimonte
> University of Applied Sciences and Arts Western Switzerland (HES-SO)
> https://www.researchgate.net/profile/Davide_Calvaresi/publication/340137395_Personal_Data_Privacy_Semantics_in_Multi-Agent_Systems_Interactions/links/5e7b3e3a4585152fc0ecbc2a/Personal-Data-Privacy-Semantics-in-Multi-Agent-Systems-Interactions.pdf 
> 
> 
> Regards,

Received on Friday, 3 April 2020 08:31:24 UTC