- From: Piero Bonatti <pieroandrea.bonatti@unina.it>
- Date: Fri, 3 Apr 2020 10:28:14 +0200
- To: "Harshvardhan J. Pandit" <me@harshp.com>, Data Privacy Vocabularies and Controls Community Group <public-dpvcg@w3.org>
Dear all,

let me comment on the available options. According to SPECIAL's approach, the cleanest representation should be based on classes. That is, each consent and each personal data request should be a *class*. The following is an example of consent (I use Manchester syntax for OWL2 and omit namespaces for brevity):

    (hasDataSubject some {patient1})
    and (hasPurpose some AcademicResearch)
    and (hasProcessing some Collect)
    and ...

This is the class of *all* objects whose properties satisfy the following restrictions: hasDataSubject = patient1, hasPurpose is an instance of AcademicResearch, hasProcessing is an instance of Collect, and so on. In this way, patient1 consents to *any* data collection processing whose purpose is *any* AcademicResearch purpose, etc., which is exactly what is meant by "consenting to data collection for academic research".

By contrast, the encoding below, proposed in [1] and based on RDF, gives consent only for *some unspecified* instance of AcademicResearch and *some unspecified* instance of Collect, etc., where the unspecified instances are represented by blank nodes:

    ex:consentPatient1 a dpv:Consent ;
        dpv:hasDataSubject ex:patient1 ;
        dpv:hasPurpose [ a dpv:AcademicResearch ] ;
        dpv:hasProcessing [ a dpv:Collect ] ;
        dcterms:title "Consent for Health data analysis in a clinical study ..." ;
        dpv:hasDataController ex:hospital1 ;
        dpv:hasRecipient ex:physiotherapist1 ;
        dpv:hasPersonalDataCategory [ a dpv:PhysicalHealth ] .

The consequence of consenting to unspecified instances only is that, given any request for personal data, it is semantically impossible to tell whether the request complies with the consent, because nothing in the above RDF graph says that the blank nodes occurring in the consent properties are the same as the blank nodes occurring in the request.
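To make the class-based alternative concrete, here is a minimal Python sketch (not SPECIAL's implementation): consent and request are each treated as a conjunction of existential restrictions "(property some Filler)", and a request complies when its class is subsumed by the consent class. The toy hierarchy (AcademicResearch under Research, PhysicalHealth under HealthData, etc.), the list-of-pairs representation, and the helper names are all hypothetical; controller and recipient restrictions are omitted for brevity, and the check is a simple structural test for this small fragment, not an OWL2 reasoner.

```python
# Hypothetical toy model of compliance as class subsumption (not a full
# OWL2 reasoner). A class expression is a list of (property, filler) pairs
# read as "(p1 some F1) and (p2 some F2) and ...". A filler is an atomic
# class name, a nominal such as "{patient1}" (compared by name only),
# or a nested list (a nested conjunction).

# Hypothetical atomic hierarchy: child -> parent.
PARENT = {
    "AcademicResearch": "Research",
    "Research": "Purpose",
    "Collect": "Processing",
    "PhysicalHealth": "HealthData",
}


def is_subclass(sub: str, sup: str) -> bool:
    """True if sub equals sup or lies below it in the toy hierarchy."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT.get(current)
    return False


def filler_subsumed(e, f) -> bool:
    """Is filler e subsumed by filler f?"""
    if isinstance(e, str) and isinstance(f, str):
        return is_subclass(e, f)
    if isinstance(e, list) and isinstance(f, list):
        return subsumed(e, f)
    return False  # mixed atomic/nested cases are not handled in this sketch


def subsumed(c: list, d: list) -> bool:
    """Structural test for c SubClassOf d: every restriction (p some F)
    in d must be matched by some (p some E) in c with E subsumed by F."""
    return all(
        any(p == q and filler_subsumed(e, f) for q, e in c)
        for p, f in d
    )


consent = [
    ("hasDataSubject", "{patient1}"),
    ("hasPurpose", "AcademicResearch"),
    ("hasProcessing", "Collect"),
    ("hasPersonalDataCategory", "PhysicalHealth"),
]

request = [
    ("hasDataSubject", "{patient1}"),
    ("hasPurpose", "AcademicResearch"),
    ("hasProcessing", "Collect"),
    ("hasLegalBasis", "Consent"),
    ("hasPersonalDataCategory", "PhysicalHealth"),
]

# Same request, but for research in general rather than academic research.
broader_request = [
    (p, "Research" if p == "hasPurpose" else f) for p, f in request
]

print(subsumed(request, consent))          # True: request complies
print(subsumed(broader_request, consent))  # False: purpose is too broad
```

The extra hasLegalBasis restriction only makes the request more specific, so it still falls inside the consent class, while the reverse test `subsumed(consent, request)` fails. In practice the two class expressions would be handed to an OWL2 reasoner rather than to a hand-written check like this one.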
Let me clarify with the example of a data request from [1]:

    ex:dataRequest a dpv:PersonalDataHandling ;
        dpv:hasDataSubject ex:patient1 ;
        dpv:hasPurpose [ a dpv:AcademicResearch ] ;
        dpv:hasProcessing [ a dpv:Collect ] ;
        dpv:hasLegalBasis [ a dpv:Consent ] ;
        dpv:hasDataController ex:hospital1 ;
        dpv:hasRecipient ex:physician3 ;
        dpv:hasPersonalDataCategory [ a dpv:PhysicalHealth ] ;
        dcterms:title "Personal Data Collection for clinical study ..." .

The nodes [ a dpv:AcademicResearch ] in the data request and in the consent *in general refer to different individuals* (the same holds for [ a dpv:Collect ]); therefore, under RDF's semantics compliance can never be proved, because there is no logical relationship between ex:dataRequest and ex:consentPatient1.

Of course, one might write one's own matching algorithm that assumes the blank nodes in the consent to be *the* instance(s) occurring in the data request. However, in my opinion this is an abuse of semantic languages: it amounts to using the correct syntax with a different, nonstandard, ad-hoc semantics. As a consequence, the RDF-based approach prevents the use of any semantic tools that do reasoning and query answering (because those tools are based on the standard semantics, not on the ad-hoc one). If we use classes instead, compliance is just the good old SubClassOf relation.

The difficulties in encoding consent and requests appropriately are a symptom of the fact that RDFS is instance-oriented, while the best way of formalizing consent and requests is by means of classes. Classes can be encoded with any of the alternative syntaxes for OWL2 (I recommend Manchester syntax, which is particularly lightweight). To simplify the syntax further, one could define an equivalent of JSON-LD for encoding classes, with a clear, unambiguous translation into OWL2. Some examples based on a preliminary (and currently incomplete) proposal can be found in: Piero A. Bonatti, Sabrina Kirrane: Big Data and Analytics in the Age of the GDPR. BigData Congress 2019.
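As an illustration of what such a translation could look like, here is a hypothetical Python sketch; it is not part of any existing SPECIAL or DPVCG specification. It assumes the JSON-like object is a Python dict, that each key/value pair becomes an existential restriction joined with "and", that nested objects become nested conjunctions, and that a `(low, high)` tuple denotes an integer interval rendered with the Manchester facet syntax `xsd:integer[>= low, <= high]` (durations assumed to be in years). The function names `to_manchester` and `filler` are made up for this example.

```python
# Hypothetical sketch: translate a JSON-like policy object into an OWL2
# class expression in Manchester syntax. Assumptions: the object is a
# Python dict, nested dicts are nested conjunctions, and a (low, high)
# tuple is an integer interval (here, a duration in years).


def filler(value) -> str:
    """Render the filler of one 'some' restriction."""
    if isinstance(value, dict):
        return "(" + to_manchester(value) + ")"
    if isinstance(value, tuple):
        low, high = value
        return f"xsd:integer[>= {low}, <= {high}]"
    return str(value)


def to_manchester(obj: dict) -> str:
    """Each key/value pair becomes (key some filler); pairs are joined with 'and'."""
    return " and ".join(f"({key} some {filler(value)})" for key, value in obj.items())


policy = {
    "has_purpose": "SocialNetworking",
    "has_data": "LocationData",
    "has_processing": "Transfer",
    "has_recipient": "DataSubjFriends",
    "has_storage": {
        "has_location": "EU",
        "has_duration": (1, 5),  # assumed unit: years
    },
}

print(to_manchester(policy))
```

Running it prints `(has_purpose some SocialNetworking) and (has_data some LocationData) and (has_processing some Transfer) and (has_recipient some DataSubjFriends) and (has_storage some ((has_location some EU) and (has_duration some xsd:integer[>= 1, <= 5])))`. Because the mapping is purely syntactic and key-by-key, the resulting OWL2 class is unambiguous, which is the property a JSON-LD-like encoding of classes would need.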
Here is an example of a JSON-like class using the old vocabularies developed by SPECIAL (it can easily be reformulated with the new DPVCG vocabularies):

    {
      has_purpose: SocialNetworking,
      has_data: LocationData,
      has_processing: Transfer,
      has_recipient: DataSubjFriends,
      has_storage: {
        has_location: EU,
        has_duration: [1year, 5year]
      }
    }

It represents the following class (in Manchester syntax):

    (has_purpose some SocialNetworking)
    and (has_data some LocationData)
    and (has_processing some Transfer)
    and (has_recipient some DataSubjFriends)
    and (has_storage some (
          (has_location some EU)
          and (has_duration some integer[>= 1year, <= 5year])))

Perhaps the development of a JSON-like encoding of OWL2 classes for consent and data requests could be an interesting topic for DPVCG.

Best regards, and apologies for such a long message,
Piero

> In providing examples, how should we advocate use of the vocabulary?
>
> 1) blank nodes -> dpv:hasProcessing [a dpv:Collect];
> I assume this arises from the property's range value, which is taken
> to require an instance of dpv:Processing, and therefore the creation of
> a blank node.
> I do not think this is a good design pattern simply because it leaves
> blank nodes with no purpose other than to satisfy the "range is an
> instance of a class" semantics. I presume this is also not how people
> would think about processing - one is likely to say processing is "Collect".
>
> 2) specify classes -> dpv:hasProcessing dpv:Collect;
> I like that this is much cleaner and what someone would actually want to
> indicate, but it does not seem to satisfy the "range is an instance of"
> condition of dpv:hasProcessing (note: it doesn't violate it either).
>
> This question has also been raised to me at various points, especially
> by those who are not well versed in the semantic web (including me!).
> And in working on the Primer, it would be good to have this clarified in
> the examples.
>
> [1] Personal Data Privacy Semantics in Multi-Agent Systems Interactions
> Davide Calvaresi, Michael Schumacher, and Jean-Paul Calbimonte
> University of Applied Sciences and Arts Western Switzerland (HES-SO)
> https://www.researchgate.net/profile/Davide_Calvaresi/publication/340137395_Personal_Data_Privacy_Semantics_in_Multi-Agent_Systems_Interactions/links/5e7b3e3a4585152fc0ecbc2a/Personal-Data-Privacy-Semantics-in-Multi-Agent-Systems-Interactions.pdf
>
> Regards,
Received on Friday, 3 April 2020 08:31:24 UTC