Re: DPV semantics: how to specify values?

Dear all,

During the last call I was asked to resume the discussion on the best
way to encode consent and data requests.

Below you can find a list of four possible approaches, with some pros and
cons, discussed with automated compliance checking in mind.

Please comment on the alternatives, share your preferences, and point
out possible drawbacks (including non-technical aspects, which I have
deliberately left out of the list).
And, of course, feel free to suggest your own approach.

For your convenience, I have also included the examples from the 
previous messages.

Best regards
Piero

PS: so far, my personal preference is for approach 3 below, which in my
opinion is the most uniform and clean of the four.

--------------------------------

APPROACH 1

This is the approach circulated in previous messages. The specific 
example consists of a consent and a data request, encoded in RDFS as 
follows:

ex:consentPatient1 a dpv:Consent ;
  dpv:hasDataSubject          ex:patient1 ;
  dpv:hasPurpose              [a dpv:AcademicResearch] ;
  dpv:hasProcessing           [a dpv:Collect] ;
  dcterms:title               "Consent for Health data analysis in a clinical study ..." ;
  dpv:hasDataController       ex:hospital1 ;
  dpv:hasRecipient            ex:physiotherapist1 ;
  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth] .


ex:dataRequest a dpv:PersonalDataHandling ;
  dpv:hasDataSubject          ex:patient1 ;
  dpv:hasPurpose              [a dpv:AcademicResearch] ;
  dpv:hasProcessing           [a dpv:Collect] ;
  dpv:hasLegalBasis           [a dpv:Consent] ;
  dpv:hasDataController       ex:hospital1 ;
  dpv:hasRecipient            ex:physician3 ;
  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth] ;
  dcterms:title               "Personal Data Collection for clinical study ..." .

The main drawback of this approach is that, read in plain English,
ex:consentPatient1 says that ex:patient1 consents to some processing, for
some purpose, over some data category, all of which are unspecified
because they are expressed as blank nodes.
Consequently, the consent and the data request are logically unrelated,
because the blank nodes in the consent and those in the data request may
denote different individuals.

Thus compliance checking cannot be reduced to any form of logical
reasoning over the two graphs. To check compliance, one needs an ad-hoc
notion of matching, whose correctness and completeness must be justified
from scratch. It is not clear whether such an ad-hoc matching algorithm
can be implemented on top of standard reasoning tools.
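To make this concrete, here is a minimal Python sketch of one possible ad-hoc matching rule. The rule itself is hypothetical (it is not part of DPV): an IRI value must be identical, and a blank node [a C] in the request matches one in the consent if both carry the same class C. The graphs are flattened into plain {property: value} dicts rather than parsed RDF, and the names Blank and adhoc_match are illustrative only. Nothing in RDF(S) semantics licenses this rule, which is precisely why it would need its own correctness argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blank:
    """A blank node known only through its rdf:type, e.g. [a dpv:Collect]."""
    cls: str


# Each graph is flattened to {property: value}; IRIs are plain strings.
# (dcterms:title is omitted because it plays no role in matching.)
consent = {
    "dpv:hasDataSubject": "ex:patient1",
    "dpv:hasPurpose": Blank("dpv:AcademicResearch"),
    "dpv:hasProcessing": Blank("dpv:Collect"),
    "dpv:hasDataController": "ex:hospital1",
    "dpv:hasRecipient": "ex:physiotherapist1",
    "dpv:hasPersonalDataCategory": Blank("dpv:PhysicalHealth"),
}

request = {
    "dpv:hasDataSubject": "ex:patient1",
    "dpv:hasPurpose": Blank("dpv:AcademicResearch"),
    "dpv:hasProcessing": Blank("dpv:Collect"),
    "dpv:hasLegalBasis": Blank("dpv:Consent"),
    "dpv:hasDataController": "ex:hospital1",
    "dpv:hasRecipient": "ex:physician3",
    "dpv:hasPersonalDataCategory": Blank("dpv:PhysicalHealth"),
}


def adhoc_match(consent: dict, request: dict) -> list[str]:
    """Return the consent properties that the request fails to match.

    Hypothetical rule: IRIs match if equal; blank nodes match if they carry
    the same class (frozen dataclass equality compares cls). Comparing blank
    nodes by type is an extra-logical choice: RDF semantics allows two blank
    nodes to denote different individuals.
    """
    return [prop for prop, allowed in consent.items()
            if request.get(prop) != allowed]


if __name__ == "__main__":
    print(adhoc_match(consent, request))  # ['dpv:hasRecipient']
```

Note that this naive rule also ignores the class hierarchy: a request whose purpose is a subclass of dpv:AcademicResearch would be rejected unless the rule were extended again, ad hoc.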

The above problem can be solved by making consent a *class* of objects;
compliance can then be reduced to checking whether the data request is
contained in the consent, which in turn reduces to standard reasoning
tasks (see below).

APPROACHES 2, 3

In these two approaches, consent is an OWL2 class. Among the standard
alternative syntaxes of OWL2, Manchester syntax is probably the simplest.
In Manchester syntax, a consent class would look like this:

(hasDataSubject some {ex:patient1}
  and (hasPurpose some AcademicResearch)
  and (hasPersonalDataCategory some PhysicalHealth)
  and (hasProcessing some Collect)
  and (hasRecipient some {ex:physician3})
  ...)

The above expression covers the class of *all* processing activities of
type Collect (no matter how data is concretely collected), on some
physical health data (which may involve blood pressure, heart rate,
etc.), for the purpose of some kind of academic research (be it medical,
biological, ...), whose results are shared with ex:physician3. This is
exactly what a direct translation into English would say.

Manchester syntax is general enough to cover all OWL constructs; for 
compliance checking a more streamlined JSON-like syntax may be enough, e.g.:

{
  hasDataSubject: {ex:patient1}
  hasPurpose: AcademicResearch
  hasPersonalDataCategory: PhysicalHealth
  hasRecipient:
  ...
}

Such a syntax only needs a well-specified mapping into OWL2 that gives it
a formal semantics and a logical meaning.
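As an illustration of what such a mapping could look like, the Python sketch below translates the JSON-like form into a Manchester-syntax class expression. It assumes (our choice, not a fixed proposal) that a value written in braces, such as {ex:patient1}, denotes a nominal and becomes "some {...}", while a bare name denotes a class and becomes "some Name". The function name to_manchester is hypothetical, and only the streamlined fragment (a conjunction of existential restrictions) is covered.

```python
def to_manchester(spec: dict) -> str:
    """Map {property: value} to an OWL2 conjunction in Manchester syntax.

    A str value is read as a class name; a set of str is read as a
    nominal (an enumeration of individuals), mirroring the braces of the
    JSON-like syntax.
    """
    conjuncts = []
    for prop, value in spec.items():
        if isinstance(value, (set, frozenset)):
            filler = "{" + ", ".join(sorted(value)) + "}"
        else:
            filler = value
        conjuncts.append(f"({prop} some {filler})")
    return "\n  and ".join(conjuncts)


consent_spec = {
    "hasDataSubject": {"ex:patient1"},
    "hasPurpose": "AcademicResearch",
    "hasPersonalDataCategory": "PhysicalHealth",
    "hasProcessing": "Collect",
}

if __name__ == "__main__":
    print(to_manchester(consent_spec))
    # (hasDataSubject some {ex:patient1})
    #   and (hasPurpose some AcademicResearch)
    #   and (hasPersonalDataCategory some PhysicalHealth)
    #   and (hasProcessing some Collect)
```

Keeping the mapping this small is the point: as long as every construct of the streamlined syntax has one fixed OWL2 counterpart, the streamlined form inherits OWL2's semantics for free.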

Approaches 2 and 3 differ in how data requests are represented.

In APPROACH 2, data requests are still expressed as RDFS nodes (as in
APPROACH 1). Compliance checking then reduces to instance checking
(i.e., checking whether the data request is an instance of the consent
class).
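A minimal Python sketch of that instance check, under simplifying assumptions: the consent class is a conjunction of existential restrictions whose fillers are either named classes or single-individual nominals; the class hierarchy is a small told hierarchy given as a dict (the subclasses MedicalResearch and BloodPressure are hypothetical, chosen to mirror the examples above); and each blank node of the request is represented by its asserted class. It is not a general OWL2 reasoner. It only shows that, once consent is a class, the blank nodes of the request act as witnesses of the restrictions instead of being unrelated individuals.

```python
# Hypothetical told hierarchy: child class -> direct parent class.
PARENT = {
    "MedicalResearch": "AcademicResearch",
    "BloodPressure": "PhysicalHealth",
}


def is_subclass(c, d) -> bool:
    """True if c equals d or d is an ancestor of c in PARENT."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


# Consent class as (property, kind, filler) existential restrictions:
# kind "one" is a nominal {individual}, kind "cls" is a named class.
CONSENT = [
    ("hasDataSubject", "one", "ex:patient1"),
    ("hasPurpose", "cls", "AcademicResearch"),
    ("hasPersonalDataCategory", "cls", "PhysicalHealth"),
    ("hasProcessing", "cls", "Collect"),
    ("hasRecipient", "one", "ex:physician3"),
]


def is_instance(request: dict, restrictions: list) -> bool:
    """Approach 2: does the request satisfy every restriction?

    request maps a property to a list of values; a value is ("iri", name)
    for a named individual or ("blank", cls) for a blank node [a cls].
    """
    for prop, kind, filler in restrictions:
        values = request.get(prop, [])
        if kind == "one":
            ok = ("iri", filler) in values
        else:
            # A blank node [a C] witnesses "prop some D" whenever C is a
            # subclass of D in the told hierarchy.
            ok = any(k == "blank" and is_subclass(v, filler) for k, v in values)
        if not ok:
            return False
    return True


data_request = {
    "hasDataSubject": [("iri", "ex:patient1")],
    "hasPurpose": [("blank", "MedicalResearch")],
    "hasPersonalDataCategory": [("blank", "BloodPressure")],
    "hasProcessing": [("blank", "Collect")],
    "hasRecipient": [("iri", "ex:physician3")],
}

if __name__ == "__main__":
    print(is_instance(data_request, CONSENT))  # True
```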

In APPROACH 3, data requests are expressed as classes, with the same 
syntax as consent. In this case, compliance checking can be reduced to 
subsumption (i.e. checking whether the data request class is contained 
in the consent class).
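For the subsumption check, here is a similarly restricted Python sketch, again assuming conjunctions of existential restrictions over a told hierarchy of named classes (with the same hypothetical subclasses MedicalResearch and BloodPressure) plus nominals, and no other axioms. In that restricted setting a structural comparison should coincide with logical subsumption: every restriction "p some D" of the consent must be implied by some restriction "p some C" of the request with C a subclass of D. For anything richer one would hand both classes to a standard OWL2 reasoner; the function names here are illustrative.

```python
# Hypothetical told hierarchy: child class -> direct parent class.
PARENT = {
    "MedicalResearch": "AcademicResearch",
    "BloodPressure": "PhysicalHealth",
}


def is_subclass(c, d) -> bool:
    """True if c equals d or d is an ancestor of c in PARENT."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


def filler_subsumed(kind_c, c, kind_d, d) -> bool:
    """Is filler c (of kind_c) contained in filler d (of kind_d)?"""
    if kind_d == "one":
        # {a} is contained in {b} only if a and b are the same name.
        return kind_c == "one" and c == d
    # Simplification: a nominal is never assumed to fall under a class,
    # since no class assertions about individuals are modeled here.
    return kind_c == "cls" and is_subclass(c, d)


def subsumed_by(request: list, consent: list) -> bool:
    """Approach 3: structural check that request is contained in consent."""
    return all(
        any(p == q and filler_subsumed(kc, c, kd, d) for q, kc, c in request)
        for p, kd, d in consent
    )


CONSENT = [
    ("hasDataSubject", "one", "ex:patient1"),
    ("hasPurpose", "cls", "AcademicResearch"),
    ("hasPersonalDataCategory", "cls", "PhysicalHealth"),
    ("hasProcessing", "cls", "Collect"),
    ("hasRecipient", "one", "ex:physician3"),
]

# The data request is written in the same syntax as the consent.
REQUEST = [
    ("hasDataSubject", "one", "ex:patient1"),
    ("hasPurpose", "cls", "MedicalResearch"),
    ("hasPersonalDataCategory", "cls", "BloodPressure"),
    ("hasProcessing", "cls", "Collect"),
    ("hasRecipient", "one", "ex:physician3"),
    ("hasLegalBasis", "cls", "Consent"),
]

if __name__ == "__main__":
    print(subsumed_by(REQUEST, CONSENT))  # True
    print(subsumed_by(CONSENT, REQUEST))  # False
```

The extra hasLegalBasis restriction only narrows the request, so it does not prevent compliance; the converse fails because the consent is strictly broader.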


APPROACH 4

A class may also be expressed as a SPARQL query (the set of answers is
the class). Data requests are represented as in approaches 1 and 2.

The above consent could be expressed as a SPARQL query selecting all
objects with hasDataSubject = ex:patient1, hasPurpose in
AcademicResearch, etc.

ex:dataRequest is compliant iff it belongs to the query answer.
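A rough sketch of this approach, in Python for consistency with the other examples: the comment shows one possible SPARQL 1.1 query for the consent (prefix declarations omitted, and it assumes the class hierarchy is stored as rdfs:subClassOf triples in the same graph), and the code evaluates the same conditions by hand over a tiny in-memory triple set holding the data request. The blank-node labels _:b1 to _:b3 and the helper names are illustrative; a real deployment would run the query on a SPARQL engine instead.

```python
# One possible SPARQL query playing the role of the consent:
#
#   SELECT ?r WHERE {
#     ?r dpv:hasDataSubject ex:patient1 ;
#        dpv:hasPurpose ?p ;
#        dpv:hasPersonalDataCategory ?c ;
#        dpv:hasProcessing ?x .
#     ?p rdf:type/rdfs:subClassOf* dpv:AcademicResearch .
#     ?c rdf:type/rdfs:subClassOf* dpv:PhysicalHealth .
#     ?x rdf:type/rdfs:subClassOf* dpv:Collect .
#   }

TRIPLES = {
    ("ex:dataRequest", "dpv:hasDataSubject", "ex:patient1"),
    ("ex:dataRequest", "dpv:hasPurpose", "_:b1"),
    ("_:b1", "rdf:type", "dpv:AcademicResearch"),
    ("ex:dataRequest", "dpv:hasPersonalDataCategory", "_:b2"),
    ("_:b2", "rdf:type", "dpv:PhysicalHealth"),
    ("ex:dataRequest", "dpv:hasProcessing", "_:b3"),
    ("_:b3", "rdf:type", "dpv:Collect"),
}


def objects(s, p):
    """All o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def has_type_under(node, cls) -> bool:
    """Emulates the path  node rdf:type/rdfs:subClassOf* cls."""
    frontier, seen = set(objects(node, "rdf:type")), set()
    while frontier:
        c = frontier.pop()
        if c == cls:
            return True
        if c not in seen:
            seen.add(c)
            frontier |= objects(c, "rdfs:subClassOf")
    return False


def consent_answers() -> set:
    """The answer set of the query above, i.e. the consent 'class'."""
    candidates = {s for (s, p, o) in TRIPLES
                  if p == "dpv:hasDataSubject" and o == "ex:patient1"}
    conditions = [
        ("dpv:hasPurpose", "dpv:AcademicResearch"),
        ("dpv:hasPersonalDataCategory", "dpv:PhysicalHealth"),
        ("dpv:hasProcessing", "dpv:Collect"),
    ]
    return {r for r in candidates
            if all(any(has_type_under(v, cls) for v in objects(r, prop))
                   for prop, cls in conditions)}


if __name__ == "__main__":
    print("ex:dataRequest" in consent_answers())  # True
```

Even in this small sketch, the consent is encoded as a retrieval procedure (paths, variables, evaluation order) rather than as a declarative description of what is allowed.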

My personal feeling is that expressing consent via a SPARQL query
introduces a lot of irrelevant detail and is too operational.

Received on Monday, 27 April 2020 15:42:47 UTC