Re: DPV semantics: how to specify values?

Dear all,
IMHO the issue of how best to represent DPV examples is highly dependent
on its eventual use.
For example, Piero's email argues the use of OWL classes as best suited 
for automated compliance checking using the subsumption mechanism (as in 
SPECIAL).

However, to offer a different perspective on this - I am raising the
possibility that DPV may not always be used for automated compliance
checking or with an OWL2 reasoner. An adopter could use it in a purely
declarative manner - for example to document, log, or specify data
processing. In such cases, OWL2 syntaxes are harder to understand than
RDFS, given the available tooling and how readily RDFS maps to other
paradigms (e.g. object-oriented or graph models).

Further, we started DPV with the goal of providing a vocabulary - which
is why we have RDFS classes instead of OWL2. And I think we should not
deviate from that - or we risk alienating potential adopters with too
much semantic complexity.

That being said, I agree with the previous emails that blank nodes are 
*NOT* suitable - but if someone wants to use them, it won't be invalid 
and they can do whatever they want in their use-case. For example,

_ dpv:hasProcessing [ a dpv:Collect ] .

SELECT ?processing WHERE {
   ?x dpv:hasProcessing/rdf:type ?processing .
}

Convoluted - for sure, but not invalid.

Intuitively, I see people not familiar with semantic web modelling their 
data as:

_ dpv:hasProcessing dpv:Collect .

which is what they mean e.g. in consent requests. This is valid in OWL2, 
as Victor pointed out, and again a bit frowned upon, but still not 
invalid (in RDFS and OWL2).
This is also the same issue when dealing with personal data categories 
since all of them are classes rather than instances.
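
To make the contrast concrete: with this second pattern the earlier
query no longer needs the rdf:type hop. A minimal sketch, assuming the
class IRI is used directly as the object (the dpv: namespace IRI below
is an assumption; substitute whichever one your data uses):

# Assumed namespace IRI for DPV.
PREFIX dpv: <http://w3.org/ns/dpv#>

# Returns dpv:Collect itself, not the type of a blank node.
SELECT ?processing WHERE {
   ?x dpv:hasProcessing ?processing .
}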

Personally, I like the OWL2 subsumption as the 'cleanest' indication of
what we mean - but am wary of using OWL2 due to expressivity,
complexity, and more importantly - modelling challenges for adopters.
If they are invested in and are using OWL2 reasoners/tooling - super! 
They can model everything in OWL2.
But if someone is only using DPV for declarative documentation of data 
then OWL2 is probably not what they use.
Hence I would recommend sticking with RDFS as much as possible - if we
can agree on how to do it.

As for Piero's note on OWL2 notation and syntax - I agree it is very 
convenient, and something we can look at in parallel.

Regards,
Harsh

P.S. How is ODRL dealing with this issue?
Did they declare all concepts as instances, e.g. use and collect are
instances of odrl:Action, and be done with it?
Would doing that have solved our problem?
E.g. collect, store, use are instances of dpv:Processing but not 
subclasses; and then collect, store, use can have subclass relationships 
between themselves.
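
A rough sketch of how such a model could be queried, assuming
(hypothetically) that dpv:Collect and the other concepts are declared as
instances of dpv:Processing and related to each other with
rdfs:subClassOf. The dpv: namespace IRI is an assumption; substitute
whichever one your data uses:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Assumed namespace IRI for DPV.
PREFIX dpv:  <http://w3.org/ns/dpv#>

# Matches dpv:Collect and anything declared below it. No reasoner is
# needed: the property path walks the hierarchy stated in the data.
SELECT ?x ?processing WHERE {
   ?x dpv:hasProcessing ?processing .
   ?processing rdf:type dpv:Processing ;
               rdfs:subClassOf* dpv:Collect .
}

This only works if every concept carries an explicit rdf:type
dpv:Processing statement, since nothing is inferred.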

On 27/04/2020 16:38, Piero Bonatti wrote:
> Dear all,
>
> during the last call I have been asked to resume the discussion on the 
> best way of encoding consent and data requests.
>
> Below you can find a list of 4 possible approaches, with some pros and 
> cons, discussed with the goal of automated compliance checking in mind.
>
> Please comment on the alternatives, share your preferences, and point 
> out possible drawbacks (including non-technical aspects, that I 
> deliberately leave out of the list).
> And, of course, feel free to suggest your own approach.
>
> For your convenience, I have also included the examples from the 
> previous messages.
>
> Best regards
> Piero
>
> PS: my personal preference so far is for approach 3 below, that in my 
> opinion is the most uniform and clean of all four.
>
> --------------------------------
>
> APPROACH 1
>
> This is the approach circulated in previous messages. The specific 
> example consists of a consent and a data request, encoded in RDFS as 
> follows:
>
> ex:consentPatient1 a dpv:Consent ;
> dpv:hasDataSubject ex:patient1 ;
> dpv:hasPurpose     [a dpv:AcademicResearch];
> dpv:hasProcessing  [a dpv:Collect];
> dcterms:title      "Consent for Health data analysis in a clinical study ..." ;
> dpv:hasDataController  ex:hospital1;
> dpv:hasRecipient   ex:physiotherapist1;
> dpv:hasPersonalDataCategory [a dpv:PhysicalHealth].
>
>
> ex:dataRequest a dpv:PersonalDataHandling ;
>  dpv:hasDataSubject     ex:patient1 ;
>  dpv:hasPurpose         [a dpv:AcademicResearch] ;
>  dpv:hasProcessing      [a dpv:Collect];
>  dpv:hasLegalBasis   [a dpv:Consent];
>  dpv:hasDataController  ex:hospital1;
>  dpv:hasRecipient    ex:physician3;
>  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth];
>  dcterms:title          "Personal Data Collection for clinical study ..." .
>
> The main drawback of this approach is that ex:consentPatient1 says (in 
> English) that ex:patient1 consents to some processing, for some 
> purpose, over some data category, that are all unspecified, because 
> they are expressed with blank nodes.
> Consequently, consent and data request are logically unrelated, 
> because the blank nodes in the consent and those in the data request 
> may denote different individuals.
>
> Thus compliance checking cannot be reduced to any form of logical 
> reasoning between the two graphs. In order to check compliance, one 
> needs an ad-hoc notion of matching (that must be justified for 
> correctness and completeness from scratch).  It is not clear whether 
> the ad-hoc matching algorithm can be implemented on top of the 
> standard reasoning tools.
>
> The above problem can be solved by making consent a *class* of 
> objects; then compliance can be reduced to checking whether the data 
> request is contained in the consent - which can be reduced to standard 
> reasoning tasks, see below.
>
> APPROACHES 2, 3
>
> In these two approaches, consent is an OWL2 class. Among the standard
> alternative syntaxes of OWL2, Manchester syntax is probably the simplest
> so far. In Manchester syntax a consent class would look like this:
>
> (hasDataSubject some {ex:patient1}
>  and (hasPurpose some AcademicResearch)
>  and (hasPersonalDataCategory some PhysicalHealth)
>  and (hasProcessing some Collect)
>  and (hasRecipient some {ex:physician3})
>  ...)
>
> The above expression covers the class of *all* processing activities 
> of type Collect (no matter how data is concretely collected), on some 
> physical health data (it may involve blood pressure, heartbeat 
> frequency, etc), for the purpose of some kind of academic research (be 
> it medical, biological, ...), whose results are shared with 
> ex:physician3.  Which is what a direct translation into English would say.
>
> Manchester syntax is general enough to cover all OWL constructs; for 
> compliance checking a more streamlined JSON-like syntax may be enough, 
> e.g.:
>
> {
> hasDataSubject: {ex:patient1}
> hasPurpose: AcademicResearch
> hasPersonalDataCategory: PhysicalHealth
> hasRecipient:
> ...
> }
>
> Such syntax only needs a well-specified mapping into OWL2 that gives 
> it a formal semantics and a logical meaning.
>
> Now approaches 2 and 3 differ in the representation of data requests.
>
> In APPROACH 2, data requests are still expressed as RDFS nodes (as in 
> APPROACH 1). Then compliance checking can be reduced to instance 
> checking (i.e. whether the data request is an instance of consent).
>
> In APPROACH 3, data requests are expressed as classes, with the same 
> syntax as consent. In this case, compliance checking can be reduced to 
> subsumption (i.e. checking whether the data request class is contained 
> in the consent class).
>
>
> APPROACH 4
>
> A class may also be expressed as a SPARQL query (the answer is the 
> class). Data requests are as in approaches 1 and 2.
>
> The above consent could be expressed as a SPARQL query selecting all 
> objects with  hasDataSubject=ex:patient1, hasPurpose in 
> AcademicResearch, etc.
>
> ex:dataRequest is compliant iff it belongs to the query answer.
>
> My personal feeling is that expressing consent via a SPARQL query 
> introduces lots of irrelevant stuff and is too operational.
>
>
>

-- 
---
Harshvardhan Pandit
ADAPT Centre
Trinity College Dublin

Received on Wednesday, 27 May 2020 08:02:39 UTC