Re: DPV semantics: how to specify values?

Hi Harsh, I see your point, and I saw it coming :-)

Since the goal of the encoding is relevant in choosing the right 
formalization - as you explain in your message - wouldn't it make sense 
to provide a small set of alternative encoding styles, one for each of 
the purposes mentioned in your message? That way, people would not have 
to backtrack after discovering that the encoding they adopted is not 
appropriate for their purpose. A set of guidelines should gently lead 
the user towards the correct formalization style.

Best,
Piero

On 27/05/20 10:02, Harshvardhan J. Pandit wrote:
> Dear all,
> IMHO the issue of how best to represent DPV examples is highly dependent 
> on its eventual use.
> For example, Piero's email argues that OWL classes are best suited 
> for automated compliance checking using the subsumption mechanism (as in 
> SPECIAL).
> 
> However, to provide a diverse perspective on this - I am raising the 
> possibility that DPV may not always be used for automated compliance 
> checking or with an OWL2 reasoner. An adopter could use it in a purely 
> declarative manner - for example to document or log or specify data 
> processing. In such cases, RDFS is easier to understand than the OWL(2) 
> syntaxes, thanks to its tooling and its mapping to other paradigms 
> (e.g. object-oriented or graphs).
> 
> Further, we started DPV with the goal of providing a 
> vocabulary - which is why we have RDFS classes instead of OWL2. And I 
> think we should not deviate from that - or we risk alienating potential 
> adopters with too much semantic complexity.
> 
> That being said, I agree with the previous emails that blank nodes are 
> *NOT* suitable - but if someone wants to use them, it won't be invalid 
> and they can do whatever they want in their use-case. For example,
> 
> _ dpv:hasProcessing [ a dpv:Collect ] .
> 
> SELECT ?processing WHERE {
>    ?x dpv:hasProcessing/rdf:type ?processing .
> }
> 
> Convoluted - for sure, but not invalid.
> 
> Intuitively, I see people not familiar with semantic web modelling their 
> data as:
> 
> _ dpv:hasProcessing dpv:Collect .
> 
> which is what they mean e.g. in consent requests. This is valid in OWL2, 
> as Victor pointed out, and again a bit frowned upon, but still not 
> invalid (in RDFS and OWL2).
> This is also the same issue when dealing with personal data categories 
> since all of them are classes rather than instances.
> 
> Personally, I like the OWL2 subsumption as the 'cleanest' indication of 
> what we mean - but am wary of using OWL2 due to expressivity, 
> complexity, and more importantly - modelling challenges for adopters.
> If they are invested in and are using OWL2 reasoners/tooling - super! 
> They can model everything in OWL2.
> But if someone is only using DPV for declarative documentation of data 
> then OWL2 is probably not what they use.
> Hence I would recommend sticking with RDFS as much as possible - if we can 
> agree on how to do it.
> 
> As for Piero's note on OWL2 notation and syntax - I agree it is very 
> convenient, and something we can look at in parallel.
> 
> Regards,
> Harsh
> 
> P.S. How is ODRL dealing with this issue?
> Did they declare all concepts as instances? e.g. use, collect are 
> instances of odrl:Action and be done with it?
> Would doing that have solved our problem?
> E.g. collect, store, use are instances of dpv:Processing but not 
> subclasses; and then collect, store, use can have subclass relationships 
> between themselves.
> 
> On 27/04/2020 16:38, Piero Bonatti wrote:
>> Dear all,
>>
>> during the last call I have been asked to resume the discussion on the 
>> best way of encoding consent and data requests.
>>
>> Below you can find a list of 4 possible approaches, with some pros and 
>> cons, discussed with the goal of automated compliance checking in mind.
>>
>> Please comment on the alternatives, share your preferences, and point 
>> out possible drawbacks (including non-technical aspects, that I 
>> deliberately leave out of the list).
>> And, of course, feel free to suggest your own approach.
>>
>> For your convenience, I have also included the examples from the 
>> previous messages.
>>
>> Best regards
>> Piero
>>
>> PS: my personal preference so far is for approach 3 below, which in my 
>> opinion is the most uniform and clean of all four.
>>
>> --------------------------------
>>
>> APPROACH 1
>>
>> This is the approach circulated in previous messages. The specific 
>> example consists of a consent and a data request, encoded in RDFS as 
>> follows:
>>
>> ex:consentPatient1 a dpv:Consent ;
>> dpv:hasDataSubject ex:patient1 ;
>> dpv:hasPurpose     [a dpv:AcademicResearch];
>> dpv:hasProcessing  [a dpv:Collect];
>> dcterms:title      "Consent for Health data analysis in a clinical study ..." ;
>> dpv:hasDataController  ex:hospital1;
>> dpv:hasRecipient   ex:physiotherapist1;
>> dpv:hasPersonalDataCategory [a dpv:PhysicalHealth].
>>
>>
>> ex:dataRequest a dpv:PersonalDataHandling ;
>>  dpv:hasDataSubject     ex:patient1 ;
>>  dpv:hasPurpose         [a dpv:AcademicResearch] ;
>>  dpv:hasProcessing      [a dpv:Collect];
>>  dpv:hasLegalBasis   [a dpv:Consent];
>>  dpv:hasDataController  ex:hospital1;
>>  dpv:hasRecipient    ex:physician3;
>>  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth];
>>  dcterms:title          "Personal Data Collection for clinical study ..." .
>>
>> The main drawback of this approach is that ex:consentPatient1 says (in 
>> English) that ex:patient1 consents to some processing, for some 
>> purpose, over some data category, that are all unspecified, because 
>> they are expressed with blank nodes.
>> Consequently, consent and data request are logically unrelated, 
>> because the blank nodes in the consent and those in the data request 
>> may denote different individuals.
>>
>> Thus compliance checking cannot be reduced to any form of logical 
>> reasoning between the two graphs. In order to check compliance, one 
>> needs an ad-hoc notion of matching (that must be justified for 
>> correctness and completeness from scratch).  It is not clear whether 
>> the ad-hoc matching algorithm can be implemented on top of the 
>> standard reasoning tools.
>>
>> The above problem can be solved by making consent a *class* of 
>> objects; then compliance can be reduced to checking whether the data 
>> request is contained in the consent - which can be reduced to standard 
>> reasoning tasks, see below.
>>
>> APPROACHES 2, 3
>>
>> In these two approaches, consent is an OWL2 class. Among the standard 
>> alternative syntaxes of OWL2, Manchester syntax is probably the 
>> simplest. In Manchester syntax a consent class would look like this:
>>
>> (hasDataSubject some {ex:patient1}
>>  and (hasPurpose some AcademicResearch)
>>  and (hasPersonalDataCategory some PhysicalHealth)
>>  and (hasProcessing some Collect)
>>  and (hasRecipient some {ex:physician3})
>>  ...)
>>
>> The above expression covers the class of *all* processing activities 
>> of type Collect (no matter how data is concretely collected), on some 
>> physical health data (it may involve blood pressure, heartbeat 
>> frequency, etc), for the purpose of some kind of academic research (be 
>> it medical, biological, ...), whose results are shared with 
>> ex:physician3. This is what a direct translation into English would say.
>>
>> Manchester syntax is general enough to cover all OWL constructs; for 
>> compliance checking a more streamlined JSON-like syntax may be enough, 
>> e.g.:
>>
>> {
>> hasDataSubject: {ex:patient1}
>> hasPurpose: AcademicResearch
>> hasPersonalDataCategory: PhysicalHealth
>> hasRecipient:
>> ...
>> }
>>
>> Such syntax only needs a well-specified mapping into OWL2 that gives 
>> it a formal semantics and a logical meaning.
>>
>> Now approaches 2 and 3 differ in the representation of data requests.
>>
>> In APPROACH 2, data requests are still expressed as RDFS nodes (as in 
>> APPROACH 1). Then compliance checking can be reduced to instance 
>> checking (i.e. whether the data request is an instance of consent).
>>
>> In APPROACH 3, data requests are expressed as classes, with the same 
>> syntax as consent. In this case, compliance checking can be reduced to 
>> subsumption (i.e. checking whether the data request class is contained 
>> in the consent class).
>>
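
A minimal Python sketch of the subsumption check described for Approach 3, 
under simplifying assumptions: each class is a conjunction of "property some 
Class" restrictions, written as a dict from property to filler class, and 
the class hierarchy is a plain child-to-parent map. The hierarchy entries 
MedicalResearch and BloodPressure are hypothetical names chosen for 
illustration, not DPV terms. A real deployment would delegate this to an 
OWL2 reasoner rather than hand-written code.

```python
# Hypothetical toy hierarchy: child class -> parent class (not taken from DPV).
PARENT = {
    "MedicalResearch": "AcademicResearch",
    "BloodPressure": "PhysicalHealth",
}


def is_subclass(sub, sup):
    """True if sub equals sup or reaches it by following PARENT links."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def subsumed_by(request, consent):
    """True if every restriction in consent is matched by a restriction
    on the same property in request whose filler is equal or narrower."""
    return all(
        prop in request and is_subclass(request[prop], filler)
        for prop, filler in consent.items()
    )


# Consent class: one "property some Class" restriction per entry.
consent = {
    "hasPurpose": "AcademicResearch",
    "hasPersonalDataCategory": "PhysicalHealth",
    "hasProcessing": "Collect",
}

# Data request class, using narrower (hypothetical) fillers.
request = {
    "hasPurpose": "MedicalResearch",
    "hasPersonalDataCategory": "BloodPressure",
    "hasProcessing": "Collect",
}

print(subsumed_by(request, consent))  # the request is contained in the consent
print(subsumed_by(consent, request))  # the converse does not hold
```

The check is structural and only covers this restricted fragment; it is 
meant to show why the reduction to subsumption is natural, not to replace 
standard reasoning tools.
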
>>
>> APPROACH 4
>>
>> A class may also be expressed as a SPARQL query (the answer is the 
>> class). Data requests are as in approaches 1 and 2.
>>
>> The above consent could be expressed as a SPARQL query selecting all 
>> objects with  hasDataSubject=ex:patient1, hasPurpose in 
>> AcademicResearch, etc.
>>
>> ex:dataRequest is compliant iff it belongs to the query answer.
>>
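
To make the "belongs to the query answer" test concrete, here is a small 
Python sketch that stands in for the SPARQL query of Approach 4. It assumes 
a toy in-memory set of triples; ex:otherRequest, ex:patient2 and the 
blank-node labels are made up for illustration, and only three of the 
consent's conditions are checked. It does no subclass reasoning, so a 
request typed with a narrower purpose would not match unless the query were 
extended.

```python
# Toy triple store of (subject, predicate, object) tuples.
# ex:otherRequest, ex:patient2 and the blank-node labels are illustrative only.
TRIPLES = {
    ("ex:dataRequest", "dpv:hasDataSubject", "ex:patient1"),
    ("ex:dataRequest", "dpv:hasPurpose", "_:b1"),
    ("_:b1", "rdf:type", "dpv:AcademicResearch"),
    ("ex:dataRequest", "dpv:hasProcessing", "_:b2"),
    ("_:b2", "rdf:type", "dpv:Collect"),
    ("ex:otherRequest", "dpv:hasDataSubject", "ex:patient2"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def has_value_of_type(subject, predicate, cls):
    """True if some value of predicate on subject is typed as cls."""
    return any(cls in objects(v, "rdf:type") for v in objects(subject, predicate))


def consent_answer():
    """Plays the role of the consent query: select all matching subjects."""
    subjects = {s for s, _, _ in TRIPLES}
    return {
        s
        for s in subjects
        if "ex:patient1" in objects(s, "dpv:hasDataSubject")
        and has_value_of_type(s, "dpv:hasPurpose", "dpv:AcademicResearch")
        and has_value_of_type(s, "dpv:hasProcessing", "dpv:Collect")
    }


# The request is compliant iff it appears in the answer set.
print("ex:dataRequest" in consent_answer())
```
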
>> My personal feeling is that expressing consent via a SPARQL query 
>> introduces lots of irrelevant stuff and is too operational.
>>
>>
>>
> 

Received on Wednesday, 27 May 2020 15:07:33 UTC