Re: DPV semantics: how to specify values?

Thanks Piero - I agree that we need guidelines and indications.
My next suggestion was going to be to iterate through the different 
approaches to demonstrate possible representations and where they might 
excel or suffer, e.g. the OWL2 work you suggested for automatic 
compliance checking (as you have in SPECIAL).

At the same time, we need a single consistent style for smaller examples 
within the Primer and other documents.

Best,
Harsh

On 27/05/2020 16:07, Piero Bonatti wrote:
> Hi Harsh, I see your point, and I saw it coming :-)
>
> Since the goal of the encoding is relevant in choosing the right 
> formalization - as you explain in your message - wouldn't it make 
> sense to provide a small set of alternative encoding styles, for the 
> different purposes mentioned in your message, so that people do not 
> have to backtrack after discovering that the encoding they adopted is 
> not appropriate for their purpose?  A set of guidelines should gently 
> lead the user towards the correct formalization style.
>
> Best,
> Piero
>
> On 27/05/20 10:02, Harshvardhan J. Pandit wrote:
>> Dear all,
>> IMHO the issue of how best to represent DPV examples is highly 
>> dependent on its eventual use.
>> For example, Piero's email argues that OWL classes are best 
>> suited for automated compliance checking using the subsumption 
>> mechanism (as in SPECIAL).
>>
>> However, to provide a diverse perspective on this - I am raising the 
>> possibility that DPV may not always be used for automated compliance 
>> checking or with an OWL2 reasoner. An adopter could use it in a 
>> purely declarative manner - for example to document or log or specify 
>> data processing. In such cases, OWL(2) syntaxes are less 
>> understandable in comparison to RDFS - by virtue of tooling and 
>> mapping to other paradigms (E.g. object-oriented or graphs).
>>
>> Further, we started DPV with the goal of providing a 
>> vocabulary - which is why we have RDFS classes instead of OWL2. And I 
>> think we should not deviate from that - or risk alienating potential 
>> adopters with too much semantic complexity.
>>
>> That being said, I agree with the previous emails that blank nodes 
>> are *NOT* suitable - but if someone wants to use them, it won't be 
>> invalid and they can do whatever they want in their use-case. For 
>> example,
>>
>> _ dpv:hasProcessing [ a dpv:Collect ] .
>>
>> SELECT ?processing WHERE {
>>    ?x dpv:hasProcessing/rdf:type ?processing .
>> }
>>
>> Convoluted - for sure, but not invalid.
>>
>> Intuitively, I see people not familiar with semantic web modelling 
>> their data as:
>>
>> _ dpv:hasProcessing dpv:Collect .
>>
>> which is what they mean e.g. in consent requests. This is valid in 
>> OWL2, as Victor pointed out, and again a bit frowned upon, but still 
>> not invalid (in RDFS and OWL2).
>> This is also the same issue when dealing with personal data 
>> categories since all of them are classes rather than instances.
>>
>> Personally, I like the OWL2 subsumption as the 'cleanest' indication 
>> of what we mean - but am wary of using OWL2 due to expressivity, 
>> complexity, and more importantly - modelling challenges for adopters.
>> If they are invested in and are using OWL2 reasoners/tooling - super! 
>> They can model everything in OWL2.
>> But if someone is only using DPV for declarative documentation of 
>> data then OWL2 is probably not what they use.
>> Hence would recommend sticking with RDFS as much as possible - if we 
>> can agree on how to do it.
>>
>> As for Piero's note on OWL2 notation and syntax - I agree it is very 
>> convenient, and something we can look at in parallel.
>>
>> Regards,
>> Harsh
>>
>> P.S. How is ODRL dealing with this issue?
>> Did they declare all concepts as instances? e.g. use, collect are 
>> instances of odrl:Action and be done with it?
>> Would doing that have solved our problem?
>> E.g. collect, store, use are instances of dpv:Processing but not 
>> subclasses; and then collect, store, use can have subclass 
>> relationships between themselves.
>>
>> On 27/04/2020 16:38, Piero Bonatti wrote:
>>> Dear all,
>>>
>>> during the last call I have been asked to resume the discussion on 
>>> the best way of encoding consent and data requests.
>>>
>>> Below you can find a list of 4 possible approaches, with some pros 
>>> and cons, discussed with the goal of automated compliance checking 
>>> in mind.
>>>
>>> Please comment on the alternatives, share your preferences, and 
>>> point out possible drawbacks (including non-technical aspects, that 
>>> I deliberately leave out of the list).
>>> And, of course, feel free to suggest your own approach.
>>>
>>> For your convenience, I have also included the examples from the 
>>> previous messages.
>>>
>>> Best regards
>>> Piero
>>>
>>> PS: my personal preference so far is for approach 3 below, that in 
>>> my opinion is the most uniform and clean of all four.
>>>
>>> --------------------------------
>>>
>>> APPROACH 1
>>>
>>> This is the approach circulated in previous messages. The specific 
>>> example consists of a consent and a data request, encoded in RDFS as 
>>> follows:
>>>
>>> ex:consentPatient1 a dpv:Consent ;
>>> dpv:hasDataSubject ex:patient1 ;
>>> dpv:hasPurpose     [a dpv:AcademicResearch];
>>> dpv:hasProcessing  [a dpv:Collect];
>>> dcterms:title      "Consent for Health data analysis in a clinical study ..." ;
>>> dpv:hasDataController  ex:hospital1;
>>> dpv:hasRecipient   ex:physiotherapist1;
>>> dpv:hasPersonalDataCategory [a dpv:PhysicalHealth].
>>>
>>>
>>> ex:dataRequest a dpv:PersonalDataHandling ;
>>>  dpv:hasDataSubject     ex:patient1 ;
>>>  dpv:hasPurpose         [a dpv:AcademicResearch] ;
>>>  dpv:hasProcessing      [a dpv:Collect];
>>>  dpv:hasLegalBasis   [a dpv:Consent];
>>>  dpv:hasDataController  ex:hospital1;
>>>  dpv:hasRecipient    ex:physician3;
>>>  dpv:hasPersonalDataCategory [a dpv:PhysicalHealth];
>>>  dcterms:title          "Personal Data Collection for clinical study ..." .
>>>
>>> The main drawback of this approach is that ex:consentPatient1 says 
>>> (in English) that ex:patient1 consents to some processing, for some 
>>> purpose, over some data category, that are all unspecified, because 
>>> they are expressed with blank nodes.
>>> Consequently, consent and data request are logically unrelated, 
>>> because the blank nodes in the consent and those in the data request 
>>> may denote different individuals.
>>>
>>> Thus compliance checking cannot be reduced to any form of logical 
>>> reasoning between the two graphs. In order to check compliance, one 
>>> needs an ad-hoc notion of matching (that must be justified for 
>>> correctness and completeness from scratch). It is not clear whether 
>>> the ad-hoc matching algorithm can be implemented on top of the 
>>> standard reasoning tools.
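
A minimal sketch of the problem described above, in plain Python rather than RDF tooling. Triples are modelled as tuples and each blank node as a fresh object; the helper names (BNode, build_graph, objects, value_types, adhoc_match) are made up for illustration and are not part of DPV or any library. Only two of the properties from the example are included. The sketch suggests that the two graphs share no purpose node, so any link between consent and request has to come from a hand-written comparison of the declared types.

```python
class BNode:
    """Stand-in for a blank node: each instance is equal only to itself."""


def build_graph(subject, purpose_type, processing_type):
    """Build a tiny graph with fresh blank nodes for purpose and processing."""
    purpose, processing = BNode(), BNode()
    return {
        (subject, "dpv:hasPurpose", purpose),
        (purpose, "rdf:type", purpose_type),
        (subject, "dpv:hasProcessing", processing),
        (processing, "rdf:type", processing_type),
    }


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def value_types(graph, subject, predicate):
    """Types declared for the values that `predicate` takes on `subject`."""
    return {
        t
        for node in objects(graph, subject, predicate)
        for t in objects(graph, node, "rdf:type")
    }


def adhoc_match(consent, consent_id, request, request_id, predicates):
    """Ad-hoc rule: the request's declared types must also appear in the consent."""
    for p in predicates:
        requested = value_types(request, request_id, p)
        if not requested or not requested <= value_types(consent, consent_id, p):
            return False
    return True


consent = build_graph("ex:consentPatient1", "dpv:AcademicResearch", "dpv:Collect")
request = build_graph("ex:dataRequest", "dpv:AcademicResearch", "dpv:Collect")

# The two graphs have no purpose node in common ...
shared_nodes = (objects(consent, "ex:consentPatient1", "dpv:hasPurpose")
                & objects(request, "ex:dataRequest", "dpv:hasPurpose"))
# ... so the only link between them is the hand-written type comparison.
matches = adhoc_match(consent, "ex:consentPatient1", request, "ex:dataRequest",
                      ["dpv:hasPurpose", "dpv:hasProcessing"])
```

The matching rule here compares type names literally and ignores any class hierarchy, which is one example of the design decisions an ad-hoc matcher would have to justify.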
>>>
>>> The above problem can be solved by making consent a *class* of 
>>> objects; then compliance can be reduced to checking whether the data 
>>> request is contained in the consent - which can be reduced to 
>>> standard reasoning tasks, see below.
>>>
>>> APPROACHES 2, 3
>>>
>>> In these two approaches, consent is an OWL2 class. Among the 
>>> standard alternative syntaxes of OWL2, Manchester syntax is probably 
>>> the simplest so far. In Manchester syntax a consent class would look 
>>> like this:
>>>
>>> (hasDataSubject some {ex:patient1}
>>>  and (hasPurpose some AcademicResearch)
>>>  and (hasPersonalDataCategory some PhysicalHealth)
>>>  and (hasProcessing some Collect)
>>>  and (hasRecipient some {ex:physician3})
>>>  ...)
>>>
>>> The above expression covers the class of *all* processing activities 
>>> of type Collect (no matter how data is concretely collected), on 
>>> some physical health data (it may involve blood pressure, heartbeat 
>>> frequency, etc), for the purpose of some kind of academic research 
>>> (be it medical, biological, ...), whose results are shared with 
>>> ex:physician3.  Which is what a direct translation into English would 
>>> say.
>>>
>>> Manchester syntax is general enough to cover all OWL constructs; for 
>>> compliance checking a more streamlined JSON-like syntax may be 
>>> enough, e.g.:
>>>
>>> {
>>> hasDataSubject: {ex:patient1}
>>> hasPurpose: AcademicResearch
>>> hasPersonalDataCategory: PhysicalHealth
>>> hasRecipient:
>>> ...
>>> }
>>>
>>> Such syntax only needs a well-specified mapping into OWL2 that gives 
>>> it a formal semantics and a logical meaning.
>>>
>>> Now approaches 2 and 3 differ in the representation of data requests.
>>>
>>> In APPROACH 2, data requests are still expressed as RDFS nodes (as 
>>> in APPROACH 1). Then compliance checking can be reduced to instance 
>>> checking (i.e. whether the data request is an instance of consent).
>>>
>>> In APPROACH 3, data requests are expressed as classes, with the same 
>>> syntax as consent. In this case, compliance checking can be reduced 
>>> to subsumption (i.e. checking whether the data request class is 
>>> contained in the consent class).
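
A rough sketch of the subsumption idea in APPROACH 3, in plain Python and only for the restricted case where both classes are conjunctions of "property some Class" restrictions with one filler per property. It is not a substitute for an OWL2 reasoner. The hierarchy fragment is simplified and the names ex:MedicalResearch and ex:BloodPressure are hypothetical, chosen only to mirror the examples mentioned above; is_subclass and subsumed_by are illustrative helper names.

```python
# Simplified, partly hypothetical hierarchy fragment: child -> parent.
SUBCLASS_OF = {
    "ex:MedicalResearch": "AcademicResearch",  # assumed for illustration
    "ex:BloodPressure": "PhysicalHealth",      # assumed for illustration
    "Collect": "Processing",                   # simplified
}


def is_subclass(cls, ancestor):
    """True if `cls` equals `ancestor` or reaches it by following SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def subsumed_by(request, consent):
    """True if every restriction in `consent` is met by the restriction on the
    same property in `request`, with a filler at least as specific."""
    return all(
        prop in request and is_subclass(request[prop], consent[prop])
        for prop in consent
    )


# Class expressions as property -> filler; nominals are treated as plain names.
consent = {
    "hasDataSubject": "{ex:patient1}",
    "hasPurpose": "AcademicResearch",
    "hasPersonalDataCategory": "PhysicalHealth",
    "hasProcessing": "Collect",
}

# A request that narrows purpose and data category stays inside the consent.
narrow_request = dict(consent, hasPurpose="ex:MedicalResearch",
                      hasPersonalDataCategory="ex:BloodPressure")
# A request for any kind of processing is broader than the consent.
broad_request = dict(consent, hasProcessing="Processing")

narrow_ok = subsumed_by(narrow_request, consent)
broad_ok = subsumed_by(broad_request, consent)
```

Under these assumptions the narrower request is contained in the consent and the broader one is not, which is the behaviour the subsumption check is meant to give.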
>>>
>>>
>>> APPROACH 4
>>>
>>> A class may also be expressed as a SPARQL query (the answer is the 
>>> class). Data requests are as in approaches 1 and 2.
>>>
>>> The above consent could be expressed as a SPARQL query selecting all 
>>> objects with  hasDataSubject=ex:patient1, hasPurpose in 
>>> AcademicResearch, etc.
>>>
>>> ex:dataRequest is compliant iff it belongs to the query answer.
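
To illustrate the "query answer as class" reading of APPROACH 4, here is a small sketch in plain Python rather than SPARQL. The consent is a function that selects matching requests from a collection, and a request is compliant iff it appears in the result. The individuals ex:purpose1 and ex:purpose2, the request ex:otherRequest, the Marketing class, and the function name consent_query are all hypothetical, and only two of the conditions are shown.

```python
# Hypothetical data: the classes each purpose individual belongs to.
TYPES = {
    "ex:purpose1": {"AcademicResearch"},
    "ex:purpose2": {"Marketing"},
}

# Hypothetical data requests, described as instances with concrete values.
REQUESTS = {
    "ex:dataRequest": {"hasDataSubject": "ex:patient1", "hasPurpose": "ex:purpose1"},
    "ex:otherRequest": {"hasDataSubject": "ex:patient1", "hasPurpose": "ex:purpose2"},
}


def consent_query(requests, types):
    """Ids of all requests about ex:patient1 whose purpose is in AcademicResearch."""
    return {
        request_id
        for request_id, props in requests.items()
        if props.get("hasDataSubject") == "ex:patient1"
        and "AcademicResearch" in types.get(props.get("hasPurpose"), set())
    }


answer = consent_query(REQUESTS, TYPES)
compliant = "ex:dataRequest" in answer
```

Even in this toy form, the consent is tied to how the data is stored and iterated, which gives some feel for the "too operational" concern.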
>>>
>>> My personal feeling is that expressing consent via a SPARQL query 
>>> introduces lots of irrelevant stuff and is too operational.
>>>
>>>
>>>
>>
>

-- 
---
Harshvardhan Pandit
ADAPT Centre
Trinity College Dublin

Received on Wednesday, 27 May 2020 15:26:40 UTC