- From: Harshvardhan J. Pandit <me@harshp.com>
- Date: Wed, 27 May 2020 16:26:23 +0100
- To: Piero Bonatti <pieroandrea.bonatti@unina.it>, public-dpvcg@w3.org
Thanks Piero - I agree that we need guidelines and indications.
And my next suggestion was going to be iterating through the different
approaches to demonstrate possible representations and where they might
excel/suffer.
E.g. OWL2 work you suggested for automatic compliance checking (as you
have in SPECIAL)
At the same time, we need a single consistent style for smaller examples
within the Primer and other documents.

Best,
Harsh

On 27/05/2020 16:07, Piero Bonatti wrote:
> Hi Harsh, I see your point, and I saw it coming :-)
>
> since the goal of the encoding is relevant in choosing the right
> formalization - as you explain in your message - wouldn't it make
> sense to provide a small set of alternative encoding styles, for the
> different purposes mentioned in your message? so that people do not
> have to backtrack after discovering that the encoding they adopted is
> not appropriate for their purpose. A set of guidelines should gently
> lead the user towards the correct formalization style.
>
> Best,
> Piero
>
> On 27/05/20 10:02, Harshvardhan J. Pandit wrote:
>> Dear all,
>> IMHO the issue of how best to represent DPV examples is highly
>> dependant on its eventual use.
>> For example, Piero's email argues the use of OWL classes as best
>> suited for automated compliance checking using the subsumption
>> mechanism (as in SPECIAL).
>>
>> However, to provide a diverse perspective on this - I am raising the
>> possibility that DPV may not always be used for automated compliance
>> checking or with an OWL2 reasoner. An adopter could use it in a
>> purely declarative manner - for example to document or log or specify
>> data processing. In such cases, OWL(2) syntaxes are less
>> understandable in comparison to RDFS - by virtue of tooling and
>> mapping to other paradigms (E.g. object-oriented or graphs).
>>
>> Further, we started the inception of DPV with the goal of providing a
>> vocabulary - which is why we have RDFS classes instead of OWL2.
>> And I think we should not deviate from that - or risk alienating
>> potential adopters with too much semantic complexity.
>>
>> That being said, I agree with the previous emails that blank nodes
>> are *NOT* suitable - but if someone wants to use them, it won't be
>> invalid and they can do whatever they want in their use-case. For
>> example,
>>
>>     _ dpv:hasProcessing [ a dpv:Collect ] .
>>
>>     SELECT ?processing WHERE {
>>         ?x dpv:hasProcessing/rdf:type ?processing .
>>     }
>>
>> Convoluted - for sure, but not invalid.
>>
>> Intuitively, I see people not familiar with semantic web modelling
>> their data as:
>>
>>     _ dpv:hasProcessing dpv:Collect .
>>
>> which is what they mean e.g. in consent requests. This is valid in
>> OWL2, as Victor pointed out, and again a bit frowned upon, but still
>> not invalid (in RDFS and OWL2).
>> This is also the same issue when dealing with personal data
>> categories since all of them are classes rather than instances.
>>
>> Personally, I like the OWL2 subsumption as the 'cleanest' indication
>> of what we mean - but am wary of using OWL2 due to expressivity,
>> complexity, and more importantly - modelling challenges for adopters.
>> If they are invested in and are using OWL2 reasoners/tooling - super!
>> They can model everything in OWL2.
>> But if someone is only using DPV for declarative documentation of
>> data then OWL2 is probably not what they use.
>> Hence would recommend sticking with RDFS as much as possible - if we
>> can agree on how to do it.
>>
>> As for Piero's note on OWL2 notation and syntax - I agree it is very
>> convenient, and something we can look at in parallel.
>>
>> Regards,
>> Harsh
>>
>> P.S. How is ODRL dealing with this issue?
>> Did they declare all concepts as instances? e.g. use, collect are
>> instances of odrl:Action and be done with it?
>> Would doing that have solved our problem?
>> E.g.
>> collect, store, use are instances of dpv:Processing but not
>> subclasses; and then collect, store, use can have subclass
>> relationships between themselves.
>>
>> On 27/04/2020 16:38, Piero Bonatti wrote:
>>> Dear all,
>>>
>>> during the last call I have been asked to resume the discussion on
>>> the best way of encoding consent and data requests.
>>>
>>> Below you can find a list of 4 possible approaches, with some pros
>>> and cons, discussed with the goal of automated compliance checking
>>> in mind.
>>>
>>> Please comment on the alternatives, share your preferences, and
>>> point out possible drawbacks (including non-technical aspects, that
>>> I deliberately leave out of the list).
>>> And, of course, feel free to suggest your own approach.
>>>
>>> For your convenience, I have also included the examples from the
>>> previous messages.
>>>
>>> Best regards
>>> Piero
>>>
>>> PS: my personal preference so far is for approach 3 below, that in
>>> my opinion is the most uniform and clean of all four.
>>>
>>> --------------------------------
>>>
>>> APPROACH 1
>>>
>>> This is the approach circulated in previous messages. The specific
>>> example consists of a consent and a data request, encoded in RDFS as
>>> follows:
>>>
>>> ex:consentPatient1 a dpv:Consent ;
>>>     dpv:hasDataSubject ex:patient1 ;
>>>     dpv:hasPurpose [a dpv:AcademicResearch];
>>>     dpv:hasProcessing [a dpv:Collect];
>>>     dcterms:title "Consent for Health data analysis in a clinical study ..." ;
>>>     dpv:hasDataController ex:hospital1;
>>>     dpv:hasRecipient ex:physiotherapist1;
>>>     dpv:hasPersonalDataCategory [a dpv:PhysicalHealth].
>>>
>>>
>>> ex:dataRequest a dpv:PersonalDataHandling ;
>>>     dpv:hasDataSubject ex:patient1 ;
>>>     dpv:hasPurpose [a dpv:AcademicResearch] ;
>>>     dpv:hasProcessing [a dpv:Collect];
>>>     dpv:hasLegalBasis [a dpv:Consent];
>>>     dpv:hasDataController ex:hospital1;
>>>     dpv:hasRecipient ex:physician3;
>>>     dpv:hasPersonalDataCategory [a dpv:PhysicalHealth];
>>>     dcterms:title "Personal Data Collection for clinical study ..." .
>>>
>>> The main drawback of this approach is that ex:consentPatient1 says
>>> (in English) that ex:patient1 consents to some processing, for some
>>> purpose, over some data category, that are all unspecified, because
>>> they are expressed with blank nodes.
>>> Consequently, consent and data request are logically unrelated,
>>> because the blank nodes in the consent and those in the data request
>>> may denote different individuals.
>>>
>>> Thus compliance checking cannot be reduced to any form of logical
>>> reasoning between the two graphs. In order to check compliance, one
>>> needs an ad-hoc notion of matching (that must be justified for
>>> correctness and completeness from scratch). It is not clear whether
>>> the ad-hoc matching algorithm can be implemented on top of the
>>> standard reasoning tools.
>>>
>>> The above problem can be solved by making consent a *class* of
>>> objects; then compliance can be reduced to checking whether the data
>>> request is contained in the consent - which can be reduced to
>>> standard reasoning tasks, see below.
>>>
>>> APPROACHES 2, 3
>>>
>>> In these two approaches, consent is an OWL2 class. Among the
>>> standard alternative syntaxes of OWL2, Manchester syntax is probably
>>> the simplest so far. In Manchester syntax a consent class would look
>>> like this:
>>>
>>> (hasDataSubject some {ex:patient1}
>>>     and (hasPurpose some AcademicResearch)
>>>     and (hasPersonalDataCategory some PhysicalHealth)
>>>     and (hasProcessing some Collect)
>>>     and (hasRecipient some {ex:physician3})
>>>     ...)
>>>
>>> The above expression covers the class of *all* processing activities
>>> of type Collect (no matter how data is concretely collected), on
>>> some physical health data (it may involve blood pressure, heartbeat
>>> frequency, etc), for the purpose of some kind of academic research
>>> (be it medical, biological, ...), whose results are shared with
>>> ex:physician3. Which is what a direct translation into English would
>>> say.
>>>
>>> Manchester syntax is general enough to cover all OWL constructs; for
>>> compliance checking a more streamlined JSON-like syntax may be
>>> enough, e.g.:
>>>
>>> {
>>>     hasDataSubject: {ex:patient1}
>>>     hasPurpose: AcademicResearch
>>>     hasPersonalDataCategory: PhysicalHealth
>>>     hasRecipient:
>>>     ...
>>> }
>>>
>>> Such syntax only needs a well-specified mapping into OWL2 that gives
>>> it a formal semantics and a logical meaning.
>>>
>>> Now approaches 2 and 3 differ in the representation of data requests.
>>>
>>> In APPROACH 2, data requests are still expressed as RDFS nodes (as
>>> in APPROACH 1). Then compliance checking can be reduced to instance
>>> checking (i.e. whether the data request is an instance of consent).
>>>
>>> In APPROACH 3, data requests are expressed as classes, with the same
>>> syntax as consent. In this case, compliance checking can be reduced
>>> to subsumption (i.e. checking whether the data request class is
>>> contained in the consent class).
>>>
>>>
>>> APPROACH 4
>>>
>>> A class may also be expressed as a SPARQL query (the answer is the
>>> class). Data requests are as in approaches 1 and 2.
>>>
>>> The above consent could be expressed as a SPARQL query selecting all
>>> objects with hasDataSubject=ex:patient1, hasPurpose in
>>> AcademicResearch, etc.
>>>
>>> ex:dataRequest is compliant iff it belongs to the query answer.
>>>
>>> My personal feeling is that expressing consent via a SPARQL query
>>> introduces lots of irrelevant stuff and is too operational.
>>>
>>>
>>>
>>
>

--
---
Harshvardhan Pandit
ADAPT Centre
Trinity College Dublin
Received on Wednesday, 27 May 2020 15:26:40 UTC
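
The subsumption idea described under APPROACH 3 in the quoted message
can be illustrated with a minimal Python sketch. It is not how an OWL2
reasoner or SPECIAL works; it only covers the simple case where consent
and data request are both flat conjunctions of "property some Class"
restrictions, and it treats individuals such as ex:patient1 as classes
with no subclasses. The subclass edges for BloodPressure and
MedicalResearch, and the function names, are hypothetical and chosen
only for illustration.

```python
# Hypothetical subclass edges (child -> parent); illustrative only,
# not taken from the DPV vocabulary.
SUBCLASS_OF = {
    "BloodPressure": "PhysicalHealth",
    "MedicalResearch": "AcademicResearch",
}


def is_subclass(cls, ancestor):
    """Return True if cls equals ancestor or reaches it via SUBCLASS_OF."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == ancestor:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return False


def request_within_consent(request, consent):
    """Approach-3 style check for flat restrictions: every restriction in
    the consent must be matched by an equal or more specific value in
    the request."""
    for prop, allowed in consent.items():
        value = request.get(prop)
        if value is None or not is_subclass(value, allowed):
            return False
    return True


# The consent class from the quoted example, written as a plain dict.
consent = {
    "hasDataSubject": "ex:patient1",
    "hasPurpose": "AcademicResearch",
    "hasPersonalDataCategory": "PhysicalHealth",
    "hasProcessing": "Collect",
    "hasRecipient": "ex:physician3",
}

# A request that narrows the data category, and one that changes the recipient.
narrower_request = dict(consent, hasPersonalDataCategory="BloodPressure")
other_recipient = dict(consent, hasRecipient="ex:physiotherapist1")

print(request_within_consent(narrower_request, consent))  # True
print(request_within_consent(other_recipient, consent))   # False
```

The narrower request is contained in the consent because BloodPressure
is (by assumption here) a kind of PhysicalHealth, while the second
request fails because its recipient differs from the one consented to.
A real deployment would delegate this check to an OWL2 reasoner rather
than a hand-written loop.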