Re: Feedback on DPV Primer's draft from Harshvardhan J. Pandit on 2022-01-31 (public-dpvcg@w3.org from January 2022)

From: Harshvardhan J. Pandit <me@harshp.com>
Date: Mon, 31 Jan 2022 11:36:24 +0000
To: Piero Bonatti <pieroandrea.bonatti@unina.it>
Cc: "public-dpvcg@w3.org" <public-dpvcg@w3.org>
Message-ID: <273c7743-7afb-d11e-43f2-97dbf4663868@harshp.com>
Hi Piero. Thanks for your (also) prompt reply! My comments are inline.
btw; this email was not sent to the mailing list, but I've added it. I 
hope you don't mind.

tldr; We need to sort of agree on a compromise here, otherwise we both 
could keep discussing this for a long time. I, as the chair, have to 
make a decision this week or next. I, as the only person who's writing 
documentation, also need to make the decision to write examples and 
use-cases. This also needs to be done this week or next. Neither of these 
is by my choice - no one else has volunteered. So I won't delay these 
any further since my aim is to get DPV to v1 this year.

I, personally, don't have the time to do the kind of OWL profile 
specification and validation/profile-checker that you are indicating. I 
know TRAPEZE is doing this. So unless you or TRAPEZE colleagues can do 
this right away, I don't see how it can be done in time.

On 31/01/2022 10:03, Piero Bonatti wrote:

> 
> - The use of RDFS/JSON-LD is motivated with the annotation use case. In 
> this context, I said that - as an alternative to SKOS -  one might have 
> used a meta-version of DPV properties in order to link a resource to a 
> DPV class in RDFS/JSON-LD.
>    For example, using the duplication approach, hasPersonalData would 
> have a corresponding meta-property hasPersonalDataClass that ranges over 
> a (meta)class "PersonalDataClasses", that in turn contains (as 
> instances) PersonalData and its subclasses. You can define such 
> metaclasses with RDFS and JSON-LD.  Now, for example, given a resource 
> R, you can annotate it with - say:
>                "R hasPersonalDataClass Location"
> to assert that R contains location data.
> 
> The ranges of such meta-level properties are meta-classes (like 
> PersonalDataClasses) whose instances are DPV concepts.  This resolves 
> contradictions and ambiguities by keeping object-level and meta-level 
> cleanly separate. I don't see that this is the same as using SKOS, as 
> in SKOS the two layers (object-level and meta-level) are not cleanly 
> separated, nor precisely related.

I really think it is 'ugly' to have separate properties like that. I 
haven't seen this done anywhere else either. It also requires 
explaining to the adopter what a 'class' and an 'instance' are, their 
nuances, and what happens if they are mixed.

Further, there is a really good chance that what one graph may think of 
as an instance, another one may want it to be a class. For example, 
Country as instance in one graph (controller location), and class in 
another (data storage locations for servers). We can create 
'guidelines', but I think we'll be asking a lot from users if they need 
to re-evaluate their entire data model every time a concept is 
added or changed.

> 
> - The "vague" semantics of SKOS affects interoperability not only with 
> respect to the use cases that involve reasoning, but with respect to all 
> use cases.  What is vague remains vague, no matter how it is used.  The 
> example of reasoners treating the same SKOS graph in different ways can be 
> easily generalized to other applications.

With OWL and SKOS, we get the same interoperability in case of DPV. Both 
are sufficient to express whether a concept is "part of" another concept 
(as a set). That's all that is needed at the moment IMHO.

This is considering (and I posit) that not all use-cases would even 
involve reasoning beyond this. For example, when I work with ROPA 
(register of processing activities) or Consent requests, I need a 
vocabulary to represent the terms, and they may not end up being 
semantic-web in the end. The only inference/reasoning involved is over 
the hierarchy.
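
A minimal sketch of what such hierarchy-only inference could look like, 
using plain Python tuples instead of an RDF library. The ex: concept names 
and the functions ancestors/is_part_of are hypothetical and not part of 
DPV. Note that it follows skos:broader links transitively, which is an 
assumption of this sketch: SKOS itself only defines skos:broaderTransitive 
as transitive.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library needed.
# All ex: names are made up for illustration and are not DPV terms.
TRIPLES = [
    ("ex:EmailUsagePatterns", "skos:broader", "ex:Email"),
    ("ex:Email", "skos:broader", "ex:Contact"),
    ("ex:Contact", "skos:broader", "ex:PersonalDataRoot"),
]


def ancestors(concept, triples, predicate="skos:broader"):
    """Return every concept reachable from `concept` by following `predicate`."""
    parents = {}
    for s, p, o in triples:
        if p == predicate:
            parents.setdefault(s, set()).add(o)
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def is_part_of(narrow, broad, triples):
    """True if `narrow` is `broad` itself or sits somewhere below it."""
    return narrow == broad or broad in ancestors(narrow, triples)
```

With this, is_part_of("ex:Email", "ex:Contact", TRIPLES) holds while the 
reverse does not, which is all a ROPA or consent tool would need from the 
hierarchy.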

The spectrum of DPV's possible applications ranges from annotating/tagging 
to logic-based compliance checking (e.g. DAPRECO KG). It is 
impossible to support all of it. Instead, I'm trying to make it as 
simple as possible to reuse/convert DPV to support any of these.

> 
> - I have to insist that OWL2-PL's knowledge bases/graphs are indeed 
> simpler than SKOS graphs because the former contain statements like:
>      * term A is a subclass of term B
>      * term A is an instance of term B
>      * the domain/range of property P is term A
>      * classes A and B have no instances in common.
> Moreover, a user who wants to add a new personal data category or a new 
> purpose needs almost only the subclass statement (if TRAPEZE's 
> guidelines are followed).  Understanding and extending SKOS assertions 
> needs more work.

I have to disagree with this. SKOS assertions are simpler than OWL or 
TRAPEZE ones. With SKOS, we have:

     * term T(op level concept) is an instance of skos:ConceptScheme and 
       owl:Class
     * term A is an instance of term T
     * term A is broader/narrower than some other concept (if needed)
     * domain/range of property P is term T

This allows for things like:
:MyEmail a dpv:PersonalData .
:MyEmailUsagePatterns a dpv:PersonalData ; skos:broader :MyEmail .
:X dpv:hasPersonalData :MyEmail, :MyEmailUsagePatterns .

If someone wants to extend this, they just need the broader/narrower 
relations.

:MyEmailUsagePatternsOnSaturday a dpv:PersonalData ;
     skos:broader :MyEmailUsagePatterns.

In OWL, one has to do it like this:
:MyEmail rdfs:subClassOf dpv:PersonalData ;
     a dpv:PersonalData .
:MyEmailUsagePatterns a :MyEmail .
:X dpv:hasPersonalData :MyEmail, :MyEmailUsagePatterns .

Then to extend it, one needs to assert MyEmailUsagePatterns as a class now.
:MyEmailUsagePatterns rdfs:subClassOf :MyEmail .
:MyEmailUsagePatternsOnSaturday a :MyEmailUsagePatterns .

So the OWL one constantly asks someone to choose between 
classes/instances whereas the SKOS one doesn't. One could say use only 
sub-classes and no instances, but then why pretend this is different 
from the SKOS model? Both do the same thing: they express 
parent-child style relationships.
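
To make the comparison concrete, here is a rough sketch that loads the two 
snippets above as plain Python tuples and compares them. The helper names 
are hypothetical, "a" stands for rdf:type as in Turtle, and the 
:X dpv:hasPersonalData triples are left out because they are identical in 
both models. Reading rdf:type and rdfs:subClassOf alike as "parent-child" 
is a simplification for illustration, not OWL semantics.

```python
# The two models from this email as (subject, predicate, object) tuples.
SKOS_GRAPH = [
    (":MyEmail", "a", "dpv:PersonalData"),
    (":MyEmailUsagePatterns", "a", "dpv:PersonalData"),
    (":MyEmailUsagePatterns", "skos:broader", ":MyEmail"),
    (":MyEmailUsagePatternsOnSaturday", "a", "dpv:PersonalData"),
    (":MyEmailUsagePatternsOnSaturday", "skos:broader", ":MyEmailUsagePatterns"),
]
OWL_GRAPH = [
    (":MyEmail", "rdfs:subClassOf", "dpv:PersonalData"),
    (":MyEmail", "a", "dpv:PersonalData"),
    (":MyEmailUsagePatterns", "a", ":MyEmail"),
    (":MyEmailUsagePatterns", "rdfs:subClassOf", ":MyEmail"),
    (":MyEmailUsagePatternsOnSaturday", "a", ":MyEmailUsagePatterns"),
]


def local_links(graph):
    """Collect links where both ends are locally defined (':'-prefixed) terms."""
    return {(s, p, o) for s, p, o in graph
            if s.startswith(":") and o.startswith(":")}


def parent_child_pairs(graph):
    """The (child, parent) pairs, ignoring which predicate expresses them."""
    return {(s, o) for s, _, o in local_links(graph)}


def link_predicates(graph):
    """The predicates an adopter had to pick from to link local terms."""
    return {p for _, p, _ in local_links(graph)}
```

Both graphs yield the same parent_child_pairs, but link_predicates gives 
only skos:broader for the SKOS graph and both "a" and rdfs:subClassOf for 
the OWL one.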

Additionally with OWL, you have to explain when to use subclasses or why 
not to use instances, because one could 'not follow' the guidelines. 
Whereas with SKOS, all you need to say is: find the concept in the 
hierarchy closest to the one you have, and extend it using skos:broader. 
That's it. Nothing else to think about. And since there's no other way to 
do this, it is harder to not follow the "guidelines".

And with SKOS, we get better 'vocabulary' management because I can 
create ad-hoc taxonomies that work with both SKOS and OWL, and that map 
really simply to anything else that requires a taxonomy. For example, 
creating 'Data Transfer Legal Basis' as a separate taxonomy is as 
trivial as declaring a concept scheme and throwing all related legal 
bases under it. With OWL, one has to craft sub-classes for new concepts, 
and re-arrange the entire legal bases taxonomy when one concept is 
added. This means DPV will fluctuate a lot between versions. Not 
desirable IMO.
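
As a small sketch of the concept-scheme idea, again with plain Python 
tuples: the ex: names and the helpers add_scheme/members_of are 
hypothetical placeholders, not actual DPV legal bases. skos:inScheme is the 
standard SKOS property for placing a concept in a scheme.

```python
# An existing hierarchy of (hypothetical) legal bases.
BASE = [
    ("ex:BasisA", "skos:broader", "ex:LegalBasis"),
    ("ex:BasisB", "skos:broader", "ex:LegalBasis"),
    ("ex:BasisC", "skos:broader", "ex:LegalBasis"),
]


def add_scheme(triples, scheme, members):
    """Return a new triple list with `scheme` declared and `members` placed in it.

    Existing triples are copied unchanged; only new triples are appended.
    """
    extra = [(scheme, "a", "skos:ConceptScheme")]
    extra += [(m, "skos:inScheme", scheme) for m in members]
    return list(triples) + extra


def members_of(triples, scheme):
    """All concepts that were placed in `scheme` via skos:inScheme."""
    return {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}


# Group two of the existing concepts under a separate, ad-hoc taxonomy.
EXTENDED = add_scheme(BASE, "ex:DataTransferLegalBasis",
                      ["ex:BasisA", "ex:BasisC"])
```

The point of the sketch is that the original skos:broader triples are left 
untouched: the new taxonomy is purely additive.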

> 
> - Profiles are well-defined and checked syntactically, so your statement 
> "the moment we say follow OWL2 - then the adopter is free to use any and 
> all of OWL2 semantics" is ungrounded.  Only the assertions supported by 
> the profile will be accepted, the others shall be treated like syntax 
> errors. [please note that this is a fully standard approach: the OWL 
> APIs themselves support profile definition and parsing].

Okay. First, this necessitates creating a separate profile. Who will 
create these profile specifications and linters/checkers for their 
expression? Then these need to be kept up to date as things change. 
There is documentation to be created, use-cases to be written. It is a 
lot of work!
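
For a sense of what the simplest possible syntactic check looks like, here 
is a toy sketch that only filters triples by predicate. The allowed set 
mirrors the four statement kinds listed earlier in this thread (subclass, 
instance, domain/range, disjointness); it is an assumption for 
illustration, not TRAPEZE's actual profile and not the OWL API's profile 
support. A real checker would also have to parse OWL axioms and be 
maintained and documented as the profile evolves.

```python
# Predicates assumed to be allowed by a (hypothetical) restricted profile.
ALLOWED_PREDICATES = frozenset({
    "rdfs:subClassOf",
    "rdf:type",
    "rdfs:domain",
    "rdfs:range",
    "owl:disjointWith",
})


def profile_violations(triples, allowed=ALLOWED_PREDICATES):
    """Return the triples whose predicate falls outside the allowed set."""
    return [t for t in triples if t[1] not in allowed]


# Made-up input: two accepted statements and one that the profile rejects.
SAMPLE = [
    ("ex:Location", "rdfs:subClassOf", "ex:PersonalData"),
    ("ex:hasPersonalData", "rdfs:range", "ex:PersonalData"),
    ("ex:Location", "owl:equivalentClass", "ex:Place"),
]
```

Here profile_violations(SAMPLE) reports only the owl:equivalentClass 
triple, which would then be treated like a syntax error.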

Right now in DPVCG we don't even have a lot of involvement from people 
for discussing and refining the concepts. It's just me working on a lot 
of stuff, and this is after-hours work, not even as part of my regular 
job. So I consider working on more specifications and profile checkers 
as nice to have, but not a priority if there are no person months 
available to get them done.

> 
> - In the light of the above point, TRAPEZE's guidelines are actually and 
> effectively going to remove all complications.

Maybe for TRAPEZE use-cases. But I don't think they work with other 
use-cases where the same requirements are not present. As in SPECIAL, 
TRAPEZE has a strict set of aims for what it wants to do with DPV. But 
there are others who also want to use DPV, and I re-iterate that this 
does not always involve the kind of profile checking you do within 
SPECIAL/TRAPEZE.

> 
> I am so confident about the greater usability of TRAPEZE's framework 
> that I'm going to run user studies to prove it scientifically.  So I am 
> looking forward to a stable SKOS proposal in order to have something to 
> compare TRAPEZE's approach with.

I don't disagree with you about the usefulness of work in TRAPEZE. I'm 
merely re-iterating that this is not the only possible use for DPV. And 
that basing DPV's design only on how beneficial it is in profile 
expression or checking (a la TRAPEZE) affects other uses.

Here are two of my publications (there are other works using DPV, but 
these reflect personal experiences):

1) "ODRL Profile for Expressing Consent through Granular Access Control 
Policies in Solid" 
https://harshp.com/research/publications/048-odrl-profile-consent-solid-acp

2) "A Common Semantic Model of the GDPR Register of Processing 
Activities" 
https://harshp.com/research/publications/037-common-semantic-model-GDPR-ROPA

Neither requires 'profiles' or 'OWL2' the way TRAPEZE does. #1 needs 
compatibility with ODRL, and #2 needs assertions about property 
domain/ranges. But the current design of DPV as being this abstract 
amalgamation of concepts and properties with no actual usage guidelines 
affects both of these and introduces a *lot* of considerations that are 
not related to the work, but instead towards figuring out what it means 
to have DPV semantics used in these applications. This is frustrating.

If we pretend DPV was not a semantic-web vocabulary, but a list of 
concepts (a taxonomy), then the current OWL2 design would not be the 
right choice for either of these.

> 
> I *strongly* agree with you about the importance of adoption. We only 
> disagree (partially) on what may foster or hinder adoption.  While we 
> both agree that some kind of rdf links may help, we disagree on how this 
> can be optimally implemented.

Yes. I think it all comes down to how to get most of this done with the 
limited time people have. A lot of what you are suggesting requires 
people who are experts in i) OWL ii) reasoning iii) writing 
specifications and profile checkers. We don't have those. So unless you 
or colleagues are willing to expend time in doing these, I don't see how 
these can be done.

I would really like to have a 'formal' and 'fully formed' specification 
like you mean in terms of OWL profiles. But the area of its application 
is so vast, and the amount of time people are willing to spend on this 
so little, that I don't think it's feasible to have it done with all the 
documentation and examples and real-world concepts that we plan to have 
in the next couple of months.

Regards,
-- 
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/
Received on Monday, 31 January 2022 11:36:42 UTC