Feedback on DPV Primer's draft from Piero Bonatti on 2022-01-31 (public-dpvcg@w3.org from January 2022)

From: Piero Bonatti <pieroandrea.bonatti@unina.it>
Date: Mon, 31 Jan 2022 11:41:13 +0100
To: public-dpvcg@w3.org
Message-ID: <5e84c3e1-c3e7-4f16-3f7f-95646c3dfe25@unina.it>
Dear Harsh,

thank you for your prompt answer.  I am realistic, so I do not expect 
any change of approach at this stage. Still I have to defend my 
evaluation of the SKOS approach.

- When you say that the reason for considering SKOS is related to the 
use of JSON-LD, you are simply confirming what I said, that is, that all 
these complications are related to using RDF(S) and circumventing its 
expressiveness limitations.  In fact, JSON-LD is actually an alternative 
syntax for RDF(S).  One can say that they are essentially the same 
language.

- The use of RDFS/JSON-LD is motivated with the annotation use case. In 
this context, I said that - as an alternative to SKOS -  one might have 
used a meta-version of DPV properties in order to link a resource to a 
DPV class in RDFS/JSON-LD.
   For example, using the duplication approach, hasPersonalData would 
have a corresponding meta-property hasPersonalDataClass that ranges over 
a (meta)class "PersonalDataClasses", that in turn contains (as 
instances) PersonalData and its subclasses. You can define such 
metaclasses with RDFS and JSON-LD.  Now, for example, given a resource 
R, you can annotate it with - say:
               "R hasPersonalDataClass Location"
to assert that R contains location data.

The ranges of such meta-level properties are meta-classes (like 
PersonalDataClasses) whose instances are DPV concepts.  This resolves 
contradictions and ambiguities by keeping object-level and meta-level 
cleanly separate. I don't see that this is the same as using SKOS, as 
in SKOS the two layers (object-level and meta-level) are not cleanly 
separated, nor precisely related.
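
As a rough illustration of the two layers, here is a minimal Python sketch 
that stores triples as plain tuples. Only hasPersonalData, 
hasPersonalDataClass, PersonalDataClasses, PersonalData, Location and R come 
from the example above; the helper range_violations is hypothetical. It 
treats rdfs:range as a closed-world check, which is a simplification: RDFS 
itself would infer a type rather than reject a triple.

```python
TYPE, SUBCLASS, RANGE = "rdf:type", "rdfs:subClassOf", "rdfs:range"

triples = {
    # Object level: DPV concepts are classes.
    ("Location", SUBCLASS, "PersonalData"),
    ("hasPersonalData", RANGE, "PersonalData"),
    # Meta level: the same concepts are instances of a metaclass.
    ("PersonalData", TYPE, "PersonalDataClasses"),
    ("Location", TYPE, "PersonalDataClasses"),
    ("hasPersonalDataClass", RANGE, "PersonalDataClasses"),
    # Annotation of resource R, as in the example above.
    ("R", "hasPersonalDataClass", "Location"),
}


def range_violations(graph):
    """Return triples whose value is not typed with the property's declared range.

    Assumption: a simplified closed-world reading of rdfs:range, for illustration only.
    """
    ranges = {s: o for s, p, o in graph if p == RANGE}
    return sorted(
        (s, p, o)
        for s, p, o in graph
        if p in ranges and (o, TYPE, ranges[p]) not in graph
    )


# The meta-level annotation is consistent with its declared range.
ok = range_violations(triples)

# Using the object-level property with a class as value mixes the two layers.
mixed = range_violations(triples | {("R", "hasPersonalData", "Location")})
```

Under these assumptions, ok is empty, while mixed flags the triple that 
uses the object-level property with a class as its value.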

- The "vague" semantics of SKOS affects interoperability not only with 
respect to the use cases that involve reasoning, but with respect to all 
use cases.  What is vague remains vague, no matter how it is used.  The 
example of reasoners treating the same SKOS graph in different ways can be 
easily generalized to other applications.

- I have to insist that OWL2-PL's knowledge bases/graphs are indeed 
simpler than SKOS graphs because the former contain statements like:
 * term A is a subclass of term B
 * term A is an instance of term B
 * the domain/range of property P is term A
 * classes A and B have no instances in common.
Moreover, a user who wants to add a new personal data category or a new 
purpose needs almost only the subclass statement (if TRAPEZE's 
guidelines are followed).  Understanding and extending SKOS assertions 
needs more work.
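
To make the four statement kinds concrete, here is a small Python sketch, 
not TRAPEZE code. The tuple encoding, the helper superclasses, and the terms 
acme, Controller, Purpose and GPSTrace are assumptions chosen for 
illustration only; the disjointness statement is likewise just an example.

```python
# One tuple per statement: (kind, first term, second term).
kb = [
    ("subclass", "Location", "PersonalData"),      # term A is a subclass of term B
    ("instance", "acme", "Controller"),            # term A is an instance of term B
    ("range", "hasPersonalData", "PersonalData"),  # the range of property P is term A
    ("disjoint", "PersonalData", "Purpose"),       # A and B share no instances
]


def superclasses(statements, term):
    """Collect every class reachable from `term` through subclass statements."""
    found, todo = set(), [term]
    while todo:
        current = todo.pop()
        for kind, a, b in statements:
            if kind == "subclass" and a == current and b not in found:
                found.add(b)
                todo.append(b)
    return found


# Adding a new personal data category takes a single subclass statement.
kb.append(("subclass", "GPSTrace", "Location"))

gps_supers = superclasses(kb, "GPSTrace")
```

With that one added statement, gps_supers contains both Location and 
PersonalData, so the new category inherits its place in the hierarchy.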

- Profiles are well-defined and checked syntactically, so your statement 
"the moment we say follow OWL2 - then the adopter is free to use any and 
all of OWL2 semantics" is ungrounded.  Only the assertions supported by 
the profile will be accepted, the others shall be treated like syntax 
errors. [please note that this is a fully standard approach: the OWL 
APIs themselves support profile definition and parsing].
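
The idea of a syntactic profile check can be sketched in Python as follows. 
This is not the OWL API and not the actual OWL2-PL definition: the ALLOWED 
set is an assumption derived from the statement kinds listed earlier in this 
message, check_profile is a hypothetical helper, and a real profile checker 
inspects the structure of each axiom rather than a single keyword.

```python
# Assumed set of statement kinds permitted by the profile (illustrative only).
ALLOWED = {"subclass", "instance", "domain", "range", "disjoint"}


def check_profile(statements):
    """Split statements into those inside the profile and those outside it.

    Statements outside the profile are reported, much like syntax errors,
    instead of being silently interpreted.
    """
    accepted, rejected = [], []
    for statement in statements:
        kind = statement[0]
        (accepted if kind in ALLOWED else rejected).append(statement)
    return accepted, rejected


candidate = [
    ("subclass", "Location", "PersonalData"),
    ("unionOf", "A", "B", "C"),  # a boolean operator, outside the assumed profile
]

accepted, rejected = check_profile(candidate)
```

Here the subclass statement is accepted and the unionOf statement is 
rejected, so an adopter cannot use constructs outside the profile unnoticed.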

- In the light of the above point, TRAPEZE's guidelines are actually and 
effectively going to remove all complications.

I am so confident about the greater usability of TRAPEZE's framework 
that I'm going to run user studies to prove it scientifically.  So I am 
looking forward to a stable SKOS proposal in order to have something to 
compare TRAPEZE's approach with.

I *strongly* agree with you about the importance of adoption. We only 
disagree (partially) on what may foster or hinder adoption.  While we 
both agree that some kind of RDF links may help, we disagree on how this 
can be optimally implemented.

Best regards

Piero


On 30/01/22 21:01, Harshvardhan J. Pandit wrote:
> Hi Piero, All.
> I understand and agree with your (Piero's) points. However there are a 
> few things to clarify wrt what DPV should mean. My replies are inline. I 
> may have missed/mis-interpreted some things. If so, please let me know.
> 
> tldr; I think your argument (which I agree with) is based on an adopter 
> using DPV only as an OWL2 vocabulary. I think this presents a huge 
> barrier for having someone 'understand' and 'use' DPV as most real-world 
> uses are not semantic web aware. It also puts a lot of burden when it 
> comes to provide good documentation and examples, which is completely 
> lacking at the moment because most active participants are not 
> semantic-web people. Hence I'm trying to 'simplify' understanding of DPV 
> concepts and how to use them by proposing use of SKOS instead of OWL as 
> the default iteration. This is based on several meetings/calls with 
> people who were interested in DPV but had trouble 'following RDF & OWL2' 
> usage.
> 
> On 30/01/2022 17:47, Piero Bonatti wrote:
> 
>> On 28/01/22 10:37, Harshvardhan J. Pandit wrote:
>>> I agree with your argument regarding the class/instance modelling. 
>>> This is also one of the reasons why I suggested SKOS would be a 
>>> better fit for most use-cases, since it makes it possible to always 
>>> have further expansions. With OWL, there is no possibility to define 
>>> a purpose and to later expand it like you describe. Which puts the 
>>> onus on the modeller to be sure about their concepts or risk changing 
>>> models with time.
>>
>>
>> Such onus can be removed. Actually, in TRAPEZE, the guidelines for 
>> extending DPV (if necessary) are very simple: always add a new class, 
>> unless the new term being added represents a specific organization (eg 
>> one controller, one recipient), a specific location (i.e. a GPS 
>> coordinate), or other single data values (eg one specific email 
>> address, or one specific ID - out of the many a person may have).  In 
>> case of doubt, make the new term a class.
> 
>>
>> This rule of thumb is going to avoid most semantic issues and make the 
>> modeller's life very easy; moreover, it yields cleaner and more 
>> uniform extensions of the base vocabularies, therefore we believe that 
>> it is also going to improve interoperability.
>>
>> Thus, in my opinion, it would be advisable to give the same 
>> suggestions in the primer.
> 
> I agree with this suggestion for when DPV is used as an OWL2 vocabulary.
> 
> However, there is still the issue of property domain/range assertions. 
> Even with punning, we get weird semantics, such as:
> 
> dpv:hasPersonalData rdfs:range dpv:PersonalData .
> :PDH dpv:hasPersonalData dpv:Location .  # value is a class, used as an instance
> :MyGPS a dpv:Location .
> :PDH dpv:hasPersonalData :MyGPS .  # value is an instance
> 
> While this is perfectly fine with punning, it makes it necessary for 
> someone using DPV to understand the mechanics of OWL2 - which is a big 
> ask IMO!
> 
>>
>> Second, we should not forget that we are now considering SKOS and its 
>> complications not because it is "ontologically" important to mix 
>> classes and instances.  The natural semantics of DPV concepts is 
>> clearly that of a class. We are considering SKOS because some 
>> applications want to use RDF no matter what, and in this rather 
>> unexpressive language, property values can only be instances.  It is 
>> not about the meaning of terms, or knowledge representation, it is 
>> only about circumventing RDF's limitations.
> 
> Actually, (IMHO), we're considering SKOS because it's the closest 'simple 
> model' that someone who doesn't want RDF can still use and get something 
> inherently intuitive when using e.g. JSON-LD. If we follow the SKOS 
> patterns, they are simpler to grasp and easier to implement compared to 
> the complexities possible with OWL2. Either that, or the solution would 
> have to be another language created solely to express the required 
> interpretation which you've done in TRAPEZE.
> 
> I am hoping using the SKOS model permits DPV to be used much like what 
> schema.org has done for semantics i.e. encourage usage without making it 
> necessary to first read about RDF (or OWL), but still keeping such usage 
> (roughly) compatible. That's actually my personal summary of DPV: to be 
> the schema.org for data protection / privacy information.
> 
> If DPV was to be (only) used as a policy language or within semantic 
> reasoners, then I agree that the OWL2 semantics would have been much 
> better to enforce a strict(-er) interpretation. However, DPV has more 
> applications beyond semantic web, e.g. as a vocabulary that can be 
> used to annotate all sorts of things (text, policies, software); and as 
> a simple language for interoperable communications (e.g. consent 
> requests or ROPAs). And yes, this can still be done with OWL2, but this 
> creates a very steep adoption curve.
> 
>>
>> One could have solved this problem simply by duplicating DPV's 
>> properties, giving an "object level" version usable in OWL2 policies, 
>> and a corresponding "metalevel version"  usable in RDF policies, so as 
>> to avoid the paradoxes discussed time ago by giving different ranges 
>> to the two versions. With this approach, it would also be possible to 
>> define a clean and coherent formal semantics for all policies (OWL and 
>> RDF).
> 
> This is indeed the proposal, i.e. to have the SKOS and OWL be under 
> separate namespaces so one has to explicitly choose the OWL2 semantics 
> in their data.
> 
>>
>> SKOS avoids such duplication, but the price to be paid is that the 
>> semantic issues related to the confusion between classes and instances 
>> are still under the carpet.  Syntactically, instances can be refined 
>> using "narrower", "broader", and related match relations, but this is 
>> possible only because these relations have no formal meaning (they can 
>> be any relations).  The downside is that it is not clear what policies 
>> mean (a reliability and interoperability issue), and it is impossible 
>> to prove that the compliance checking algorithms return no false 
>> positives or negatives.
> 
> Yes, this is again by intention (mine). Not all possible uses of DPV may 
> need such strict 'policy' like interpretations. What DPV gains when 
> using SKOS is simplicity and interoperability (e.g. between concepts 
> across two data graphs). What it loses is semantics (class vs instance) 
> and easy access to reasoning.
> 
>>
>> In particular, concerning interoperability, we should not forget that 
>> all compliant OWL2 reasoners must treat a given OWL2 ontology in the 
>> same way, while each application may treat a SKOS ontology more or 
>> less as it pleases (because "narrow" etc. do not have a semantics).
> 
> Yes, this is precisely why SKOS is a better option than OWL for most 
> use-cases, unless one *knows they want OWL2*. I'm optimising for 
> maximising adoption of DPV rather than semantic web reasoning here ;-)
> 
>>
>> One should also consider the additional burden in mastering SKOS (due 
>> to its additional meta-concepts, and its many different but partially 
>> related relations...).  Compared with SKOS, the OWL2 profile adopted 
>> by TRAPEZE (OWL2-PL) is much simpler, with only 2 kinds of relations 
>> (SubclassOf and instanceOf) and no boolean operators nor quantifiers.
> 
> I disagree that OWL2 or TRAPEZE's profile is 'simpler' than SKOS. Both 
> (OWL2 ones) have a lot of complexity hidden away behind the possibility 
> to use all sorts of complex OWL2 stuff. Even if we have 'guidelines', 
> the moment we say follow OWL2 - then the adopter is free to use any and 
> all of OWL2 semantics. This makes ensuring interoperability or even a 
> simple guideline to provide a very complicated and difficult task. This 
> means we'll need to write a 'formal specification' for what DPV (in OWL) 
> should or should not contain, and keep it updated as concepts are added. 
> That's a LOT of work, almost a H2020 project :-D
> 
> By contrast, the SKOS model's semantics are so simple and abstract, that 
> they minimise the possibilities for someone using DPV in some weird and 
> non-compatible way. There are only two relations narrow/broad to 
> express, and no meta-modelling to worry about since everything is an 
> instance (a skos:Concept). This makes it trivial for someone to take a 
> DPV hierarchy and use it however they want - whether just as a list of 
> concepts, or plug it into their vocabulary, or even map it to OWL2 
> interpretations.
> 
> In many of the calls I've had in the past two years based on someone 
> reaching out because they saw DPV and thought it was interesting, I've 
> had trouble getting them (usually it's an industry person) to understand the 
> semantics of DPV. They understand the basics (classes and subclasses) 
> but get really confused when we get to instances and OWL2 logic. Then 
> there were 'complaints' that the OWL2 interpretation prevented tooling 
> from properly using DPV because it blew up when presented with punning. 
> And finally there were discussions on how to use DPV "just like JSON" 
> i.e. they didn't care about semantic web, but wanted DPV basics.
> 
> So the goal here is to satisfy such requirements and to get DPV to be 
> actually used in more places. It's easier to 'sell' a complex semantics 
> and reasoner tooling that does cool stuff like check compliance if 
> someone is 'already using the vocabulary'. But it's really difficult to 
> convince someone to use DPV if figuring out how to integrate it in their 
> stuff is a challenge.
> 
> All this being said, it is my wish that whatever DPV ends up being 
> should be backwards compatible with SPECIAL and TRAPEZE i.e. as OWL 
> vocabularies. Hence the parallel SKOS & OWL versions proposal. Hope this 
> makes it clearer on why I'm pushing for SKOS while advocating for OWL at 
> the same time.
> 
> Regards,
Received on Monday, 31 January 2022 10:41:40 UTC