Re: Feedback on DPV Primer's draft

Forgot to send the below email to the mailing list address.


-------- Forwarded Message --------
Subject: Re: Feedback on DPV Primer's draft
Date: Sun, 30 Jan 2022 20:01:00 +0000
From: Harshvardhan J. Pandit <me@harshp.com>
To: Piero Bonatti <pieroandrea.bonatti@unina.it>

Hi Piero, All.
I understand and agree with your (Piero's) points. However, there are a 
few things to clarify wrt what DPV should mean. My replies are inline. I 
may have missed/mis-interpreted some things. If so, please let me know.

tl;dr: I think your argument (which I agree with) is based on an adopter 
using DPV only as an OWL2 vocabulary. I think this presents a huge 
barrier to having someone 'understand' and 'use' DPV, as most real-world 
uses are not semantic web aware. It also puts a lot of burden when it 
comes to providing good documentation and examples, which are completely 
lacking at the moment because most active participants are not 
semantic-web people. Hence I'm trying to 'simplify' understanding of DPV 
concepts and how to use them by proposing use of SKOS instead of OWL as 
the default iteration. This is based on several meetings/calls with 
people who were interested in DPV but had trouble 'following RDF & OWL2' 
usage.

On 30/01/2022 17:47, Piero Bonatti wrote:

> On 28/01/22 10:37, Harshvardhan J. Pandit wrote:
>> I agree with your argument regarding the class/instance modelling. 
>> This is also one of the reasons why I suggested SKOS would be a better 
>> fit for most use-cases, since it makes it possible to always have 
>> further expansions. With OWL, there is no possibility to define a 
>> purpose and to later expand it like you describe. Which puts the onus 
>> on the modeller to be sure about their concepts or risk changing 
>> models with time.
> 
> 
> Such onus can be removed. Actually, in TRAPEZE, the guidelines for 
> extending DPV (if necessary) are very simple: always add a new class, 
> unless the new term being added represents a specific organization (eg 
> one controller, one recipient), a specific location (i.e. a GPS 
> coordinate), or other single data values (eg one specific email address, 
> or one specific ID - out of the many a person may have).  In case of 
> doubt, make the new term a class.
> 
> This rule of thumb is going to avoid most semantic issues and make the 
> modeller's life very easy; moreover, it yields cleaner and more uniform 
> extensions of the base vocabularies, therefore we believe that it is 
> also going to improve interoperability.
> 
> Thus, in my opinion, it would be advisable to give the same suggestions 
> in the primer.

I agree with this suggestion for when DPV is used as an OWL2 vocabulary.

However, there is still the issue of property domain/range assertions. 
Even with punning, we get weird semantics, such as:

dpv:hasPersonalData rdfs:range dpv:PersonalData .
:PDH dpv:hasPersonalData dpv:Location .  # class used as an instance (punning)
:MyGPS a dpv:Location .
:PDH dpv:hasPersonalData :MyGPS .  # instance

While this is perfectly fine with punning, it makes it necessary for 
someone using DPV to understand the mechanics of OWL2 - which is a big 
ask IMO!
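
To make the range issue concrete for readers who do not work with OWL2 
day to day, here is a minimal Python sketch (not part of DPV or of any 
RDF library) that applies the RDFS range rule to the four statements 
above. Triples are plain tuples of strings, the function name 
apply_range_rule is made up for this illustration, and :PDH and :MyGPS 
are the example names used above.

```python
# Triples as plain (subject, predicate, object) tuples of prefixed names.
# This is an illustrative sketch, not a real RDFS or OWL2 reasoner.
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

triples = {
    ("dpv:hasPersonalData", RDFS_RANGE, "dpv:PersonalData"),
    (":PDH", "dpv:hasPersonalData", "dpv:Location"),  # class used as a value
    (":MyGPS", RDF_TYPE, "dpv:Location"),
    (":PDH", "dpv:hasPersonalData", ":MyGPS"),        # ordinary instance
}


def apply_range_rule(graph):
    """Return the rdf:type triples entailed by the RDFS range rule.

    If (p rdfs:range C) and (s p o) are both in the graph,
    then (o rdf:type C) is entailed.
    """
    ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
    inferred = set()
    for prop, cls in ranges:
        for (s, p, o) in graph:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))
    return inferred


inferred = apply_range_rule(triples)
for triple in sorted(inferred):
    print(triple)
```

Under these assumptions both dpv:Location (a class) and :MyGPS (an 
instance) end up typed as dpv:PersonalData, which is the kind of result 
an adopter would need to understand punning to make sense of.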

> 
> Second, we should not forget that we are now considering SKOS and its 
> complications not because it is "ontologically" important to mix classes 
> and instances.  The natural semantics of DPV concepts is clearly that of 
> a class. We are considering SKOS because some applications want to use 
> RDF no matter what, and in this rather unexpressive language, property 
> values can only be instances.  It is not about the meaning of terms, or 
> knowledge representation, it is only about circumventing RDF's limitations.

Actually, (IMHO), we're considering SKOS because it's the closest 'simple 
model' that someone who doesn't want RDF can still use and get something 
inherently intuitive when using e.g. JSON-LD. If we follow the SKOS 
patterns, they are simpler to grasp and easier to implement compared to 
the complexities possible with OWL2. Either that, or the solution would 
have to be another language created solely to express the required 
interpretation which you've done in TRAPEZE.

I am hoping using the SKOS model permits DPV to be used much like what 
schema.org has done for semantics i.e. encourage usage without making it 
necessary to first read about RDF (or OWL), but still keeping such usage 
(roughly) compatible. That's actually my personal summary of DPV: to be 
the schema.org for data protection / privacy information.

If DPV was to be (only) used as a policy language or within semantic 
reasoners, then I agree that the OWL2 semantics would have been much 
better to enforce a strict(-er) interpretation. However, DPV has more 
applications beyond the semantic web, e.g. as a vocabulary that can be 
used to annotate all sorts of things (text, policies, software); and as 
a simple language for interoperable communications (e.g. consent 
requests or ROPAs). And yes, this can still be done with OWL2, but this 
creates a very steep adoption curve.

> 
> One could have solved this problem simply by duplicating DPV's 
> properties, giving an "object level" version usable in OWL2 policies, 
> and a corresponding "metalevel version"  usable in RDF policies, so as 
> to avoid the paradoxes discussed some time ago by giving different ranges to 
> the two versions. With this approach, it would also be possible to 
> define a clean and coherent formal semantics for all policies (OWL and 
> RDF).

This is indeed the proposal, i.e. to have the SKOS and OWL be under 
separate namespaces so one has to explicitly choose the OWL2 semantics 
in their data.

> 
> SKOS avoids such duplication, but the price to be paid is that the 
> semantic issues related to the confusion between classes and instances 
> are still under the carpet.  Syntactically, instances can be refined 
> using "narrower", "broader", and related match relations, but this is 
> possible only because these relations have no formal meaning (they can 
> be any relations).  The downside is that it is not clear what policies 
> mean (a reliability and interoperability issue), and it is impossible to 
> prove that the compliance checking algorithms return no false positives 
> or negatives.

Yes, this is again by intention (mine). Not all possible uses of DPV may 
need such strict 'policy' like interpretations. What DPV gains when 
using SKOS is simplicity and interoperability (e.g. between concepts 
across two data graphs). What it loses is semantics (class vs instance) 
and easy access to reasoning.

> 
> In particular, concerning interoperability, we should not forget that 
> all compliant OWL2 reasoners must treat a given OWL2 ontology in the 
> same way, while each application may treat a SKOS ontology more or less 
> as it pleases (because "narrow" etc. do not have a semantics).

Yes, this is precisely why SKOS is a better option than OWL for most 
use-cases, unless one *knows they want OWL2*. I'm optimising for 
maximising adoption of DPV rather than semantic web reasoning here ;-)

> 
> One should also consider the additional burden in mastering SKOS (due to 
> its additional meta-concepts, and its many different but partially 
> related relations...).  Compared with SKOS, the OWL2 profile adopted by 
> TRAPEZE (OWL2-PL) is much simpler, with only 2 kinds of relations 
> (SubclassOf and instanceOf) and no boolean operators nor quantifiers.

I disagree that OWL2 or TRAPEZE's profile is 'simpler' than SKOS. Both 
(OWL2 ones) have a lot of complexity hidden away behind the possibility 
to use all sorts of complex OWL2 stuff. Even if we have 'guidelines', 
the moment we say follow OWL2 - then the adopter is free to use any and 
all of OWL2 semantics. This makes ensuring interoperability, or even 
providing a simple guideline, a very complicated and difficult task. This 
means we'll need to write a 'formal specification' for what DPV (in OWL) 
should or should not contain, and keep it updated as concepts are added. 
That's a LOT of work, almost a H2020 project :-D

By contrast, the SKOS model's semantics are so simple and abstract, that 
they minimise the possibilities for someone using DPV in some weird and 
non-compatible way. There are only two relations to express 
(skos:narrower and skos:broader), and no meta-modelling to worry about 
since everything is an instance (a skos:Concept). This makes it trivial 
for someone to take a 
DPV hierarchy and use it however they want - whether just as a list of 
concepts, or plug it into their vocabulary, or even map it to OWL2 
interpretations.
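
As a rough illustration of those three uses (a list of concepts, 
walking the hierarchy, and mapping to an OWL2 reading), here is a small 
Python sketch. It is only a sketch under stated assumptions: 
:ApproximateLocation is a hypothetical adopter-defined concept, the 
function names are invented, treating skos:broader as transitive is a 
choice made here rather than something SKOS mandates, and turning 
skos:broader into rdfs:subClassOf is just one possible mapping.

```python
SKOS_BROADER = "skos:broader"

# A tiny SKOS-style hierarchy as (narrower, skos:broader, broader) tuples.
# ":ApproximateLocation" is a hypothetical concept added by an adopter.
hierarchy = [
    ("dpv:Location", SKOS_BROADER, "dpv:PersonalData"),
    (":ApproximateLocation", SKOS_BROADER, "dpv:Location"),
]


def concepts(graph):
    """Use the hierarchy as a plain, sorted list of concept names."""
    return sorted({s for s, _, _ in graph} | {o for _, _, o in graph})


def ancestors(graph, concept):
    """Collect every concept reachable by following skos:broader upwards."""
    found, todo = set(), [concept]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == SKOS_BROADER and o not in found:
                found.add(o)
                todo.append(o)
    return found


def to_owl(graph):
    """One possible OWL2 reading: skos:broader becomes rdfs:subClassOf."""
    return [(s, "rdfs:subClassOf", o) for s, p, o in graph if p == SKOS_BROADER]


print(concepts(hierarchy))
print(ancestors(hierarchy, ":ApproximateLocation"))
print(to_owl(hierarchy))
```

None of this needs an RDF library or a reasoner, which is the point: 
the adopter decides how much interpretation to layer on top.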

In many of the calls I've had in the past two years based on someone 
reaching out because they saw DPV and thought it was interesting, I've 
had trouble getting them (usually it's an industry person) to understand the 
semantics of DPV. They understand the basics (classes and subclasses) 
but get really confused when we get to instances and OWL2 logic. Then 
there were 'complaints' that the OWL2 interpretation prevented tooling 
from properly using DPV because it blew up when presented with punning. 
And finally there were discussions on how to use DPV "just like JSON" 
i.e. they didn't care about semantic web, but wanted DPV basics.
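
To give an idea of what "just like JSON" usage could look like, here is 
a minimal Python sketch that builds a JSON-LD style record and reads it 
back with ordinary dictionary access. The assumptions are mine: it uses 
https://w3id.org/dpv# as the DPV namespace, a hypothetical 
https://example.com/ namespace for the adopter's own identifiers, and 
an invented helper called personal_data.

```python
import json

# A JSON-LD style record; "ex:PDH" is a hypothetical adopter identifier.
record = {
    "@context": {
        "dpv": "https://w3id.org/dpv#",
        "ex": "https://example.com/",
    },
    "@id": "ex:PDH",
    "dpv:hasPersonalData": [{"@id": "dpv:Location"}],
}


def personal_data(doc):
    """Read the personal data categories with plain dict access."""
    return [item["@id"] for item in doc.get("dpv:hasPersonalData", [])]


serialised = json.dumps(record, indent=2)
print(serialised)
print(personal_data(json.loads(serialised)))
```

Someone consuming this only needs a JSON parser, while the @context 
keeps the record (roughly) compatible with RDF tooling for those who 
want it.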

So the goal here is to satisfy such requirements and to get DPV to be 
actually used in more places. It's easier to 'sell' complex semantics 
and reasoner tooling that does cool stuff like check compliance if 
someone is 'already using the vocabulary'. But it's really difficult to 
convince someone to use DPV if figuring out how to integrate it in their 
stuff is a challenge.

All this being said, it is my wish that whatever DPV ends up being 
should be backwards compatible with SPECIAL and TRAPEZE i.e. as OWL 
vocabularies. Hence the parallel SKOS & OWL versions proposal. I hope this 
makes it clearer why I'm pushing for SKOS while advocating for OWL at 
the same time.

Regards,
-- 
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/

Received on Sunday, 30 January 2022 20:02:19 UTC