Re: Mixing classes and instances from Harshvardhan J. Pandit on 2022-01-18 (public-dpvcg@w3.org from January 2022)

From: Harshvardhan J. Pandit <me@harshp.com>
Date: Tue, 18 Jan 2022 15:59:44 +0000
To: "Hoekstra, Rinke (ELS-AMS)" <r.hoekstra@elsevier.com>, "public-dpvcg@w3.org" <public-dpvcg@w3.org>
Message-ID: <97b76832-ea11-cbe7-d17e-8540d42b20a2@harshp.com>
Hi Rinke, All.
Thanks for bringing this up. It's timely and important to drive this 
discussion to a conclusion given that we're moving towards having 
examples in documentation. My replies are inline.

tl;dr: I agree that SKOS is a much better and more consistent model for 
expressing what DPV wants to provide. For OWL or RDFS, a separate file 
can provide alternate serialisations.

The answer to why things are like this is 'technical debt'. We started 
with ontologies developed as part of the SPECIAL project, which were in 
OWL2 and used classes to define policies. Moving forward, we're trying 
to broaden the applicability, which is IMHO what SKOS is best suited to do.

So to conclude, there's a proposal on the table to move to SKOS, I 
support/lead it, and we will also provide RDFS and OWL separately as 
alternatives to keep existing adopters/users happy.

Please share your thoughts, support, alternatives here on the mailing 
list or on the GitHub issue (https://github.com/w3c/dpv/issues/8).

On 18/01/2022 15:12, Hoekstra, Rinke (ELS-AMS) wrote:

> One of the things that puzzles me about the vocabulary is the choice for 
> using RDFS classes and RDF properties for representing the vocabulary, 
> and in particular on the different categories of personal data.
> 
> First of all, why choose rdfs:Class and rdf:Property vs owl:Class and 
> owl:ObjectProperty/owl:DatatypeProperty? The latter give you more 
> fine-grained control over what the intended/expected range for the 
> properties is. There is one good reason not to differentiate, and that 
> is that you don’t want to impose a specific way of modeling the data. 
> This comes at a cost of reusability.

Because we don't foresee such strict limitations on what the domain/range 
of those properties should be. They're free-form because what may be an 
object in someone's use-case could be a datatype/literal in someone 
else's. See example below.

> 
> Secondly, I do not understand the choice to model all of the categories 
> as classes. What are the intended instances of these classes?

This is tricky to answer and to explain. In the 'real world', there may 
never be instances. For example, a policy operating only on 'data 
categories' would have only 'classes'. Sure, we could argue such 
categories should be represented as instances, but in OWL instances are 
kind of final, in that you cannot further expand them within the same 
taxonomy (i.e. give them subclasses).

So we want a way to do all three of the following:

ex:A dpv:hasPersonalData dpv:EmailAddress .
ex:A dpv:hasPersonalData ex:MyEmailAddress .
ex:A dpv:hasPersonalData "myemail@example.com" .

So the range of this property becomes classes AND instances, which is 
weird under OWL unless you do convoluted expressions stating a union of 
subclasses and instances, which even then won't be complete.

The third example, with a literal, is the problematic one. If the range 
of the property must be an instance, then blank nodes will inevitably be 
created when mapping or aligning data, e.g. from a database to RDF. 
So we could "suggest" never using literals and packing them into 
arbitrary instances instead - which would make many people unhappy, 
because literals are how they specify their data.
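
A minimal sketch of what "packing a literal into an instance" could look 
like. The dpv: namespace IRI, the use of rdf:value to carry the literal, 
and the subject ex:B are assumptions made for illustration; DPV does not 
prescribe this pattern.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .        # assumed namespace IRI
@prefix ex:  <https://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Named wrapper: the literal hangs off an instance of the category.
ex:A dpv:hasPersonalData ex:MyEmailAddress .
ex:MyEmailAddress a dpv:EmailAddress ;
    rdf:value "myemail@example.com" .

# Same pattern when a mapping tool has no IRI to assign: a blank node.
# ex:B is a hypothetical second subject.
ex:B dpv:hasPersonalData [
    a dpv:EmailAddress ;
    rdf:value "myemail@example.com"
] .
```

Either form keeps the range of dpv:hasPersonalData free of literals, at 
the cost of an extra node for every value.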

Using SKOS, it gets a little easier, because the range is now always an 
instance of one class (skos:Concept). Even if it still can't specify 
literals, it can arbitrarily specify what would have been classes and 
instances in OWL. Example, with the RDFS/OWL statement first and its 
SKOS counterpart second:

ex:A rdfs:subClassOf ex:B .
ex:A skos:broader ex:B .

ex:M a ex:N .
ex:M skos:broader ex:N .
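
Applied to the earlier hasPersonalData examples, a sketch under the 
same assumptions (assumed dpv: namespace IRI; using skos:broader for the 
instance-like case is one possible choice, not a settled one) might read:

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .       # assumed namespace IRI
@prefix ex:   <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# What was a class in OWL becomes a concept.
dpv:EmailAddress a skos:Concept .

# What was an instance in OWL is also a concept, narrower than the first.
ex:MyEmailAddress a skos:Concept ;
    skos:broader dpv:EmailAddress .

# Both objects are now instances of skos:Concept, so the range is uniform.
ex:A dpv:hasPersonalData dpv:EmailAddress , ex:MyEmailAddress .
```

The literal case ("myemail@example.com") is still not covered by this.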

> 
> I can see a discussion related to this topic took place at the Nov 2020 
> meeting [2], but the outcome seemed to be more around removing 
> domain/range restrictions so that the solution around the issue above, 
> as proposed by Victor (:wave:) in e.g. [3] gets hidden under the carpet 
> (Victor suggested that the range of e.g. dpv:hasProcessing is a blank 
> node that is an instance of dpv:Collect). Yes, that’s ugly [4], and I 
> agree with Rob’s suggestion here to use SKOS or instances and enumerated 
> classes. I think Harsh also supports this in his emails [5].
> 
> The arguments against this appear to be around inferencing, but I don’t 
> see what inferencing task is served by modeling these categories as classes.

That was a band-aid solution so that the vocabulary can be used while we 
'discuss' a better way to go ahead (re. SKOS). I do support SKOS for 
precisely these reasons.

Though even using SKOS is not straightforward, so there has to be some 
discussion on the exact mechanics of what concepts to use from SKOS.

See https://github.com/w3c/dpv/issues/8 for using SKOS.

See https://harshp.com/dpv-x/primer/#classes-hierarchies-and-instances 
for text about semantics and extensibility DPV must provide.

Regards,
-- 
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/
Received on Tuesday, 18 January 2022 16:00:00 UTC