Mixing classes and instances

Hi All,

Coming a bit late to the party, but I’ve been reading the DPV documentation with interest to see whether we could adopt it for capturing policy information for our datasets.

One of the things that puzzles me about the vocabulary is the choice to use RDFS classes and RDF properties to represent it, and in particular to represent the different categories of personal data.

First of all, why choose rdfs:Class and rdf:Property over owl:Class and owl:ObjectProperty/owl:DatatypeProperty? The latter give you more fine-grained control over what the intended/expected range of each property is. There is one good reason not to differentiate, and that is that you don’t want to impose a specific way of modeling the data. But this comes at a cost to reusability.
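
To illustrate the difference (a minimal sketch; the ex: names are made up for this email and are not DPV terms):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# RDF style: nothing states whether values are resources or literals.
ex:hasCategory a rdf:Property .

# OWL style: the property type itself says what kind of value to expect.
ex:hasCategoryConcept a owl:ObjectProperty ;
            rdfs:range ex:Category .
ex:hasCategoryLabel a owl:DatatypeProperty ;
            rdfs:range xsd:string .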

Secondly, I do not understand the choice to model all of the categories as classes. What are the intended instances of these classes?

I can see that a discussion related to this topic took place at the Nov 2020 meeting [2], but the outcome seems to have been mostly about removing domain/range restrictions, so that the workaround for the issue above, as proposed by Victor (:wave:) in e.g. [3], gets swept under the carpet (Victor suggested that the object of e.g. dpv:hasProcessing be a blank node that is an instance of dpv:Collect). Yes, that’s ugly [4], and I agree with Rob’s suggestion here to use SKOS or instances and enumerated classes. I think Harsh also supports this in his emails [5].
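
For concreteness, my reading of Victor’s pattern is roughly the following (a sketch only; the dpv: namespace IRI is my assumption, and ex: is a made-up namespace):

@prefix dpv: <https://w3id.org/dpv#> .    # assumed namespace IRI
@prefix ex:  <http://example.org/> .      # hypothetical

# The object is an anonymous individual typed by the DPV class,
# rather than the class dpv:Collect itself.
ex:AcmeMarketing dpv:hasProcessing [ a dpv:Collect ] .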

The arguments against this appear to be around inferencing, but I don’t see what inferencing task is served by modeling these categories as classes.

For instance, if I look at the Primer [1] (don’t know how up-to-date this is), there is an example about AcmeMarketing:

ex:AcmeMarketing a dpv:PersonalDataHandling ;
            dpv:hasPersonalDataCategory dpv:EmailAddress ;
            dpv:hasProcessing dpv:Collect, dpv:Use ;
            dpv:hasPurpose dpv:Marketing ;
            dpv:hasDataController ex:Acme .

This is an individual ex:AcmeMarketing of type dpv:PersonalDataHandling, which has a dpv:hasPersonalDataCategory relation to the class dpv:EmailAddress. I interpret the intended meaning to be that ex:AcmeMarketing has as its data category the set of all email addresses (both known and unknown). IMO it doesn’t really reflect best practices in that it mixes class-level and instance-level statements, and won’t “work” in OWL 2 DL inference engines.

Of course, it could be that this example is not reflecting the latest thinking, but it does show what the design choice of using classes forces one to do: how else would one represent this when dpv:EmailAddress, dpv:Collect, dpv:Use and dpv:Marketing are classes? If these were skos:Concepts or any other kind of individual, the example *would* be correct. If they are classes… what are their instances?
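
To show what I mean, here is a sketch of the same example with the categories modeled as individuals. The ex: concepts below are hypothetical stand-ins that I made up; they are not how DPV currently defines these terms, and the dpv: namespace IRI is my assumption:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dpv:  <https://w3id.org/dpv#> .   # assumed namespace IRI
@prefix ex:   <http://example.org/> .     # hypothetical

# Hypothetical: categories as SKOS concepts (individuals), not classes.
ex:EmailAddress a skos:Concept .
ex:Collect      a skos:Concept .
ex:Use          a skos:Concept .
ex:Marketing    a skos:Concept .

# Every object below is now an individual, so no class is used as a value.
ex:AcmeMarketing a dpv:PersonalDataHandling ;
            dpv:hasPersonalDataCategory ex:EmailAddress ;
            dpv:hasProcessing ex:Collect, ex:Use ;
            dpv:hasPurpose ex:Marketing ;
            dpv:hasDataController ex:Acme .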

Also, this design choice makes it non-intuitive (at least for me) to relate ex:AcmeMarketing to individual datasets. How does this work? How do I know whether my dataset, which contains email addresses, is suitable for use by ex:AcmeMarketing?

The documentation suggests that I should directly relate the individual that represents my dataset to dpv:EmailAddress using the dpv:hasPersonalDataCategory property. Again, I’ll be mixing individuals and classes, and again, if dpv:EmailAddress were a skos:Concept or individual, it would all be “correct” from a DL perspective.
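
As I read it, that would amount to something like the following (ex:MyDataset is a made-up name, and the dpv: namespace IRI is my assumption):

@prefix dpv: <https://w3id.org/dpv#> .    # assumed namespace IRI
@prefix ex:  <http://example.org/> .      # hypothetical

# The object is again the class dpv:EmailAddress, not an individual.
ex:MyDataset dpv:hasPersonalDataCategory dpv:EmailAddress .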

Could someone enlighten me?

Thanks!

-Rinke


[1] https://docs.google.com/document/d/1FrPFTRUAreEipvM5hTPPqfnhOl0rBjXpXCMicw9me30/edit#
[2] https://www.w3.org/community/dpvcg/wiki/Workshop20201104
[3] https://lists.w3.org/Archives/Public/public-dpvcg/2020Apr/0005.html
[4] https://lists.w3.org/Archives/Public/public-dpvcg/2020Apr/0010.html
[5] https://lists.w3.org/Archives/Public/public-dpvcg/2020Sep/0009.html and https://lists.w3.org/Archives/Public/public-dpvcg/2020Sep/0001.html

--
Rinke Hoekstra
Lead Architect – Knowledge
ELSEVIER - Amsterdam
r.hoekstra@elsevier.com

Received on Tuesday, 18 January 2022 15:12:33 UTC