Re: DPCat: Data Processing Catalogue using DPV and DCAT(-AP) from Bert Van Nuffelen on 2022-04-19 (public-dpvcg@w3.org from April 2022)

From: Bert Van Nuffelen <Bert.Van.Nuffelen@tenforce.com>
Date: Tue, 19 Apr 2022 21:02:44 +0000
To: "Harshvardhan J. Pandit" <harshvardhan.pandit@adaptcentre.ie>
CC: "public-dpvcg@w3.org" <public-dpvcg@w3.org>, Rob Brennan <rob.brennan@dcu.ie>
Message-ID: <VI1P191MB0783621C2938EFCCC6255499D6F29@VI1P191MB0783.EURP191.PROD.OUTLOOK.COM>
Hi Harsh,

Actually, your explanation is the reason why I am asking these questions.
Just because a profile is built on top of DCAT does not mean that one can or should mix two DCAT catalogues based on different profiles.

> Hi. I don't understand why your analysis states DPCat ROPARecord should
> not be a DCAT-AP dataset - there is no reason why we cannot extend DCAT
> or DCAT-AP to represent catalogues of a specific kind - in this case
> related to ROPA.
Extending DCAT is different from extending DCAT-AP. A DCAT-AP dataset is a collection of data to be shared with the public through data portals, with the objective of building smart cities, a smart government, etc. Maybe this is not so clearly expressed, but it is the underlying premise.

For that reason I would not consider a single photo, a single artwork, or a unique person to be a DCAT-AP dataset.
That level of granularity is not the objective of DCAT-AP, even though you could model it as a DCAT(-AP) dataset.
The collection of photos of the Big Ben throughout the years, however, I would consider a DCAT-AP dataset.


> This follows from existing extensions, such as GeoDCAT-AP, which aims to
> be compatible with DCAT(-AP) whilst adding additional metadata specific
> to their domain (in this case for geospatial). Similarly, DPCat aims to
> be compatible with DCAT(-AP) while providing additional metadata about
> the GDPR/ROPA domain.

GeoDCAT-AP is explicitly a specialisation of DCAT-AP. So removing all GeoDCAT-AP specific properties will result in a DCAT-AP catalogue.
A GeoDCAT-AP dataset is a collection of geospatial data; interpreted as DCAT-AP, it is still a DCAT-AP dataset.
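
To make the specialisation argument concrete, here is a minimal sketch in plain Python, with triples as simple tuples. The example dataset, the example.org identifiers, and the helper name strip_extension are made up for illustration, and the GeoDCAT-AP namespace and property shown are assumptions rather than something taken from this thread. It only shows the idea: drop the extension-specific statements and what remains uses the base vocabulary alone.

```python
# Triples are modelled as (subject, predicate, object) tuples of strings.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEODCAT = "http://data.europa.eu/930/"  # assumed GeoDCAT-AP namespace
EX = "http://example.org/"              # made-up identifiers


def strip_extension(triples, extension_ns):
    """Drop every triple whose predicate belongs to the extension namespace."""
    return {t for t in triples if not t[1].startswith(extension_ns)}


# A hypothetical dataset description using one extension-specific property.
geo_catalogue = {
    (EX + "ds1", RDF_TYPE, DCAT + "Dataset"),
    (EX + "ds1", DCT + "title", "River basins"),
    (EX + "ds1", DCT + "description", "Boundaries of river basins"),
    (EX + "ds1", GEODCAT + "custodian", EX + "agency"),
}

# What is left after stripping uses only DCAT / Dublin Core terms.
plain = strip_extension(geo_catalogue, GEODCAT)
```

If the same stripping applied to a ROPA catalogue does not leave something that is meaningful as a DCAT-AP catalogue, then ROPA is not a specialisation in this sense.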

Note the impact of being a profile of DCAT-AP.
Suppose DCAT-AP makes dcat:theme mandatory, with the (already) mandatory codelist consisting of the official data portal themes.
If ROPA is a profile (a specialisation) of DCAT-AP, then it must follow this rule.
I do not think that this is within the scope of ROPA.

That does not change the ability of ROPA to align as much as possible with DCAT-AP decisions. But that is different from being a profile of DCAT-AP.

> As for tools, as per DCAT-AP specification, anything that conforms with
> its requirements (e.g. mandatory fields), can be used by tools expecting
> DCAT-AP metadata. Therefore, catalogue tools that work with DCAT-AP
> should work with DPCat catalogs, and same for DCAT and its tools. The
> GDPR and ROPA usefulness of DCAT(-AP) aware tools is a separate matter, and
> there is no necessity to have a single tool do both. In fact, this is
> the basis for why we use DCAT(-AP) - existing catalog services should
> work with DPCat when treating it just like any other catalog metadata.
> The additional GDPR/ROPA stuff for these datasets will be additional
> extensions or separate tools.
Actually, this is not entirely true. Just because you can aggregate two RDF files does not mean that this is a valid operation for your profile.
Technically, in RDF, it works, but business-wise the resulting aggregation might be useless.
Consider aggregating the catalogue of the artworks of the British Museum and the catalogue of Eurostat statistics.
RDF-wise you can aggregate these if they are expressed as DCAT catalogues, but none of the queries on the individual catalogues makes sense on the aggregation.

So aggregating DCAT-AP catalogues should result in sensible DCAT-AP catalogues; likewise, aggregating ROPA catalogues should result in ROPA catalogues.
So if you consider ROPA a profile of DCAT-AP, then aggregating with data.europa.eu should result in a valid aggregation for both ROPA and DCAT-AP.
And I am not sure this is the case.
For me, this aggregation use case is a critical test for a profile. If the answer is that one should not combine both, then one should not be a profile of the other. Then they are separate DCAT profiles. I am fine with that, because I can accept that DCAT is very abstract and domain neutral (even though most of its use cases result from the DCAT-AP scope).
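
The aggregation test can be sketched as follows, in plain Python. Everything here is an assumption made for illustration: the mandatory property sets are invented (including the supposition that dcat:theme is mandatory), the DPCAT namespace is a placeholder, and the two example records are made up. The point is only that each catalogue is valid under its own profile, while the merged catalogue is valid under neither.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
DPCAT = "http://example.org/dpcat#"  # placeholder namespace, not the real one

# Hypothetical mandatory properties per profile (illustrative only).
DCAT_AP_MANDATORY = {DCT + "title", DCT + "description", DCAT + "theme"}
ROPA_MANDATORY = {DCT + "title", DPCAT + "hasPurpose"}


def violations(catalogue, mandatory):
    """Return {dataset_id: missing mandatory properties} for a catalogue."""
    report = {}
    for dataset_id, props in catalogue.items():
        missing = mandatory - set(props)
        if missing:
            report[dataset_id] = missing
    return report


# Two made-up catalogues, each a mapping dataset_id -> {property: value}.
open_data = {
    "eurostat-gdp": {
        DCT + "title": "GDP",
        DCT + "description": "Quarterly GDP figures",
        DCAT + "theme": "ECON",
    },
}
ropa = {
    "ropa-hr-payroll": {
        DCT + "title": "Payroll processing",
        DPCAT + "hasPurpose": "Salary payment",
    },
}

# RDF-wise the merge is trivial; the question is whether it is still valid.
aggregate = {**open_data, **ropa}
```

Here violations(aggregate, DCAT_AP_MANDATORY) reports the ROPA record and violations(aggregate, ROPA_MANDATORY) reports the open data record, which is the situation I would expect when the two are separate DCAT profiles.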

(*) Honestly, I would not advise providing GeoNetwork as-is as an editorial tool for ROPA catalogues. It would be a total mismatch of terminology.

> AFAIK, the definition of a ROPARecord does match the definition of DCAT
> dataset (note that DCAT-AP does not properly redefine dcat:Dataset). I
> quote the definition from DCAT and DCAT-AP below, which to me is
> compatible with the view of ROPA information being a dataset.
That is because my remark is not at the level of the term but at the level of the scope of the profile. This scope has not trickled down into the definition of the class Dataset.

> DCAT (v2): "A collection of data, published or curated by a single
> source, and available for access or download in one or more
> representations."
>
> DCAT-AP (v2.1.0): "Mandatory class. A conceptual entity that represents
> the information published."


> Finally, perhaps what the main issue of contention might be, for ROPA
> (the GDPR concept) - it's what is inside that ROPA (i.e. its records or
> entries) that is needed to be queried. Just the fact that a ROPA exists
> with a timestamp is not of much use or value. One can view the ROPA as
> being a catalog record of processing activities - which is what DPCat
> tries to represent.
>
> So the discussion can be narrowed down to whether dpcat:ROPARecord
> should be a dcat-ap:Dataset or is it itself the contents of a dataset -
> on which DPCat states that ROPA entries are 'metadata' to processing
> records as that's what is of use to stakeholders who will be using this
> information.
Indeed, that is the reason why I mentioned PROV-O.
I could reword my use case as follows: if I have to find a dataset describing the records of processing activities of European institutions in data.europa.eu, what would the metadata then be?
E.g. it could be that the publisher must be an EU institution, and that dct:conformsTo must be "Records Register | European Data Protection Supervisor (europa.eu)" <https://edps.europa.eu/about/data-protection-within-edps/records-register_en>.
Then the distribution (or the associated data service) contains all the records.
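
As a rough sketch of that use case in plain Python: the publisher identifiers, the set of EU institutions, and the two portal entries below are all hypothetical, and the selection rule is just the example given above, not an agreed profile.

```python
DCT = "http://purl.org/dc/terms/"
EDPS_REGISTER = (
    "https://edps.europa.eu/about/data-protection-within-edps/"
    "records-register_en"
)

# Hypothetical publisher identifiers treated as EU institutions.
EU_INSTITUTIONS = {
    "http://example.org/org/edps",
    "http://example.org/org/commission",
}


def is_ropa_dataset(metadata):
    """Apply the example rule: EU publisher plus conformance to the register."""
    return (
        metadata.get(DCT + "publisher") in EU_INSTITUTIONS
        and metadata.get(DCT + "conformsTo") == EDPS_REGISTER
    )


# Made-up dataset-level metadata as a portal might hold it.
portal = [
    {
        "id": "edps-records",
        DCT + "publisher": "http://example.org/org/edps",
        DCT + "conformsTo": EDPS_REGISTER,
    },
    {
        "id": "eurostat-gdp",
        DCT + "publisher": "http://example.org/org/commission",
    },
]

ropa_datasets = [d["id"] for d in portal if is_ropa_dataset(d)]
```

Note that the check only looks at dataset-level metadata; the individual processing records stay inside the distribution.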

So a DCAT-AP profile for ROPA that makes it easy to find all ROPA datasets in data.europa.eu would be a profile like GeoDCAT-AP. It is probably a small extension, and very useful. If this is combined with an application profile describing the content of the ROPA dataset (an RDF distribution), then it is even more attractive.
Maybe your intent with DPCat is more at the level of describing the metadata of a single photo, to use my metaphor. But DCAT-AP is more about the metadata of the collection of photos.

kr,

Bert


On 19/04/2022 18:31, Bert Van Nuffelen wrote:
> This is what I suspected. Now I come to the alignment statement with
> DCAT-AP. If you state you want to align with DCAT-AP, and the scope of
> DCAT-AP is describing the datasets as I illustrated, then one cannot
> state that DPCat ROPARecord is a DCAT-AP dataset.
> Then it is an independent DCAT profile which may take inspiration from
> DCAT-AP, but whose content and scope are different.
> Even more, the tools and agreements suited for DCAT-AP (open data
> portals) very likely do not fit the ROPA implementations.
>
> That is what I meant by saying the definition of a ROPARecord should
> match the definition of a DCAT-AP dataset.
> You confirm now that these are two distinct things and should not be mixed.
>
> But there can be a connection, e.g.

--
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/
Received on Tuesday, 19 April 2022 21:03:00 UTC