Re: Monolithic vs Modular Taxonomy/Ontology

Hello. We have a pending issue that is confusing adopters: terms 
defined as PersonalDataCategory (e.g. Location or Country) are used in 
ways that are not compatible with stating they are personal data.

For example, specifying `StorageData hasLocation Location` makes the 
location personal data instead of a generic location. This can be 
fixed by using StorageLocation, but the issue stands for several other 
concepts. We've renamed these as and when they arose, but as more 
concepts are added, the clashing and confusion are increasing as well.
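
To make the clash concrete, here is a minimal sketch in Python. It is a 
toy model only: the hierarchy dictionary and the is_personal_data helper 
are hypothetical names invented for this illustration, not part of DPV 
or of any RDF library.

```python
# Toy class hierarchy (child -> parent), assumed for illustration only.
# StorageLocation is deliberately left without a personal-data ancestor.
SUBCLASS_OF = {
    "Location": "PersonalDataCategory",
    "Country": "PersonalDataCategory",
}


def is_personal_data(term: str) -> bool:
    """Walk up the toy hierarchy and report whether `term` has
    PersonalDataCategory as an ancestor."""
    seen = set()
    while term in SUBCLASS_OF and term not in seen:
        seen.add(term)
        term = SUBCLASS_OF[term]
        if term == "PersonalDataCategory":
            return True
    return False


# The statement from the example above, as a (subject, predicate, object)
# triple, next to the StorageLocation workaround.
clashing = ("StorageData", "hasLocation", "Location")
fixed = ("StorageData", "hasLocation", "StorageLocation")

for triple in (clashing, fixed):
    print(triple, "-> object is personal data:", is_personal_data(triple[2]))
```

In this sketch the object of the first triple is reported as personal 
data and the object of the second is not, which is the unintended 
reading adopters run into.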

This is an important issue for the application of DPV, and a common 
cause of confusion and mistakes. I'd appreciate any thoughts or 
suggestions you may have, but we need to make a decision by December if 
DPV is to progress towards showing applications.

In recent meeting calls [1] we decided to go with option #3 to resolve 
personal data categories clashing with other concepts (see July email 
[2]). This solution consists of prefixing/suffixing the category IRIs 
with something akin to "pd_" so that "Location" becomes "pd_Location", 
which differentiates it from storage / company location.

The discussion consisted of going over the alternatives, and choosing 
this option since the members present preferred having the data 
categories within DPV rather than as a separate taxonomy (i.e. providing 
a single monolithic vocabulary). The argument for this was that DPV can 
provide commonly used/needed categories, and thus the 'single 
vocabulary' packaging is attractive for adopters.

Some stylistic options discussed:
1) pd_Location
2) Location_pd
3) LocationPD
4) PDLocation
5) PD_Location
6) LocationData (from GitHub Issue #27 [3])
7) LocationPersonalData
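
As a quick way to compare the seven styles above on other terms, here is 
a small Python sketch. The function name pd_name_variants is 
hypothetical and exists only for this illustration; it simply applies 
each listed pattern to a given term.

```python
from typing import Dict


def pd_name_variants(term: str) -> Dict[int, str]:
    """Return the seven candidate names for a personal data category,
    keyed by the option number used in the list above."""
    return {
        1: f"pd_{term}",
        2: f"{term}_pd",
        3: f"{term}PD",
        4: f"PD{term}",
        5: f"PD_{term}",
        6: f"{term}Data",
        7: f"{term}PersonalData",
    }


# Print the variants for a term other than Location, e.g. Certification.
for number, name in pd_name_variants("Certification").items():
    print(f"{number}) {name}")
```

Running it on terms such as Certification or Country may help judge how 
each style reads beyond the Location example.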

Personally, I still prefer providing personal data categories as a 
separate taxonomy that can grow on its own, but if the group consensus 
is to choose from this list, I'd pick #1 and #4 (in that order).

Again, suggestions for how to resolve this are needed, welcome, and 
appreciated.

[1] https://www.w3.org/2021/10/13-dpvcg-minutes.html
[2] https://lists.w3.org/Archives/Public/public-dpvcg/2021Jul/0006.html
[3] https://github.com/w3c/dpv/issues/27

Regards,
Harsh


On 29/07/2021 11:18, Harshvardhan J. Pandit wrote:
> Hello.
> As DPV continues to grow, we're reaching a stage where there is a 
> noticeable impact in terms of personal data categories and other 
> concepts. The approach to rename personal data category or other 
> non-data concepts can only work so many times, and will eventually cause 
> confusion in adopters. What are potential solutions for this?
> 
> For example, (i) Certification as Personal Data and (ii) Certification 
> as Organisational Measure. We resolved this by renaming (i) to 
> ProfessionalCertification. This measure cannot always be used.
> 
> Another example, (i) Location as personal data category, and (ii) 
> Location for indicating personal data storage. We resolved this by 
> avoiding (i): not providing the hasStorage property with any range, and 
> defining StorageLocation.
> 
> Problem if we define both concepts using same label or IRI: Any time 
> someone wants to specify a Certification or Location for data storage or 
> transfer, it is defined as personal data as well, or the label causes 
> confusion and they use the wrong concept. Not a good design IMHO.
> 
> Solutions:
> 1) Keep only the 'top-tier' personal data taxonomy in DPV and move 
> others outside into a dpv-personal-data extension. This is my preferred 
> approach because it keeps concepts in other modules (E.g. technical 
> measures) with the commonly used words without overlap with personal 
> data. AFAIK the issue only exists with overlap between personal data 
> categories and other concepts.
> 
> 2) Keep only the 'top-tier' concepts for all modules and move other 
> concepts outside into specific taxonomies. Not my preferred option 
> because it means adopters need to import a lot of vocabularies to get 
> commonly used concepts e.g. technical measures.
> 
> 3) Keep concepts as they are, with same label for multiple concepts in 
> different modules, but different IRI. E.g. pd_Location for personal data 
> categories and Location for the generic concept.


-- 
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/

Received on Tuesday, 2 November 2021 19:16:49 UTC