Re: Monolithic vs Modular Taxonomy/Ontology

Hi. Updates on this.
We're leaning towards the 'modular' approach, i.e. keeping only the 
top-level terms in DPV (the main vocabulary), such as PersonalDataCategory, 
special data, and so on, and moving the rest of the hierarchy to a 
personal-data module/extension, e.g. as a separate namespace.

We'll be making a decision on this on DEC-15. Please state your 
support/objections/questions before then.

The what/why/where is available in the text/links below from previous 
emails in this thread.

tl;dr: DPV is growing large, and people are making mistakes by using 
personal data concepts in other areas, e.g. Location (personal data) for 
stating data storage location or company location. Renaming cannot work 
for all cases and eventually leads to more confusion. Modularity is 
accepted as good practice (e.g. see FIBO). There is no major (semantic) 
confusion between the other concepts remaining in DPV.
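
To make the clash concrete, here is a rough Python sketch of how a 
separate namespace keeps the personal data Location apart from a storage 
location. It is only an illustration, not DPV tooling: the namespace 
IRIs, the tiny hierarchy, and the is_personal_data helper are all 
placeholders made up for this example.

```python
# Placeholder namespace IRIs, assumed for illustration only.
DPV = "https://example.org/dpv#"
DPV_PD = "https://example.org/dpv-pd#"

# Hypothetical subclass hierarchy: child IRI -> parent IRI.
SUBCLASS_OF = {
    # Personal data categories live in the extension namespace.
    DPV_PD + "Location": DPV + "PersonalDataCategory",
    DPV_PD + "Country": DPV_PD + "Location",
    # The storage concept stays in the main vocabulary, under a generic Location.
    DPV + "StorageLocation": DPV + "Location",
}


def is_personal_data(iri):
    """Walk up the hierarchy; True if iri sits under PersonalDataCategory."""
    top = DPV + "PersonalDataCategory"
    while iri is not None:
        if iri == top:
            return True
        iri = SUBCLASS_OF.get(iri)  # None once we run out of parents
    return False


pd_location = DPV_PD + "Location"
storage_location = DPV + "StorageLocation"

print(is_personal_data(pd_location))       # True
print(is_personal_data(storage_location))  # False
```

Both concepts can keep the local name "Location" because the namespace, 
not a renamed label, is what tells them apart.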

Today's minutes

This thread: and


On 03/11/2021 06:56, Piero Bonatti wrote:
> Hi everyone,
> as a theoretician, I would prefer the "modular" approach, that is, 
> different namespaces for each taxonomy (which is cleaner).
> However, if a single taxonomy is preferred, then Harsh's proposal sounds 
> good, as it resembles the solution with different namespaces/prefixes. 
> Option 1) is my favorite.
> Best regards
> Piero
> On 02/11/21 20:16, Harshvardhan J. Pandit wrote:
>> Hello. We have a pending issue which is causing confusion to adopters 
>> where terms defined as PersonalDataCategory (e.g. Location or Country) 
>> are used in ways that are not compatible with stating they are 
>> personal data.
>> For example, specifying `StorageData hasLocation Location` makes 
>> the location personal data instead of a generic location. This can be 
>> fixed by using StorageLocation, but the issue stands for several other 
>> concepts. We've renamed them as and when they arose, but as more 
>> concepts are added, this clashing and confusion is increasing as well.
>> This is an important issue for the application of DPV, and a cause of 
>> common confusion and mistakes. I appreciate any thoughts/suggestions 
>> you may have, but we need to make a decision by December if DPV is 
>> to progress towards showing applications.
>> In recent meeting calls [1] we decided to go with option #3 to resolve 
>> personal data categories clashing with other concepts (see July email 
>> [2]). This solution consists of prefixing/suffixing the category IRIs 
>> with something akin to "pd_" so that "Location" becomes "pd_Location", 
>> which differentiates it from storage / company location.
>> The discussion consisted of going over the alternatives, and choosing 
>> this option since present members preferred having the data categories 
>> within DPV rather than as a separate taxonomy (preferring single 
>> monolithic vocabulary provision). The argument for this was that DPV 
>> can provide commonly used/needed categories, and thus the 'single 
>> vocabulary' packaging is attractive for adopters.
>> Some stylistic options discussed:
>> 1) pd_Location
>> 2) Location_pd
>> 3) LocationPD
>> 4) PDLocation
>> 5) PD_Location
>> 6) LocationData (from GitHub Issue #27 [3])
>> 7) LocationPersonalData
>> Personally, I still prefer providing personal data categories as a 
>> separate taxonomy that can grow on its own, but if the group consensus 
>> is to choose from this list, I'd pick #1 and #4 (in that order).
>> Again, suggestions for how to resolve this are needed, welcome, 
>> and appreciated.
>> [1]
>> [2]
>> [3]
>> Regards,
>> Harsh
>> On 29/07/2021 11:18, Harshvardhan J. Pandit wrote:
>>> Hello.
>>> As DPV continues to grow, we're reaching a stage where there is a 
>>> noticeable impact in terms of personal data categories and other 
>>> concepts. The approach of renaming personal data categories or other 
>>> non-data concepts can only work so many times, and will eventually 
>>> cause confusion for adopters. What are potential solutions for this?
>>> For example, (i) Certification as Personal Data and (ii) 
>>> Certification as Organisational Measure. We resolved this by renaming 
>>> (i) to ProfessionalCertification. This measure cannot always be used.
>>> Another example: (i) Location as personal data category, and (ii) 
>>> Location for indicating personal data storage. We resolved this by 
>>> not giving the hasStorage property any range, which avoids (i), and 
>>> by defining StorageLocation.
>>> Problem if we define both concepts using same label or IRI: Any time 
>>> someone wants to specify a Certification or Location for data storage 
>>> or transfer, it is defined as personal data as well, or the label 
>>> causes confusion and they use the wrong concept. Not a good design IMHO.
>>> Solutions:
>>> 1) Keep only the 'top-tier' personal data taxonomy in DPV and move 
>>> others outside into a dpv-personal-data extension. This is my 
>>> preferred approach because it keeps concepts in other modules (E.g. 
>>> technical measures) with the commonly used words without overlap with 
>>> personal data. AFAIK the issue only exists with overlap between 
>>> personal data categories and other concepts.
>>> 2) Keep only the 'top-tier' concepts for all modules and move other 
>>> concepts outside into specific taxonomies. Not my preferred option 
>>> because it means adopters need to import a lot of vocabularies to get 
>>> commonly used concepts e.g. technical measures.
>>> 3) Keep concepts as they are, with same label for multiple concepts 
>>> in different modules, but different IRI. E.g. pd_Location for 
>>> personal data categories and Location for the generic concept.

Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin

Received on Wednesday, 8 December 2021 16:47:25 UTC