RE: DCAT and ADMS from Makx Dekkers on 2013-04-05 (public-gld-wg@w3.org from April 2013)

From: Makx Dekkers <makx@makxdekkers.com>
Date: Fri, 5 Apr 2013 10:44:52 +0200
To: "'Richard Cyganiak'" <richard@cyganiak.de>
Cc: "'Public GLD WG'" <public-gld-wg@w3.org>
Message-ID: <002e01ce31d9$d81d5bb0$88581310$@makxdekkers.com>
If that is the case, there is no problem, and the ADMS asset class can be a subclass of the DCAT dataset class as far as I am concerned.

 

Thanks!

 

Makx.

Makx Dekkers

makx@makxdekkers.com

+34 639 26 11 46

From: Richard Cyganiak [mailto:richard@cyganiak.de] 
Sent: Friday, April 05, 2013 10:42 AM
To: Makx Dekkers
Cc: Public GLD WG
Subject: Re: DCAT and ADMS

 

Makx,

 

Datasets don't have to be in a catalog, and datasets don't have to have a distribution.

 

The DCAT model *is* what's in the spec. There is no other model “behind DCAT”.

 

Richard

 


On 4 Apr 2013, at 20:28, "Makx Dekkers" <makx@makxdekkers.com> wrote:

In today’s conference call I was tasked with outlining my observations on the declaration of adms:SemanticAsset as a subclass of dcat:Dataset.

 

After reading the DCAT specification, I started to build a picture for myself of the model behind DCAT: not the model as expressed in the UML diagram in the spec, but what I think is the “real-world” model that sits behind it.

 

As DCAT is the “Data Catalog Vocabulary”, it occurred to me that the main perspective is on the concept of a catalog, which I understand as a sort of virtual filestore. In the filestore there are physical files, which are the Distributions.

 

DCAT describes the virtual filestore itself with the properties of the Catalog. The description of the files in the filestore is done in two layers:

 

· The first, conceptual, layer is the Dataset. The description of the Dataset contains a number of characteristics that are independent of the physical embodiment (in the sense of file and data formats). This is particularly useful when a Dataset is implemented in more than one format (e.g. CSV and RDF), as this abstraction allows you to express those conceptual characteristics only once for several implementations.

· The second layer is the Distribution. The description of the Distribution contains the characteristics of the actual file that implements the dataset.
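
The two layers described above can be sketched in code. This is only an illustration of the mental model in this message, not anything defined by the DCAT spec: the class names, field names, and example.org URLs are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """Second layer: one concrete file that implements a dataset."""
    media_type: str    # e.g. "text/csv"
    download_url: str  # hypothetical location of the file


@dataclass
class Dataset:
    """First layer: characteristics independent of any file format."""
    title: str
    description: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    """The 'virtual filestore' that lists datasets."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)


# One conceptual description, shared by two physical files.
budget = Dataset(
    title="Budget 2013",
    description="Described once, whatever the file format.",
    distributions=[
        Distribution("text/csv", "http://example.org/budget.csv"),
        Distribution("text/turtle", "http://example.org/budget.ttl"),
    ],
)
catalog = Catalog(title="Example catalog", datasets=[budget])
```

The title and description live on the Dataset only, so adding a third format would mean adding one more Distribution and nothing else.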

 

In building this mental model, I asked myself whether the model of DCAT requires that:

 

1. Every Dataset must necessarily exist in a Catalog; in other words, things that are not in a Catalog cannot be a dcat:Dataset.

2. Every Dataset must necessarily have at least one Distribution; in other words, if there is no physical file, there cannot be a dcat:Dataset.

 

Mind you, this whole mental model is not explicit in DCAT, but I took it to be the underlying model from reading between the lines and listening to people in the working group.

 

For example:

 

· Dan’s definition “Any file stored on disk is a data set”;

· Phil’s comment that “if something is not available for download then it's not in the catalogue and therefore out of scope for DCAT”;

· Richard’s response that a dataset does not have to have a distribution on the Web, but implying (my interpretation) that it does need to have a distribution that can be accessed by some other mechanism.

 

The discussion I had with Phil was related to the mental model that underlies ADMS. ADMS is called the “Asset Description Metadata Schema” which indicates that the central entity in that model is the Asset. The Asset can be part of a Repository (similar to the Catalog), but it does not have to be – lone Assets are conceivable. Furthermore, Assets can be described before there are any physical manifestations – for example, if you have a project that has the objective to create an Asset, you can describe the Asset before you have even started the work; this is similar to the situation where authors sell the rights to books they haven’t written yet.

 

Based on those differences, I then started to wonder whether there could be a problem making the class of ADMS Assets a subclass of the DCAT Dataset class.
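
To make the concern concrete, here is a small sketch of why the subclass declaration would matter under the strict reading. It is an illustration only: the classes, the `status` field, and the `satisfies_strict_reading` rule are hypothetical stand-ins, not definitions taken from DCAT or ADMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    title: str
    distributions: List[str] = field(default_factory=list)  # file URLs


@dataclass
class Asset(Dataset):
    """Stand-in for an ADMS Asset declared as a subclass of Dataset."""
    status: str = "planned"  # hypothetical field for the sketch


def satisfies_strict_reading(ds: Dataset) -> bool:
    """Hypothetical rule: a Dataset needs at least one Distribution."""
    return len(ds.distributions) > 0


# An Asset described before any file exists.
planned = Asset(title="Vocabulary still to be written")

# Subclassing means every Asset is also a Dataset...
print(isinstance(planned, Dataset))
# ...so a rule on Dataset would also apply to an Asset with no file yet.
print(satisfies_strict_reading(planned))
```

If the strict rule does not exist in DCAT, the second check simply has no counterpart and the subclass declaration causes no conflict.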

 

Maybe the point is moot, if my mental model of DCAT is wrong and I was reading things into the spec that were not intended.

 

In any case, I wanted to ask the WG to consider whether the specification should contain clarifications either way – either that there are no expectations concerning the relationship between Catalog, Dataset and Distribution, or that there are specific expectations of the types outlined above.

 

Looking forward to hearing your views.

 

Makx.

Received on Friday, 5 April 2013 08:45:33 UTC