DCAT and ADMS

In today's conference call I was tasked with outlining my observations
on the declaration of adms:SemanticAsset as a subclass of
dcat:Dataset.

 

After reading the DCAT specification, I started to build my own picture
of the model behind DCAT: not the UML model as expressed in the spec,
but what I think is the "real-world" model that sits behind it.

 

As DCAT is the "Data Catalog Vocabulary", it occurred to me that the
central concept is the catalog, which I understand as a sort of virtual
filestore. In the filestore there are physical files, which are the
Distributions.

 

DCAT describes the virtual filestore itself with the properties of the
Catalog. The description of the files in the filestore is done in two
layers:

 

- The first, conceptual layer is the Dataset. The description of the
  Dataset contains a number of characteristics that are independent of
  the physical embodiment (in the sense of file and data formats). This
  is particularly useful when a Dataset is implemented in more than one
  format (e.g. CSV and RDF), as this abstraction allows you to express
  those conceptual characteristics only once for several
  implementations.

- The second layer is the Distribution. The description of the
  Distribution contains the characteristics of the actual file that
  implements the Dataset.

 

In building this mental model, I asked myself whether the model of DCAT
requires that:

 

1. Every Dataset must necessarily exist in a Catalog; in other words,
   things that are not in a Catalog cannot be a dcat:Dataset.

2. Every Dataset must necessarily have at least one Distribution; in
   other words, if there is no physical file, there cannot be a
   dcat:Dataset.
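
To make the two candidate requirements concrete, here is a minimal
Python sketch of the mental model described above. It is my own
illustration, not anything defined by DCAT: the class names mirror the
DCAT terms, but the fields (title, media_type), the helper functions
in_some_catalog and has_distribution, and the example data are all
assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """A physical file that implements a dataset, e.g. a CSV or RDF file."""
    media_type: str  # assumed field, for illustration only


@dataclass
class Dataset:
    """Conceptual description, independent of file and data formats."""
    title: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    """The 'virtual filestore' that lists datasets."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)


def in_some_catalog(dataset: Dataset, catalogs: List[Catalog]) -> bool:
    """Candidate requirement 1: the dataset is listed in at least one catalog."""
    return any(dataset is d for c in catalogs for d in c.datasets)


def has_distribution(dataset: Dataset) -> bool:
    """Candidate requirement 2: the dataset has at least one physical file."""
    return len(dataset.distributions) > 0


# Hypothetical example data: one dataset published in two formats,
# and one that is described but has no file and is in no catalog.
stats = Dataset("Statistics", [Distribution("text/csv"),
                               Distribution("application/rdf+xml")])
planned = Dataset("Planned vocabulary")
catalog = Catalog("Example catalog", [stats])
```

Here stats satisfies both checks and planned satisfies neither. If DCAT
implied both requirements, something like planned could not be a
dcat:Dataset; if it implies neither, it could.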

 

Mind you, this whole mental model is not explicit in DCAT; I inferred
it by reading between the lines and listening to people in the working
group.

 

For example:

 

- Dan's definition "Any file stored on disk is a data set";

- Phil's comment that "if something is not available for download then
  it's not in the catalogue and therefore out of scope for DCAT";

- Richard's response that a dataset does not have to have a
  distribution on the Web, but implying (my interpretation) that it
  does need to have a distribution that can be accessed by some other
  mechanism.

 

The discussion I had with Phil was related to the mental model that
underlies ADMS. ADMS is called the "Asset Description Metadata Schema",
which indicates that the central entity in that model is the Asset. The
Asset can be part of a Repository (similar to the Catalog), but it does
not have to be; lone Assets are conceivable. Furthermore, Assets can be
described before there are any physical manifestations. For example, if
you have a project whose objective is to create an Asset, you can
describe the Asset before you have even started the work; this is
similar to the situation where authors sell the rights to books they
haven't written yet.

 

Based on those differences, I then started to wonder whether there could
be a problem with making the class of ADMS Assets a subclass of the DCAT
Dataset class.
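
The reason the subclass declaration matters can be sketched in a few
lines of Python. Under RDFS semantics, every instance of a subclass is
also an instance of the superclass, so whatever DCAT expects of a
Dataset would also apply to every Asset. In the sketch below the triples
are plain tuples with prefixed names as strings; ex:plannedAsset is a
hypothetical asset that has no file and sits in no repository, and
infer_types is a helper written for this sketch, not part of any
library.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("adms:SemanticAsset", SUBCLASS_OF, "dcat:Dataset"),
    ("ex:plannedAsset", TYPE, "adms:SemanticAsset"),  # hypothetical asset
}


def infer_types(triples):
    """Apply the RDFS subclass rule until nothing new is added:
    if x has type C and C is a subclass of D, then x has type D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(inferred):
            if p != TYPE:
                continue
            for (c2, q, d) in list(inferred):
                if q == SUBCLASS_OF and c2 == c and (x, TYPE, d) not in inferred:
                    inferred.add((x, TYPE, d))
                    changed = True
    return inferred


entailed = infer_types(triples)
```

After running this, entailed contains the triple (ex:plannedAsset,
rdf:type, dcat:Dataset): the planned, file-less Asset is a dcat:Dataset
whether or not DCAT intends Datasets of that kind to exist.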

 

The point may be moot if my mental model of DCAT is wrong and I have
been reading things into the spec that were not intended.

 

In any case, I wanted to ask the WG to consider whether the
specification should contain clarifications either way - either that
there are no expectations concerning the relationship between Catalog,
Dataset and Distribution, or that there are specific expectations of the
types outlined above.

 

Looking forward to hearing your views.

 

Makx.

Makx Dekkers

makx@makxdekkers.com

+34 639 26 11 46

 

 

Received on Thursday, 4 April 2013 19:29:14 UTC