Collection class for DCAT

This is a suggestion for a new class and properties in DCAT, together with an example of how they might be applied. The example also illustrates why a new class and properties seem to be required.

We may need a Collection class in DCAT because, in some situations, Catalog seems too high-level and Dataset too granular for modeling actual data catalogues. You can of course model a collection with what is currently in DCAT (a Catalog or a Dataset), but this may not be what the DCAT designers intended: a Catalog seems intended for modeling the top-level resource of a "data publisher", and a Dataset for modeling a set of data records where individual records have no identity of their own. A Collection would allow us to model data aggregations in situations where each member of the Collection should have a clear identity.

We may also need a way to give Collections some structure, as just introducing a new Collection class sitting between Catalog and Dataset may not be enough for modeling the actual organization of data. A few properties for Collection may therefore be needed:
* "related": to link to another Collection or to anything else,
* "reference": to designate a homepage or other stable reference for the Collection; it may be an inverse functional property, similar to "homepage" for the Catalog,
* "contains": to designate parts of the Collection - other Collections or perhaps standalone datasets. (An alternative to this property would be for a Collection to inherit from another Collection, but this may contradict the spirit of DCAT, which seems to avoid inheritance.)
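
As a rough illustration of the three proposed properties, here is a minimal Python sketch of the data structure they imply. It is only a sketch under assumptions: the class layout, the "title" field, the "walk" helper and the sample titles are all hypothetical and are not part of DCAT or of any existing vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical Collection resource sitting between Catalog and Dataset."""
    title: str
    # "reference": a homepage or other stable reference for the Collection
    reference: Optional[str] = None
    # "related": links to other Collections or to any other resource
    related: List[str] = field(default_factory=list)
    # "contains": parts of the Collection (here only sub-Collections)
    contains: List["Collection"] = field(default_factory=list)

    def walk(self, depth: int = 0):
        """Yield (depth, collection) for this Collection and all nested parts."""
        yield depth, self
        for part in self.contains:
            yield from part.walk(depth + 1)


# Illustrative three-level nesting; the titles and reference are made up.
survey = Collection(title="Annual survey", reference="sn-example")
series = Collection(title="Survey series", contains=[survey])
top = Collection(title="Top-level collection", contains=[series])

for depth, collection in top.walk():
    print("  " * depth + collection.title)
```

Nesting through "contains" keeps every level as a resource with its own identity, which is the point of having a Collection rather than a Dataset.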

A possible use of the new Collection class and properties is illustrated by the following four-part example:

1) http://www.esds.ac.uk/government/frs/ (Family Resources Survey homepage on the UK Data Archive portal)
This can be a top-level data "Collection", registered in the UK Data Archive "Catalog".

2) http://farm.ccsr.ac.uk/cgi-bin/esds/nsproxy/nsproxy2011.cgi?http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=9267 (This link is recommended by the UK Data Archive for detailed information about the Family Resources Survey; it leads to the Office for National Statistics)
So 2 can be described as "related" to 1.
DCAT as a whole focuses much more on data than on intellectual entities. The latter, however, seem to need some representation in DCAT, so a "related" property could be a very lightweight and generic means of referring to intellectual entities (research programs, continuous surveys etc.), e.g. to the rationale for having the data in the Collection.

3) http://www.esds.ac.uk/findingData/frsTitles.asp (Family Resources Survey sub-series defined by different licensing)
The top Collection in 1 can then be thought of as one that "contains" the Collections from 3, with the rationale for having sub-Collections described via the "related" property, i.e. via links to the specific licensing descriptions.

4) http://www.esds.ac.uk/findingData/snDescription.asp?sn=7085 (An example of a collection of datasets and associated materials in Excel, PDF, and HTML formats for a particular annual survey)
Each Collection listed in 3 then "contains" a number of Collections like 4. In this example, the homepage URL is obviously prone to change if the server technology changes, so something like "sn7085" may be a better Collection "reference".
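
The four parts above could be written down as simple subject-property-object statements, roughly as in the Python sketch below. Several things here are assumptions made only for illustration: the "http://example.org/dcat-ext#" namespace is a placeholder and not a DCAT namespace, the page in 3 is treated as a single node standing in for the sub-series it lists, and "sn7085" is treated as a plain literal.

```python
# Placeholder namespace for the proposed terms; these are NOT existing DCAT terms.
EX = "http://example.org/dcat-ext#"

# URLs taken from parts 1-4 of the example.
FRS = "http://www.esds.ac.uk/government/frs/"
ONS_INFO = ("http://farm.ccsr.ac.uk/cgi-bin/esds/nsproxy/nsproxy2011.cgi?"
            "http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=9267")
FRS_TITLES = "http://www.esds.ac.uk/findingData/frsTitles.asp"
SN7085 = "http://www.esds.ac.uk/findingData/snDescription.asp?sn=7085"

triples = [
    (FRS, EX + "related", ONS_INFO),         # 2 is "related" to 1
    (FRS, EX + "contains", FRS_TITLES),      # 1 "contains" the sub-series in 3
    (FRS_TITLES, EX + "contains", SN7085),   # 3 "contains" Collections like 4
    (SN7085, EX + "reference", "sn7085"),    # stable "reference" for 4
]


def parts_of(root, statements):
    """Return every resource reachable from root via the "contains" property."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        for s, p, o in statements:
            if s == node and p == EX + "contains" and o not in found:
                found.append(o)
                stack.append(o)
    return found


def to_ntriples(statements):
    """Serialize the statements as N-Triples-style lines (URIs vs. plain literals)."""
    lines = []
    for s, p, o in statements:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
print(parts_of(FRS, triples))
```

Following "contains" from the top Collection reaches both lower levels, which is the Catalog structure that is hard to express with Catalog and Dataset alone.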

Without a Collection class and a few such properties, it seems hard to model the structure of a Catalog properly in cases like the example above.

With kind regards,
Vasily Bunakov
STFC Scientific Computing
E-mail: vasily.bunakov@stfc.ac.uk



Received on Friday, 7 December 2012 14:48:04 UTC