Overlap between DCAT and ADMS (was: Re: ADMS spec document update) from Richard Cyganiak on 2012-10-21 (public-gld-wg@w3.org from October 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 21 Oct 2012 17:29:55 +0100
To: "Gofran (GS)" <gofran.shukair@deri.org>, Phil Archer <phila@w3.org>
Cc: Public GLD WG <public-gld-wg@w3.org>
Message-Id: <26AEF9FB-86DB-4FE9-B490-A8C0CA523ACE@cyganiak.de>

Hi Gofran, hi Phil,

I think the fundamental problem here is that we have two specs that have a large overlap in scope, but neither is a subset of the other, and *probably* neither can be easily extended to cover the other without losing its focus.

What is the overlap between both?

Phil developed RADion in an attempt to “factor out” the overlap of both specs: repositories, assets, distributions. But I think that RADion fails to get to the essence of the overlap. It may be correct on the repositories and assets, but fails with the distributions, or at least has a conception of distribution that isn't sufficiently generic to properly cover DCAT.

I think the overlap of DCAT and ADMS is that both are catalogs of metadata records designed for finding “assets” of some kind. They differ, however, in the kind of assets that are listed in the catalog, although there is overlap.

Since the kinds of assets differ, so do the metadata required to adequately describe them, the secondary concepts related to the asset, the relationships that need to be recorded between assets, and the means of accessing the assets.

I'm not suggesting any particular course of action as a result of this observation. We should closely study the overlap between DCAT's Catalog, Dataset, and CatalogRecord on the one hand, and ADMS' Repository and Asset on the other hand, and also study their relationships to things already out there.

The more I think about it, the more I get worried that we're in the process of not just reinventing the wheel, but reinventing it twice, in parallel.

Best,
Richard


On 20 Oct 2012, at 19:37, Gofran wrote:

> Hi all,
> 
> The sets of resources that ADMS and DCAT describe intersect, IMHO. I like to use the term "reusable" for the resources that ADMS describes. The set of reusable resources includes (but is not limited to) codelists, taxonomies, datasets, etc., as long as they can be reused in a different context, and the purpose of ADMS is to facilitate this by describing them using the right terms.
> 
> A dataset in data.gov.uk, for instance, is basically a useful dataset for certain applications, but it is not a reference dataset, and DCAT should be used to describe it.
> A dataset about the languages in the EU, by contrast, is more like a reusable asset with a broader usage base, and ADMS and/or DCAT can be used to describe it (though it is not a "semantic" asset per se).
> 
> The problem, as I see it, is how to extend ADMS (or DCAT, or both) to describe this difference.
> 
> On 20 Oct 2012, at 14:17, Richard Cyganiak wrote:
> 
>> Hi Phil,
>> 
>> On 18 Oct 2012, at 18:33, Phil Archer wrote:
>>> "ADMS, the Asset Description Metadata Schema, is a vocabulary for describing so-called Semantic Assets, that is, things like standards, code lists and taxonomies. Although it has a lot in common with the Data Catalog vocabulary [DCAT], notably the extensive use of Dublin Core [DC11], someone searching for a Semantic Asset is likely to have different needs, priorities and expectations than someone searching for a data set and these differences are reflected in ADMS. In particular, users seeking a Semantic Asset are likely to be searching for 'a document' — something they can open and read using familiar desktop software, as opposed to something that needs to be processed. Of course this is a very broad generalization. If a code list is published as a SKOS Concept scheme then it is both a Semantic Asset and a dataset and it can be argued that all Semantic Assets are datasets. Therefore the difference in /user expectation/ is at the heart of what distinguishes ADMS from DCAT."
>> 
>> I have a number of issues with this.
>> 
>> 1. You describe the purpose of ADMS as: “It's for describing things like standards, code lists and taxonomies.” This is too fuzzy. You can't have weasel words such as “like” in the sentence that states the purpose of a technology. Law texts are a bit like standards, right? So ADMS is for describing them too?
>> 
>> 2. The text implies that the kinds of things described in DCAT cannot be “open and read using familiar desktop software”. This is not the case. In most data catalogs, the most common formats are CSV and Excel.
>> 
>> 3. It is not particularly likely that code lists and taxonomies -- things that ADMS is intended to describe -- can be opened and read in familiar desktop software.
>> 
>> 4. If the main difference is indeed one of user expectation and not one of vocabulary semantics, then a catalog-level flag in DCAT might be sufficient to eliminate the need for ADMS. Surely it is not that easy. So I don't feel that the text above gets to the heart of the difference between DCAT and ADMS.
>> 
>> 5. It is somewhat open whether the “distributions” in DCAT are all machine-readable. There is an open DCAT issue about renaming “distribution” to “resource” and allowing pretty much arbitrary related online artefacts, including documentation and the like.
>> 
>> Best,
>> Richard
>> 
>> 
>
Received on Sunday, 21 October 2012 16:31:15 UTC