Re: Overlap between DCAT and ADMS from Gofran on 2012-10-24 (public-gld-wg@w3.org from October 2012)

From: Gofran <gofran.shukair@deri.org>
Date: Wed, 24 Oct 2012 23:35:29 +0100
To: Phil Archer <phila@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, Public GLD WG <public-gld-wg@w3.org>
Message-Id: <F2C44F85-9DC0-4125-9BB0-7C2B611CDC4C@deri.org>
*redirecting to the mailing list*

On 23 Oct 2012, at 18:16, Phil Archer wrote:

> I have a lot of sympathy with this.
>
> When I was first tasked with creating the RDF schema for ADMS, I  
> used a load of DCAT properties and only introduced a few new ones.  
> It was with the introduction of a third related vocab (called ADMS for  
> Software, ADMS.SW, which is not on the GLD work list) that I came up  
> with RADion. That may or may not have been a sensible idea but it  
> seems to be in line with the sentiment here in that what we have are  
> two similar vocabs. They're slightly different because the people  
> that have created them have slightly different perspectives.
>
> DCAT is not designed, for example, to describe 3 separate PDFs  
> wrapped up in a zip file - ADMS is (among other things).
>
> But lest we get too hung up, perhaps we can take a little step back.  
> I've just made another couple of tweaks to the ADMS spec, RDF schema  
> and namespace HTML doc in readiness for Thursday (all linked from  
> the wiki).
>
> ADMS defines 5 classes and a bunch of properties that for the most  
> part have no direct parallel in DCAT. But... like DCAT, most 'ADMS  
> data' is Dublin Core.
>
> It's a difference of emphasis, a difference in approach and  
> therefore a difference in what gets included and not included in the  
> vocab.
>
> I see a number of options:
>
> 1. Spend time working to align DCAT and ADMS more closely (in which  
> case RADion is either a help or a hindrance - if the latter we don't  
> have to be bound by it). That *might* then lead to the kind of  
> integration we've done with RegORG and ORG. That's probably the  
> ideal, but do we have the time and willingness? Also, very  
> significant effort has already been expended in creating ADMS and  
> ADMS.SW-compliant data.
>
> 2. Recognise that the overlap is significant but not a huge problem  
> in itself since so many of the properties used are from dcterms. The  
> ones that aren't are, of course, the more specialised ones. Cross  
> reference ADMS and DCAT and say "take your pick - and here's why you  
> might choose one over the other." Gofran's e-mail about re-usability  
> is helpful here I think, as would be a short text highlighting the  
> different approaches taken. I believe most potential users will feel  
> more comfortable with one or the other and the choice is generally  
> going to be made by repositories that harvest the data, not data  
> publishers looking for an outlet for their data. As well as W3C we  
> have national governments and fellow standards bodies publishing  
> ADMS data (Denmark, OASIS, Open Metadata Registry, GS1...).
>
I agree with this option: the overlap is significant, but the difference  
in use cases is also obvious.
RADion still seems to me a good way to embrace this for now.

> 3. If the WG feels either route above is not right for a Rec Track  
> document then we can publish ADMS as a WG Note and namespace doc and  
> more or less leave it at that. As you can imagine, I'd rather not  
> take this route but the WG is sovereign.
>
> Phil.
>
> On 21/10/2012 17:29, Richard Cyganiak wrote:
>> Hi Gofran, hi Phil,
>>
>> I think the fundamental problem here is that we have two specs that  
>> have a large overlap in scope, but neither is a subset of the  
>> other, and *probably* neither can be easily extended to cover the  
>> other without losing its focus.
>>
>> What is the overlap between both?
>>
>> Phil developed RADion in an attempt to “factor out” the overlap of  
>> both specs: repositories, assets, distributions. But I think that  
>> RADion fails to get to the essence of the overlap. It may be  
>> correct on the repositories and assets, but fails with the  
>> distributions, or at least has a conception of distribution that  
>> isn't sufficiently generic to properly cover DCAT.
>>
>> I think the overlap of DCAT and ADMS is that both are catalogs of  
>> metadata records designed for finding “assets” of some kind. They  
>> differ, however, in the kind of assets that are listed in the  
>> catalog, although there is overlap.
>>
>> Since the kinds of assets are different, there's a lot of  
>> difference in the metadata that is required to adequately describe  
>> them, and in the additional secondary concepts related to the  
>> asset, and in the relationships that need to be recorded between  
>> assets, and in the means of accessing the assets.
>>
>> I'm not suggesting any particular course of action as a result of  
>> this observation. We should closely study the overlap between  
>> DCAT's Catalog, Dataset, and CatalogRecord on the one hand, and  
>> ADMS' Repository and Asset on the other hand, and also study their  
>> relationships to things already out there.
>>
>> The more I think about it, the more I get worried that we're in the  
>> process of not just reinventing the wheel, but reinventing it  
>> twice, in parallel.
>>
>> Best,
>> Richard
>>
>>
>> On 20 Oct 2012, at 19:37, Gofran wrote:
>>
>>> Hi all,
>>>
>>> IMHO, the sets of resources that ADMS and DCAT describe intersect.  
>>> I like to use the term "reusable" for the resources that ADMS  
>>> describes, and the set of reusable resources includes (but is not  
>>> limited to) code lists, taxonomies, datasets, etc., as long as they  
>>> can be reused in a different context. The purpose of ADMS is to  
>>> facilitate this by describing them using the right terms.
>>>
>>> A dataset on data.gov.uk, for instance, is basically a useful  
>>> dataset for certain applications, but it is not a reference dataset,  
>>> and DCAT should be used to describe it.
>>> A dataset about the languages of the EU, by contrast, is certainly  
>>> more like a reusable asset with a broader usage base, and ADMS  
>>> and/or DCAT can be used to describe it (though it is not a  
>>> "semantic" asset per se).
>>>
>>> The problem, as I see it, is how to extend ADMS (or DCAT, or both)  
>>> to describe this difference.
>>>
>>> On 20 Oct 2012, at 14:17, Richard Cyganiak wrote:
>>>
>>>> Hi Phil,
>>>>
>>>> On 18 Oct 2012, at 18:33, Phil Archer wrote:
>>>>> "ADMS, the Asset Description Metadata Schema, is a vocabulary  
>>>>> for describing so-called Semantic Assets, that is, things like  
>>>>> standards, code lists and taxonomies. Although it has a lot in  
>>>>> common with the Data Catalog vocabulary [DCAT], notably the  
>>>>> extensive use of Dublin Core [DC11], someone searching for a  
>>>>> Semantic Asset is likely to have different needs, priorities and  
>>>>> expectations than someone searching for a data set and these  
>>>>> differences are reflected in ADMS. In particular, users seeking  
>>>>> a Semantic Asset are likely to be searching for 'a document' —  
>>>>> something they can open and read using familiar desktop  
>>>>> software, as opposed to something that needs to be processed. Of  
>>>>> course this is a very broad generalization. If a code list is  
>>>>> published as a SKOS Concept scheme then it is both a Semantic  
>>>>> Asset and a dataset and it can be argued that all Semantic  
>>>>> Assets are datasets. Therefore the difference in /user  
>>>>> expectation/ is at the heart of what distinguishes ADMS from  
>>>>> DCAT."
>>>>
>>>> I have a number of issues with this.
>>>>
>>>> 1. You describe the purpose of ADMS as: “It's for describing  
>>>> things like standards, code lists and taxonomies.” This is too  
>>>> fuzzy. You can't have weasel words such as “like” in the sentence  
>>>> that states the purpose of a technology. Law texts are a bit like  
>>>> standards, right? So ADMS is for describing them too?
>>>>
>>>> 2. The text implies that the kinds of things described in DCAT  
>>>> cannot be “open and read using familiar desktop software”. This  
>>>> is not the case. In most data catalogs, the most common formats  
>>>> are CSV and Excel.
>>>>
>>>> 3. It is not particularly likely that code lists and taxonomies  
>>>> -- things that ADMS is intended to describe -- can be opened and  
>>>> read in familiar desktop software.
>>>>
>>>> 4. If the main difference is indeed one of user expectation and  
>>>> not one of vocabulary semantics, then a catalog-level flag in  
>>>> DCAT might be sufficient to eliminate the need for ADMS. Surely  
>>>> it is not that easy. So I don't feel that the text above gets to  
>>>> the heart of the difference between DCAT and ADMS.
>>>>
>>>> 5. It is somewhat open whether the “distributions” in DCAT are  
>>>> all machine-readable. There is an open DCAT issue about renaming  
>>>> “distribution” to “resource” and allowing pretty much arbitrary  
>>>> related online artefacts, including documentation and the like.
>>>>
>>>> Best,
>>>> Richard
>>>>
>
> -- 
>
>
> Phil Archer
> W3C eGovernment
> http://www.w3.org/egov/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
Received on Wednesday, 24 October 2012 23:04:36 UTC