Re: Overlap between DCAT and ADMS from Richard Cyganiak on 2012-10-25 (public-gld-wg@w3.org from October 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 25 Oct 2012 10:34:57 +0100
To: Phil Archer <phila@w3.org>
Cc: "Gofran (GS)" <gofran.shukair@deri.org>, Public GLD WG <public-gld-wg@w3.org>
Message-Id: <D5BA7028-F792-4707-B791-31C1B7BE65BB@cyganiak.de>
On 23 Oct 2012, at 18:16, Phil Archer wrote:
> It's a difference of emphasis, a difference in approach and therefore a difference in what gets included and not included in the vocab.

The core question to me is not what the difference is, but what the commonality is.

> I see a number of options:
> 
> 1. Spend time working to align DCAT and ADMS more closely (in which case RADion is either a help or a hindrance - if the latter we don't have to be bound by it). That *might* then lead to the kind of integration we've done with RegORG and ORG. That's probably the ideal but do we have time and willingness?

I agree that this is the ideal. Willingness is hopefully not the issue. Time might be. I would say that we should try to figure out what the Right Thing to do is here, while acknowledging that due to time and charter constraints we may not be able to make the Right Thing happen and may have to compromise.

> Also, very significant effort has already been expended in creating ADMS and ADMS.SW-compliant data.

I understand that, but in terms of the W3C process, this shouldn't really matter. If you implement a W3C working draft, then you're running the risk that the draft may change significantly. And ADMS isn't even a W3C working draft yet. If change is not an option due to existing implementations, then submission to W3C for standardization is not a good idea.

> 2. Recognise that the overlap is significant but not a huge problem in itself since so many of the properties used are from dcterms.

That says almost nothing. Almost everything uses lots of properties from dcterms, full stop. It doesn't follow that overlap in the non-DC parts of the vocabulary isn't a problem.

> Cross reference ADMS and DCAT and say "take your pick - and here's why you might choose one over the other."

I'm very much opposed to this approach. The WG is chartered to deliver a *standard* vocabulary, and not to increase the number of available alternatives by producing more competing vocabularies. What you describe is something that answers to the “Best Practices for Vocabulary Selection” deliverable, not to the “Standard Vocabularies” deliverable.

If there's commonality in scope between DCAT and ADMS, then I think this needs to be properly addressed.

In particular, and to set a benchmark for what I mean by “properly addressed”: I don't see a technical reason why a single piece of aggregator software shouldn't be usable for aggregating both DCAT records and ADMS records. The reasons against are basically “NIH” and “changing anything now would be too expensive/inconvenient”. I think neither of these reasons has a place on a W3C WG table. (This argument cuts both ways, of course -- maybe every data catalog should just adopt ADMS?)

> Gofran's e-mail about re-usability is helpful here I think, as would be a short text highlighting the different approaches taken. I believe most potential users will feel more comfortable with one or the other and the choice is generally going to be made by repositories that harvest the data, not data publishers looking for an outlet for their data.

This appears to make the argument that people don't really need or want a W3C standard here.

> As well as W3C we have national governments and fellow standards bodies publishing ADMS data (Denmark, OASIS, Open Metadata Registry, GS1...).

Are these deployments documented somewhere?

Are the stakeholders represented in the WG?

> 3. If the WG feels either route above is not right for a Rec Track document then we can publish ADMS as a WG Note and namespace doc and more or less leave it at that. As you can imagine, I'd rather not take this route but the WG is sovereign.

This, again, doesn't seem to make technical sense for the same reasons as 2. It would be a political move.

I will end by pointing out that you've pulled a bait-and-switch on us. You submitted a DCAT extension to the WG for consideration. The WG agreed that it made sense to take that on as a deliverable, as it seemed like a valuable addition that actually strengthens DCAT by extending its applicability. Much later in the WG's lifetime, you informed the WG that ADMS is actually no longer a DCAT extension but a completely separate -- and somewhat competing -- thing. I'm not sure if this is the result of clever political planning or of the fact that you're sitting between a number of chairs with W3C and the EC, but at any rate this has unfortunately made the WG's work considerably more difficult and is doing bad things to our schedule.

Is the commonality between DCAT, ADMS and ADMS.SW that they all define repositories of metadata records that describe things of some nature, in order to allow finding of things and aggregating/federating/harvesting of repositories? And the difference is in the nature of the things described therein? Web-accessible datasets (DCAT), interoperability specifications (ADMS), and software thingies (ADMS.SW)? If that is so, then it may be possible to pull out the “repository” aspects into a separate technical piece. But that may exceed the WG scope by quite a bit and step on various toes including OAI-ORE, AtomOwl, SIOC, and ISO 11179.

Best,
Richard



> 
> Phil.
> 
> 
> 
> 
> On 21/10/2012 17:29, Richard Cyganiak wrote:
>> Hi Gofran, hi Phil,
>> 
>> I think the fundamental problem here is that we have two specs that have a large overlap in scope, but neither is a subset of the other, and *probably* neither can be easily extended to cover the other without losing its focus.
>> 
>> What is the overlap between both?
>> 
>> Phil developed RADion in an attempt to “factor out” the overlap of both specs: repositories, assets, distributions. But I think that RADion fails to get to the essence of the overlap. It may be correct on the repositories and assets, but fails with the distributions, or at least has a conception of distribution that isn't sufficiently generic to properly cover DCAT.
>> 
>> I think the overlap of DCAT and ADMS is that both are catalogs of metadata records designed for finding “assets” of some kind. They differ, however, in the kind of assets that are listed in the catalog, although there is overlap.
>> 
>> Since the kinds of assets are different, there's a lot of difference in the metadata that is required to adequately describe them, and in the additional secondary concepts related to the asset, and in the relationships that need to be recorded between assets, and in the means of accessing the assets.
>> 
>> I'm not suggesting any particular course of action as a result of this observation. We should closely study the overlap between DCAT's Catalog, Dataset, and CatalogRecord on the one hand, and ADMS' Repository and Asset on the other hand, and also study their relationships to things already out there.
>> 
>> The more I think about it, the more I get worried that we're in the process of not just reinventing the wheel, but reinventing it twice, in parallel.
>> 
>> Best,
>> Richard
>> 
>> 
>> On 20 Oct 2012, at 19:37, Gofran wrote:
>> 
>>> Hi all,
>>> 
>>> The sets of resources that ADMS and DCAT describe intersect. IMHO, "reusable" is a good term for the resources that ADMS describes. The set of reusable resources includes (but is not limited to) codelists, taxonomies, datasets, etc., as long as they can be reused in a different context, and the purpose of ADMS is to facilitate this by describing them using the right terms.
>>> 
>>> A dataset in data.gov.uk for instance is basically a useful dataset for certain applications but it is not a reference dataset and DCAT should be used to describe it.
>>> By contrast, a dataset about the languages in the EU is certainly more like a reusable asset that has a broader usage base, and ADMS and/or DCAT can be used to describe it (though it is not a "semantic" asset per se).
>>> 
>>> The problem, as I see it, is how to extend ADMS (or DCAT or both) to describe this difference.
>>> 
>>> On 20 Oct 2012, at 14:17, Richard Cyganiak wrote:
>>> 
>>>> Hi Phil,
>>>> 
>>>> On 18 Oct 2012, at 18:33, Phil Archer wrote:
>>>>> "ADMS, the Asset Description Metadata Schema, is a vocabulary for describing so-called Semantic Assets, that is, things like standards, code lists and taxonomies. Although it has a lot in common with the Data Catalog vocabulary [DCAT], notably the extensive use of Dublin Core [DC11], someone searching for a Semantic Asset is likely to have different needs, priorities and expectations than someone searching for a data set and these differences are reflected in ADMS. In particular, users seeking a Semantic Asset are likely to be searching for 'a document' — something they can open and read using familiar desktop software, as opposed to something that needs to be processed. Of course this is a very broad generalization. If a code list is published as a SKOS Concept scheme then it is both a Semantic Asset and a dataset and it can be argued that all Semantic Assets are datasets. Therefore the difference in /user expectation/ is at the heart of what distinguishes ADMS from DCAT."
>>>> 
>>>> I have a number of issues with this.
>>>> 
>>>> 1. You describe the purpose of ADMS as: “It's for describing things like standards, code lists and taxonomies.” This is too fuzzy. You can't have weasel words such as “like” in the sentence that states the purpose of a technology. Law texts are a bit like standards, right? So ADMS is for describing them too?
>>>> 
>>>> 2. The text implies that the kinds of things described in DCAT cannot be “open and read using familiar desktop software”. This is not the case. In most data catalogs, the most common formats are CSV and Excel.
>>>> 
>>>> 3. It is not particularly likely that code lists and taxonomies -- things that ADMS is intended to describe -- can be opened and read in familiar desktop software.
>>>> 
>>>> 4. If the main difference is indeed one of user expectation and not one of vocabulary semantics, then a catalog-level flag in DCAT might be sufficient to eliminate the need for ADMS. Surely it is not that easy. So I don't feel that the text above gets to the heart of the difference between DCAT and ADMS.
>>>> 
>>>> 5. It is somewhat open whether the “distributions” in DCAT are all machine-readable. There is an open DCAT issue about renaming “distribution” to “resource” and allowing pretty much arbitrary related online artefacts, including documentation and the like.
>>>> 
>>>> Best,
>>>> Richard
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 
> -- 
> 
> 
> Phil Archer
> W3C eGovernment
> http://www.w3.org/egov/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
Received on Thursday, 25 October 2012 09:35:27 UTC