Addressing SWIG feedback on the Vocabulary of Interlinked Datasets (VoID) from Richard Cyganiak on 2011-02-16 (semantic-web@w3.org from February 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 16 Feb 2011 00:09:25 +0000
To: SW-forum Web <semantic-web@w3.org>
Cc: Dan Brickley <danbri@danbri.org>, Michael Hausenblas <michael.hausenblas@deri.org>, Keith Alexander <Keith.Alexander@talis.com>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>
Message-Id: <D6F7D31D-D748-4243-8CA9-21771B377C38@cyganiak.de>

Dear all,

In December, we asked this group for feedback on the planned publication of a SWIG Note called “Describing Linked Datasets with the voiD Vocabulary”.

http://lists.w3.org/Archives/Public/semantic-web/2010Dec/0161.html

We would like to thank all reviewers for their thoughtful comments. This is a response to the collected feedback we have received.

Thomas Bandholtz <thomas.bandholtz@innoq.com> wrote:
> However, there has been some discussion earlier about how it
> relates to DCAT [1] and vice versa.
> There is a lot of overlap, and early adopters might feel puzzled
> which one to use or how to put both together.
> At least one of the two should address this somehow, don't
> you think?

Michael F. Uschold <uschold@gmail.com> wrote:
> It seems to me that one (DCAT) is the more general case of the
> other (VOID).
> If so, then there should be a core vocabulary shared by both,
> and anything new can be added as an extension.
>
> If neither are proper sub-cases of the other, then it would make
> sense to identify their overlap, create THAT as a shared vocabulary
> about datasets in general. Then DCAT and VOID could both import that
> common core and extend as they see fit.
>
> Creating new variations of essentially the same thing should be
> avoided if at all possible.

Giovanni Tummarello <giovanni.tummarello@deri.org> wrote:
> Dcat for the general dataset terms and Void for the RDF/Linked
> data aspects.
> the merge can probably take no more than 1 day of work ? but please
> do decide on 1 way to say things (And the final format e.g. RDF vs CSV
> doesnt seem a good reason to use an ontology vs another to describe
> e.g. the subject keywords of the dataset)

The relationship between VoID and dcat is this: VoID is intended for describing RDF datasets. Dcat is intended as an exchange standard for data catalogs such as data.gov. As such, both are about metadata for datasets, but they differ in focus and audience. The most significant difference is that almost all of VoID is applicable only to RDF datasets, while dcat is agnostic to the format of a dataset and is designed to handle datasets available in multiple formats.

No formal attempt at aligning both vocabularies has been made. Some initial exploration is here:

http://code.google.com/p/void-impl/issues/detail?id=63

Should both vocabularies be unified into a single vocabulary? We don't think so. They address different audiences. Dcat's audience (publishers of government data catalogs) is not expected to be very familiar with RDF, hence it is important to keep dcat small and focused. That audience also has little need for most of the extra features of VoID, because the average data catalog doesn't contain *any* datasets in RDF format.

Should there be a common core between VoID and dcat? Yes, and there already is. Both use Dublin Core for basic metadata, SKOS for categorization, and FOAF for describing agents related to a dataset. To this extent they are already interoperable. Dcat is just a profile of these vocabularies plus a handful of additional terms, such as a dcat:Catalog class.
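To illustrate the shared core, here is a minimal Python sketch that models a dataset description as plain (subject, predicate, object) tuples and separates the triples built only from shared vocabularies from those that use VoID- or dcat-specific terms. The `ex:` resources, the literal values, and the helper `split_by_vocabulary` are hypothetical and exist only for this illustration; no RDF library is used.

```python
# Prefixes of the two dataset vocabularies discussed here.
SPECIFIC_PREFIXES = ("void:", "dcat:")

# A hypothetical dataset description; all ex: names and literals are made up.
description = [
    ("ex:dataset", "rdf:type", "void:Dataset"),
    ("ex:dataset", "rdf:type", "dcat:Dataset"),
    ("ex:dataset", "dcterms:title", "Example Dataset"),
    ("ex:dataset", "dcterms:publisher", "ex:agency"),
    ("ex:agency", "foaf:name", "Example Agency"),
    ("ex:dataset", "void:sparqlEndpoint", "http://example.org/sparql"),
]


def split_by_vocabulary(triples):
    """Split triples into (shared, specific).

    A triple counts as specific when its predicate or object is a
    void: or dcat: term; everything else is treated as shared core.
    """
    shared, specific = [], []
    for subject, predicate, obj in triples:
        if predicate.startswith(SPECIFIC_PREFIXES) or obj.startswith(SPECIFIC_PREFIXES):
            specific.append((subject, predicate, obj))
        else:
            shared.append((subject, predicate, obj))
    return shared, specific


shared, specific = split_by_vocabulary(description)
print(len(shared), "shared triples;", len(specific), "vocabulary-specific triples")
```

In this sketch the title, publisher, and agent name are expressed with Dublin Core and FOAF alone, so a consumer that understands only one of the two dataset vocabularies can still read them.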

Should VoID be further aligned to dcat? We think yes, but it's a bit too early. Dcat is still an early draft. The upcoming W3C Government Linked Data WG is chartered to produce a more stable version of dcat. That effort will be a better place and time for addressing the relationship/alignment between the vocabularies. At least one of the VoID spec authors will participate in that WG to ensure proper alignment.

Riccardo Albertoni <riccardo@ge.imati.cnr.it> wrote:
> I have just a little remark concerning  section 4.2:  it introduces
> void:uriSpace, but  distinctions in the use of
> void:uriRegexPattern\void:uriSpace  are not so clear. Why don't
> use always void:uriRegexPattern?

Because a lot of people don't know what a regular expression is. And a lot of people who think they know regular expressions still write them wrongly. void:uriSpace addresses the most common use case for void:uriRegexPattern in a simpler way that avoids these problems.
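The difference can be sketched in a few lines of Python. This is only an illustration, under the assumption that void:uriSpace is read as "every entity URI starts with this string" and void:uriRegexPattern as "every entity URI matches this regular expression"; the function names and the example.org URIs are hypothetical.

```python
import re


def in_uri_space(uri: str, uri_space: str) -> bool:
    """void:uriSpace-style check: the URI starts with the given string."""
    return uri.startswith(uri_space)


def matches_uri_pattern(uri: str, pattern: str) -> bool:
    """void:uriRegexPattern-style check: the regular expression matches the URI."""
    return re.search(pattern, uri) is not None


# Hypothetical namespace and two ways of writing it as a regular expression.
SPACE = "http://example.org/resource/"
CAREFUL = r"^http://example\.org/resource/"   # dots escaped
CARELESS = "^http://example.org/resource/"    # dots left unescaped: "." matches any character

inside = "http://example.org/resource/Berlin"
outside = "http://exampleXorg/resource/Berlin"

print(in_uri_space(inside, SPACE), in_uri_space(outside, SPACE))
print(matches_uri_pattern(inside, CAREFUL), matches_uri_pattern(outside, CAREFUL))
print(matches_uri_pattern(outside, CARELESS))
```

The careless pattern accepts the second URI because an unescaped dot matches any character, which is the kind of mistake the plain string comparison cannot make.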

A number of additional comments were received and handled in the VoID issue tracker; these can be reviewed here:

http://code.google.com/p/void-impl/issues/list?can=1&q=SWIG%3Dfeedback

Thanks again,
Richard

Received on Wednesday, 16 February 2011 00:11:03 UTC