RE: [BP - MET] - Best Practices - Guidance on the Provision of Metadata

Laufer,

Could we maybe start from DCAT (http://www.w3.org/TR/vocab-dcat/)? That W3C Recommendation was designed specifically to describe data on the Web. It defines a metadata vocabulary that covers some of the metadata types you list in your message. It also distinguishes between the conceptual characteristics of the data set (the things you would use for search) and the actual, downloadable distribution of the data.

There is also a DCAT application profile for data portals in Europe (https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final#download-links) that gives additional rules and constraints for the use of DCAT in a network of data portals in Europe. Among other things, the DCAT-AP defines the minimum set of metadata elements to be provided: only a name (dct:title) and a description (dct:description) of the data set, and a URL for its distribution (dcat:accessURL). A small number of further elements are recommended if available.
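
To make that minimum set concrete, here is a rough sketch in Python of a check for the three elements named above. The record layout (a plain dict keyed by the property names, with distributions in a list), the function name `missing_mandatory` and the example.org URL are assumptions made purely for illustration; none of them comes from DCAT or the DCAT-AP.

```python
# The three elements named above, split the way DCAT splits them:
# two describe the data set, one describes its distribution.
MANDATORY_DATASET = ("dct:title", "dct:description")
MANDATORY_DISTRIBUTION = ("dcat:accessURL",)


def missing_mandatory(dataset):
    """Return the mandatory property names that are absent or empty.

    `dataset` is a hypothetical plain-dict record: data set properties at
    the top level, distributions as a list of dicts under "distributions".
    A record with no distributions at all is not flagged by this sketch.
    """
    missing = [p for p in MANDATORY_DATASET if not dataset.get(p)]
    for i, dist in enumerate(dataset.get("distributions", [])):
        missing += [f"distributions[{i}].{p}"
                    for p in MANDATORY_DISTRIBUTION if not dist.get(p)]
    return missing


# Illustrative record; title, description and URL are made up.
example = {
    "dct:title": "Bus stops",
    "dct:description": "Locations of all bus stops in the city.",
    "distributions": [{"dcat:accessURL": "http://example.org/bus-stops.csv"}],
}
```

Calling `missing_mandatory(example)` returns an empty list, while a record that lacks a description or an access URL gets those property names back.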

Could we do something similar?

Makx.

From: Laufer [mailto:laufer@globo.com] 
Sent: Thursday, May 15, 2014 4:36 PM
To: Bernadette Farias Loscio; Carlos Iglesias; Makx Dekkers; DWBP Public List
Subject: [BP - MET] - Best Practices - Guidance on the Provision of Metadata

Hi Bernadette, Carlos, Makx, all DWBP members,

I created a page on the wiki, "Best Practices – Guidance on the Provision of Metadata", where we can put the information about this topic. I took the liberty of defining a prefix for the subject of the e-mails related to these discussions: [BP - MET].

I would like to share some thoughts related to the data-on-the-web ecosystem. I see a kind of data architecture with three main roles: a data Publisher, a data Consumer and a data Broker. The Broker holds information that the Consumer can use to find data published by the Publisher.
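
The three roles can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the class names `DatasetEntry` and `Broker`, the fields, the naive keyword search and the example.org URLs are all hypothetical and do not describe any existing catalogue software.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata a Publisher hands to a Broker (hypothetical fields)."""
    publisher: str
    title: str
    description: str
    access_url: str


@dataclass
class Broker:
    """Holds metadata from Publishers so that Consumers can find data."""
    entries: list = field(default_factory=list)

    def register(self, entry):
        # Called by (or on behalf of) a Publisher.
        self.entries.append(entry)

    def search(self, keyword):
        # Called by a Consumer: case-insensitive match on the free text.
        kw = keyword.lower()
        return [e for e in self.entries
                if kw in e.title.lower() or kw in e.description.lower()]


broker = Broker()
broker.register(DatasetEntry("City Hall", "Bus stops",
                             "Locations of all bus stops",
                             "http://example.org/bus-stops.csv"))
broker.register(DatasetEntry("City Hall", "Budget 2014",
                             "Annual budget figures",
                             "http://example.org/budget.csv"))
hits = broker.search("bus")
```

Here `hits` contains only the "Bus stops" entry; the Consumer then follows its `access_url` to reach the data held by the Publisher.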

As examples of Brokers we can think of the CKAN implementations used by data.gov <http://data.gov>, dados.gov.br <http://dados.gov.br>, etc. CKAN holds metadata (provided by Publishers) that is useful for Consumers to find data. CKAN is a registry and can also be a repository for the data to be consumed. Almost all use cases of the DWBP WG are examples of Brokers.

At the same time, data published in CKAN implementations can come in multiple formats, such as CSV. Once a Consumer chooses some data from a Publisher, she needs another kind of metadata to understand how to access the data and what its semantics are.

I propose to create categories and types of metadata. I see two categories: metadata for search and metadata for use. Each of these categories would have types of metadata. For example:

Metadata Types for Search
  Human Content Description (free text)
  Machine Content Description (vocabularies)
  Provenance
  License
  Revenue
  Credentials
  Quality / Metrics
  Release Schedule
  Data Format
  Data Access

Metadata Types for Use
  URI Design Principles
  Machine Access to Data
  API specification
  Format Specification
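
As a rough sketch of how the two categories could be represented, the lists above can be written as a simple lookup table in Python. The names `METADATA_CATEGORIES` and `split_by_category`, the "uncategorized" bucket and the sample values are assumptions of mine for illustration only.

```python
# The two categories and their types, copied from the lists above.
METADATA_CATEGORIES = {
    "search": [
        "Human Content Description",
        "Machine Content Description",
        "Provenance",
        "License",
        "Revenue",
        "Credentials",
        "Quality / Metrics",
        "Release Schedule",
        "Data Format",
        "Data Access",
    ],
    "use": [
        "URI Design Principles",
        "Machine Access to Data",
        "API specification",
        "Format Specification",
    ],
}


def split_by_category(record):
    """Group a flat {metadata type: value} record by category.

    Types found in neither list go into an "uncategorized" bucket
    (a choice made for this sketch, not part of the proposal).
    """
    grouped = {category: {} for category in METADATA_CATEGORIES}
    grouped["uncategorized"] = {}
    for mtype, value in record.items():
        for category, types in METADATA_CATEGORIES.items():
            if mtype in types:
                grouped[category][mtype] = value
                break
        else:
            grouped["uncategorized"][mtype] = value
    return grouped


# Made-up sample values.
record = {
    "License": "CC BY 4.0",
    "Data Format": "CSV",
    "API specification": "http://example.org/api-docs",
}
grouped = split_by_category(record)
```

With this record, the license and the data format end up under "search" and the API specification under "use".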

The Brokers themselves have another kind of metadata, describing their own information.

Maybe in the future Consumers will no longer search for data in these Brokers (with their catalogues) but will use search engines that obtain the metadata (both for search and for use) through their crawlers. For now, though, we have this heterogeneous world of data, which has been one of the characteristics of the web since its beginning.

Contributions from all members of the DWBP WG will be appreciated.

Best Regards,

Laufer


Received on Thursday, 15 May 2014 18:03:02 UTC