Re: [BP - MET] - Best Practices - Guidance on the Provision of Metadata from Bernadette Farias Lóscio on 2014-05-15 (public-dwbp-wg@w3.org from May 2014)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Thu, 15 May 2014 15:53:55 -0300
To: Makx Dekkers <mail@makxdekkers.com>
Cc: Laufer <laufer@globo.com>, Carlos Iglesias <carlos.iglesias.moro@gmail.com>, DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <CANx1PzwtsXXjei14J7kg+_F9pbqrwCv0o2vrOzFt8Dg3b=Zm5Q@mail.gmail.com>

Hello all,

I fully agree with Makx that we should start with DCAT [1]!

I also think that it is really important that we understand the concepts of
dataset and dataset distribution, because this will help to clarify what
kind of metadata should be provided.

Considering that a dataset is a 'virtual' collection of data, which may
have one or more distributions (e.g. CSV, JSON, RDF), metadata should
describe:

i) the dataset: metadata that describes the dataset as a whole and is
independent of any distribution, for example information about data
provenance. Existing vocabularies, like PROV-O, may also be used to describe
datasets.

ii) the distributions: metadata that provides information about a given
distribution, for example information about the release schedule.

iii) the data itself: this kind of metadata should provide information
about the data itself, i.e. it should provide semantics and meaning to the
data. Existing domain vocabularies (e.g. ontologies) may be used to describe
the data (e.g. the DBpedia ontology, AKT, ...).

In this case, we should identify what kind of metadata is missing from DCAT
and should be provided when publishing a dataset or releasing a
distribution.

As we already know, DCAT lacks metadata to describe Data Quality, Data
Usage and Data Provenance. However, for Data Provenance we may consider
using PROV-O.
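
As a rough illustration of the three levels above, here is a minimal Python
sketch. It is only an assumption about how such metadata could be laid out,
not something defined by DCAT: the example values, the example.org URLs, the
column_semantics mapping and the missing_minimum helper are all hypothetical.
The property names are existing DCAT, Dublin Core and PROV-O terms, and the
check mirrors the minimum set Makx mentions below (dct:title,
dct:description, dcat:accessURL).

```python
# (i) Dataset-level metadata, independent of any distribution.
# All values are made-up examples.
dataset = {
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops",
    "dct:description": "Locations of bus stops in an example city.",
    "prov:wasAttributedTo": "http://example.org/agency/transport",
    # (ii) Distribution-level metadata, one entry per distribution.
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:accessURL": "http://example.org/data/bus-stops.csv",
            "dcat:mediaType": "text/csv",
            "dct:issued": "2014-05-15",
        },
        {
            "@type": "dcat:Distribution",
            "dcat:accessURL": "http://example.org/data/bus-stops.json",
            "dcat:mediaType": "application/json",
            "dct:issued": "2014-05-15",
        },
    ],
}

# (iii) Metadata about the data itself: a hypothetical mapping from column
# names to terms of existing vocabularies that give the columns a meaning.
column_semantics = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}


def missing_minimum(ds):
    """Return the minimum properties that are absent from a dataset record."""
    missing = [p for p in ("dct:title", "dct:description") if not ds.get(p)]
    distributions = ds.get("dcat:distribution", [])
    # At least one distribution must carry an access URL.
    if not any(d.get("dcat:accessURL") for d in distributions):
        missing.append("dcat:accessURL")
    return missing


print(missing_minimum(dataset))  # prints []
```

Keeping the column semantics separate from the dataset record reflects the
point that DCAT covers (i) and (ii), while (iii) has to come from domain
vocabularies.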

kind regards,
Bernadette

[1] http://www.w3.org/TR/vocab-dcat/#vocabulary-overview



2014-05-15 15:02 GMT-03:00 Makx Dekkers <mail@makxdekkers.com>:

> Laufer,
>
>
>
> Could we maybe start from DCAT http://www.w3.org/TR/vocab-dcat/? That W3C
> Recommendation was specifically designed to describe data on the Web. It
> defines a metadata language that includes some of the metadata types you
> list in your message. It also distinguishes between the conceptual
> characteristics of the data (things you would use for search) and the
> actual, downloadable distribution of the data.
>
>
>
> There is also a DCAT application profile for data portals in Europe (
> https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final#download-links)
> that gives additional rules and constraints for the use of DCAT in a
> network of data portals in Europe. One thing that the DCAT-AP defines is
> the minimum set of metadata elements to be provided, actually only a name
> (dct:title) and a description (dct:description) of the data set, and a URL
> for its distribution (dcat:accessURL). A small number of elements are
> recommended if available.
>
>
>
> Could we do something similar?
>
>
>
> Makx.
>
>
>
>
>
> *From:* Laufer [mailto:laufer@globo.com]
> *Sent:* Thursday, May 15, 2014 4:36 PM
> *To:* Bernadette Farias Loscio; Carlos Iglesias; Makx Dekkers; DWBP
> Public List
> *Subject:* [BP - MET] - Best Practices - Guidance on the Provision of
> Metadata
>
>
>
> Hi Bernadette, Carlos, Makx, all DWBP members,
>
>
>
> I created a page on the wiki, "Best Practices – Guidance on the Provision
> of Metadata", where we can put the information about this topic. I took the
> liberty of defining a prefix for the subject of the e-mails related to these
> discussions: [BP - MET].
>
>
>
> I would like to share some thoughts that I think are related to the data
> on the web ecosystem. I see a kind of data architecture that has three big
> roles: a data Publisher, a data Consumer and a data Broker. The Broker is
> the one that has information that can be used by the Consumer to find data
> published by the Publisher.
>
>
>
> As examples of Brokers we can think of implementations of CKAN, as used
> by data.gov, dados.gov.br, etc. CKAN has metadata (provided by
> Publishers) that is useful for Consumers to find data. CKAN is a registry
> and can also be a repository for the data to be consumed. Almost all use
> cases of the DWBP WG are examples of Brokers.
>
>
>
> At the same time, data published in CKAN implementations can have multiple
> formats, such as CSV. Once a Consumer chooses some data to use from
> a Publisher, she needs another kind of metadata to understand how to access
> the data and its semantics.
>
>
>
> I propose to create categories and types of metadata. I see two
> categories: metadata for search and metadata for use. Each of these
> categories would have types of metadata. For example:
>
>
>
> Metadata Types for Search
>
> Human Content Description (free text)
>
> Machine Content Description (vocabularies)
>
> Provenance
>
> License
>
> Revenue
>
> Credentials
>
> Quality / Metrics
>
> Release Schedule
>
> Data Format
>
> Data Access
>
>
>
> Metadata Types for Use
>
> URI Design Principles
>
> Machine Access to Data
>
> API specification
>
> Format Specification
>
>
>
> The Brokers themselves have another kind of metadata, about their own information.
>
>
>
> Maybe in the future Consumers will no longer search for data in these
> Brokers (with their catalogues) but will use search engines that could
> obtain the metadata (both for search and for use) using their crawlers. But
> for now, we have this heterogeneous world of data, which has been one of the
> characteristics of the web since its beginning.
>
>
>
> Contributions of all members of the DWBP WG will be appreciated.
>
>
>
> Best Regards,
>
> Laufer
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>



-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Thursday, 15 May 2014 18:54:44 UTC