Re: [BP - MET] - Best Practices - Guidance on the Provision of Metadata from Ivan Herman on 2014-05-16 (public-dwbp-wg@w3.org from May 2014)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 16 May 2014 11:16:36 +0200
To: Phil Archer <phila@w3.org>
Cc: DWBP Public List <public-dwbp-wg@w3.org>
Message-Id: <D602B520-EA45-4D25-82B8-6BEE4C93D09F@w3.org>

Hi Phil,

On 16 May 2014, at 10:37 , Phil Archer <phila@w3.org> wrote:

> Reading through this thread today I have a couple of comments.
> 
> I think what Laufer is doing is starting from the abstract position, expecting in future to transform that into practical advice. +1 to that. DCAT is the vocab for describing datasets in a catalogue (it's the Data CATalogue vocabulary) and it does that job fine - I don't think anyone's suggesting we reinvent it.
> 
> But there are several pieces missing from the landscape and our job is to guide people on how to use what pieces exist and, where necessary, fix those gaps.
> 
> Semantics of the dataset
> ========================
> This is indeed what the CSVW WG is working on for tabular data. And VoID does a similar job for Linked Data. So a link from a dcat:Distribution to machine-readable metadata about the semantics could well be useful. But I agree with Bernadette that that's as far as we should go, i.e. we just provide the hooks.
> 
> That said, we should be mindful that the CSVW work will include links from the data to the (semantic) metadata. VoID uses my least favourite method (a well known location) to achieve the same thing.

Just a minor correction: though this is not yet decided, my expectation is that the group will not define one mechanism but open the door to several. This may also include the well-known location approach.

Ivan

> 
> <#myDistro> a dcat:Distribution;
>  dcat:semanticMetadata <http://example.com/myDistro-meta>.
> 
> doesn't prevent or conflict with any other link that may exist between the dataset and its metadata and could possibly be useful.
> 
> NB. Semantic metadata is going to be format-specific so I guess it has to be linked from each distribution, not from the (abstract) dcat:Dataset. WDYT?
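
To make the per-distribution point concrete, here is a minimal Python sketch. It assumes the graph is held as plain (subject, predicate, object) tuples rather than in an RDF library. Note that dcat:semanticMetadata is only the property proposed in the snippet above, not an existing DCAT term, and that #myDataset and #myOtherDistro are made-up identifiers for illustration.

```python
# Sketch only: a tiny in-memory triple set, no RDF library involved.
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# ASSUMPTION: dcat:semanticMetadata is the property proposed in this thread;
# it is not defined by the DCAT vocabulary.
SEMANTIC_METADATA = DCAT + "semanticMetadata"

# Hypothetical dataset with two distributions, only one of which
# links to semantic metadata.
triples = {
    ("#myDataset", RDF_TYPE, DCAT + "Dataset"),
    ("#myDataset", DCAT + "distribution", "#myDistro"),
    ("#myDataset", DCAT + "distribution", "#myOtherDistro"),
    ("#myDistro", RDF_TYPE, DCAT + "Distribution"),
    ("#myDistro", SEMANTIC_METADATA, "http://example.com/myDistro-meta"),
    ("#myOtherDistro", RDF_TYPE, DCAT + "Distribution"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def semantic_metadata_by_distribution(graph, dataset):
    """Map each distribution of `dataset` to its semantic metadata links."""
    return {
        distro: objects(graph, distro, SEMANTIC_METADATA)
        for distro in objects(graph, dataset, DCAT + "distribution")
    }


links = semantic_metadata_by_distribution(triples, "#myDataset")
```

Because the link hangs off each distribution, a consumer can see which formats carry semantic metadata and which do not; a single link on the abstract dcat:Dataset could not express that difference.
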
> 
> Application profiles
> ====================
> We're trying to get a new WG up and running on this - i.e. a method to make things like the DCAT-AP machine readable. My colleague Eric Prud'hommeaux is working on this. If W3C member organizations represented in *this* WG would be interested in that work, please let me know - we're building the community for what we expect to be the RDF Data Shapes WG. The hope is that it will have its first f2f meeting at TPAC (so you could go to both f2f meetings in one trip :-) )
> 
> Data Quality
> ============
> Yep - that's what we're working on too.
> 
> CKAN
> ====
> The message from Open Knowledge Foundation is exactly what you'd expect from any open source project: you want new or improved features - create and improve them! Bernadette's students' work on building extensions for CKAN is a very important part of our work here. I hope that we can see real instances of CKAN with the extension installed. The same vocabulary extensions in non-CKAN portals are just as important (I'm nudging you Martin & Carlos ;-) )
> 
> 
> Best Practices
> ==============
> So... in terms of BP, I suggest we explain the high level needs - which I think was Laufer's starting point - and then dive into how to do it, pointing to whatever method is or will be available for doing so.
> 
> HTH
> 
> Phil.
> 
> 
> 
> On 16/05/2014 08:26, Ghislain Atemezing wrote:
>> Hi Laufer, all,
>> Thanks for this great starting discussion. Find below my 2 cents ...
>>> I created a page on the wiki, "Best Practices – Guidance on the
>>> Provision of Metadata", where we can put the information about this
>>> topic. I took the liberty to define a prefix in the subject of the
>>> e-mails related to these discussions: [BP- MET].
>>> 
>>> I would like to expose some thoughts that I think are related to the
>>> data on the web ecosystem. I see a kind of data architecture that has
>>> three big roles: a data Publisher, a data Consumer and a data Broker.
>>> The Broker is the one that has information that can be used by the
>>> Consumer to find data published by the Publisher.
>>> 
>>> As an example of Brokers we can think about implementations of CKAN,
>>> used by data.gov <http://data.gov>, dados.gov.br <http://dados.gov.br>,
>>> etc. CKAN has metadata (provided by Publishers) that are useful for
>>> Consumers to find data. CKAN is a registry and can also be a repository
>>> for the data to be consumed. Almost all use cases of DWBP WG are
>>> examples of Brokers.
>>> 
>>> At the same time, data published in CKAN implementations can have
>>> multiple formats, such as CSV. Once a Consumer chooses some data
>>> to use from a Publisher, she needs another kind of metadata to
>>> understand how to access the data and its semantics.
>>> 
>>> I propose to create categories and types of metadata. I see two
>>> categories: metadata for search and metadata for use. Each of these
>>> categories would have types of metadata. For example:
>>> 
>> +1. I would also consider metadata "computed" from provenance data
>> plus metrics. E.g.: if a dataset is published by a "certified
>> organization" and it is reused by many users/applications, then it has
>> higher quality.
>>> Metadata Types for Search
>>> 
>>> Human Content Description (free text)
>> ..and categories/themes
>>> 
>>> Machine Content Description (vocabularies)
>>> 
>>> Provenance
>>> 
>>> License
>>> 
>>> Revenue
>>> 
>>> Credentials
>>> 
>>> Quality / Metrics
>>> 
>>> Release Schedule
>>> 
>>> Data Format
>>> 
>>> Data Access
>> +1 for all of these first metadata types
>>> 
>>> Metadata Types for Use
>>> 
>>> URI Design Principles
>>> 
>>> Machine Access to Data
>>> 
>>> API specification
>>> 
>> I am not sure I understand the above types. Could you give us an
>> example of why "vocabularies" are not in this list, but "URI design
>> principles" is here? One may think that there are no principles in
>> designing URIs for vocabs.
>>> Format Specification
>>> 
>> What's the difference between "format spec" and "data format"?
>> 
>> As others pointed out, we could define a small set of mandatory fields
>> when providing the metadata.
>> 
>> Thanks again for taking care of this section.
>> 
>> Cheers,
>> Ghislain
>> 
> 
> -- 
> 
> 
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Friday, 16 May 2014 09:17:05 UTC