Re: Dataset metadata best practices

Good question, Alan.

I would consider it best practice to make the metadata available both as a file and in the endpoint, if there is one.
That is what we do at the rkbexplorer.com sites, such as http://ibm.rkbexplorer.com/
(We also make the semantic sitemap available, but I am guessing no-one reads that any more!)

However, I can see that some people might be uncomfortable with putting the void data into the store, since it has different provenance and possibly licence and other attributes.
For example, if there is a restricted licence on the endpoint, then void metadata held in the store would fall under that licence too, whereas a publisher would likely want a very liberal licence on the void data, since it drives traffic to the dataset.
Putting it in a separate place (as well?) allows that distinction.
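
To make the licence point concrete, here is a rough sketch of how a standalone void file might carry its own licence, distinct from the one on the dataset. Everything in it is an assumption for illustration: the example.org URIs, the choice of CC0 for the description, and the helper name void_document are all hypothetical, and this is not how the rkbexplorer.com sites generate their void.

```python
# Hypothetical URIs throughout; none of these come from a real deployment.
DATASET_URI = "http://example.org/dataset"
VOID_DOC_URI = "http://example.org/void.ttl"
DATA_LICENCE = "http://example.org/licences/restricted"
VOID_LICENCE = "http://creativecommons.org/publicdomain/zero/1.0/"


def void_document(dataset_uri, doc_uri, data_licence, doc_licence, endpoint=None):
    """Return Turtle text in which the description document and the dataset
    it describes each carry their own dcterms:license statement."""
    # Properties of the dataset itself (the restricted side).
    dataset_props = [f"dcterms:license <{data_licence}>"]
    if endpoint is not None:
        dataset_props.append(f"void:sparqlEndpoint <{endpoint}>")

    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        # The description document gets the liberal licence.
        f"<{doc_uri}> a void:DatasetDescription ;",
        f"    dcterms:license <{doc_licence}> ;",
        f"    foaf:primaryTopic <{dataset_uri}> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        " ;\n".join("    " + p for p in dataset_props) + " .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(void_document(DATASET_URI, VOID_DOC_URI, DATA_LICENCE, VOID_LICENCE,
                        endpoint="http://example.org/sparql"))
```

Served as a file, the description keeps its liberal licence regardless of what applies to the endpoint; loading the same triples into the store as well is then a separate decision.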

There is also a difficulty in the process - for example, if the void description includes the number of triples, then does the number of triples include the void description triples? If so, then calculating the accurate number of triples for the final store (once the void description has been loaded) becomes more interesting :-)
I would never let anything like that worry me, though :-)
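
For anyone who does want the self-inclusive number, a minimal sketch follows. It assumes triples are modelled as plain Python tuples in sets rather than held in a real store, and that the data itself makes no void:triples claim about the dataset; the function name total_with_description is hypothetical. The point it illustrates is that the value carried by the void:triples statement does not change how many triples there are, so the count can be fixed in one step.

```python
VOID_TRIPLES = "http://rdfs.org/ns/void#triples"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"


def total_with_description(data, description, dataset_uri):
    """Return (total, full_description), where total counts the data triples
    plus the description triples, including the void:triples statement itself.

    data and description are sets of (subject, predicate, object) tuples.
    """
    def is_count(triple):
        return triple[0] == dataset_uri and triple[1] == VOID_TRIPLES

    # Assumption: the data makes no void:triples claim about this dataset.
    if any(is_count(t) for t in data):
        raise ValueError("data already contains a void:triples statement")

    # Drop any stale count so exactly one void:triples statement remains.
    base = {t for t in description if not is_count(t)}

    # The count statement adds exactly one triple, whatever number it carries.
    total = len(data | base) + 1
    return total, base | {(dataset_uri, VOID_TRIPLES, total)}


if __name__ == "__main__":
    ds = "http://example.org/dataset"  # hypothetical dataset URI
    data = {("s1", "p", "o1"), ("s2", "p", "o2"), ("s3", "p", "o3")}
    description = {(ds, RDF_TYPE, VOID_DATASET), (ds, VOID_TRIPLES, 3)}
    total, full = total_with_description(data, description, ds)
    # The stated count matches what the store would hold after loading both.
    print(total, len(data | full))
```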

For those who haven't seen it, I can remind you that we have a store at
http://void.rkbexplorer.com
which, although rather old, does have quite a lot of void documents in it.

Best
Hugh

> On 15 Mar 2017, at 14:23, Alan Meehan <meehanal@scss.tcd.ie> wrote:
> 
> Hi all,
> 
> From reading a bit into dataset meta-data publication and looking at some datasets on Datahub, a "best practice", if you will, is to publish meta-data (according to a specific vocabulary - void, dcat, dataID etc.) as a separate file along with a dataset.
> 
> I have noticed (and I could be wrong) that few datasets also directly include this meta-data in, say, the dump file of the dataset provided. I realise that if I want to query the dataset plus the meta-data, I could just download the dataset dump and meta-data file and load them into a local triple store. However, what if I want to query data from a public sparql endpoint and this meta-data is not there? For example, the DBpedia 2016-04 release includes a DataID meta-data file on its download page, but this meta-data is not in the public sparql endpoint.
> 
> So I am just wondering from the LOD community - is there a reluctance to publish meta-data in this way or any disadvantage to doing so?
> 
> Thank you,
> Alan
> 
> 

Received on Wednesday, 15 March 2017 15:00:48 UTC