Re: Dataset metadata best practices

Hi Hugh, All,

In regard to Hugh’s final point, wouldn’t using named graphs in the store to separate the dataset from the metadata provide sufficient separation? The issue regarding the number of triples would then be resolved by stating the number of triples in the dataset graph alone. Or am I missing something?
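To make the named-graph suggestion concrete, here is a minimal sketch in Python. It models the store as a set of quads rather than using a real triple store. The graph IRIs (http://example.org/graph/dataset, http://example.org/graph/void), the dataset IRI and the helper names `triples_in_graph` and `add_void_description` are all hypothetical; only `void:Dataset` and `void:triples` come from the VoID vocabulary.

```python
from typing import NamedTuple

# Hypothetical graph and dataset IRIs, for illustration only.
DATA_GRAPH = "http://example.org/graph/dataset"
META_GRAPH = "http://example.org/graph/void"
DATASET_IRI = "http://example.org/dataset"

# Real vocabulary terms (VoID, RDF, XML Schema).
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Quad(NamedTuple):
    """A triple plus the named graph it lives in."""
    s: str
    p: str
    o: str
    g: str


def triples_in_graph(store: set[Quad], graph: str) -> int:
    """Count the triples in one named graph (a set, so no duplicates)."""
    return sum(1 for q in store if q.g == graph)


def add_void_description(store: set[Quad], dataset_iri: str) -> int:
    """Write a VoID description of DATA_GRAPH into META_GRAPH.

    The count covers DATA_GRAPH only, so adding these metadata
    triples cannot invalidate the number they report.
    """
    n = triples_in_graph(store, DATA_GRAPH)
    literal = f'"{n}"^^<{XSD_INTEGER}>'
    store.add(Quad(dataset_iri, RDF_TYPE, VOID + "Dataset", META_GRAPH))
    store.add(Quad(dataset_iri, VOID + "triples", literal, META_GRAPH))
    return n


if __name__ == "__main__":
    ex = "http://example.org/"
    store = {
        Quad(ex + "a", ex + "knows", ex + "b", DATA_GRAPH),
        Quad(ex + "b", ex + "knows", ex + "c", DATA_GRAPH),
    }
    add_void_description(store, DATASET_IRI)
    print(triples_in_graph(store, DATA_GRAPH))  # 2: unchanged by the metadata
    print(triples_in_graph(store, META_GRAPH))  # 2: the VoID description
```

Because the count is taken over the dataset graph only, loading the VoID triples into the metadata graph leaves the reported figure correct. In a real store the same figure could come from a SPARQL 1.1 `COUNT(*)` query restricted by a `GRAPH` clause. Even if `void:triples` were meant to cover the whole store, adding the count triple adds exactly one triple whatever its value (assuming that exact triple is not already present), so a self-inclusive total is still computable; the separate graph simply sidesteps the question.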

Alasdair

> On 15 Mar 2017, at 15:00, Hugh Glaser <hugh@glasers.org> wrote:
> 
> Good question, Alan.
> 
> I would consider best practice to make the metadata available as both a file and in the endpoint, if there is one.
> That is what we do at the rkbexplorer.com sites, such as http://ibm.rkbexplorer.com/
> (We also make the semantic sitemap available, but I am guessing no-one reads that any more!)
> 
> However, I can see that some people might be uncomfortable with putting the void data into the store, since it has different provenance and possibly licence and other attributes.
> That is, for example, if there is a restricted licence on the endpoint, then the void metadata would inherit that same licence, whereas a publisher would likely want a very liberal licence on the void data, since it drives traffic to the dataset.
> Putting it in a separate place (as well?) allows that distinction.
> 
> There is also a difficulty in the process - for example, if the void description includes the number of triples, then does the number of triples include the void description triples? If so, then calculating the accurate number of triples for the final store (once the void description has been loaded) becomes more interesting :-)
> I would never let anything like that worry me, though :-)
> 
> For those who haven't seen it, I can remind you that we have a store at
> http://void.rkbexplorer.com
> which, although rather old, does have quite a lot of void documents in it.
> 
> Best
> Hugh
> 
>> On 15 Mar 2017, at 14:23, Alan Meehan <meehanal@scss.tcd.ie> wrote:
>> 
>> Hi all,
>> 
>> From reading a bit into dataset meta-data publication and looking at some datasets on Datahub, a "best practice", if you will, is to publish meta-data (according to a specific vocabulary: void, dcat, dataID etc.) as a separate file along with a dataset.
>> 
>> I have noticed (and I could be wrong) that few datasets also include this meta-data directly in, say, the dump file of the dataset. I realise that if I want to query the dataset plus the meta-data, I could just download the dataset dump and the meta-data file and load them into a local triple store. However, what if I want to query data from a public SPARQL endpoint and this meta-data is not there? For example, the DBpedia 2016-04 release includes a DataID meta-data file on its download page, but this meta-data is not in the public SPARQL endpoint.
>> 
>> So I am just wondering, from the LOD community: is there a reluctance to publish meta-data in this way, or any disadvantage to doing so?
>> 
>> Thank you,
>> Alan
>> 
>> 
> 
> 

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair

Received on Wednesday, 5 April 2017 15:08:23 UTC