Re: Dataset metadata best practices from Ruben Verborgh on 2017-04-05 (public-lod@w3.org from April 2017)

From: Ruben Verborgh <Ruben.Verborgh@UGent.be>
Date: Wed, 5 Apr 2017 15:39:43 +0000
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
CC: Hugh Glaser <hugh@glasers.org>, Alan Meehan <meehanal@scss.tcd.ie>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <004C063B-ED6E-4498-B97F-76211C0B1254@ugent.be>

Dear all,

> wouldn’t the use of named graphs in the store to separate the dataset from the metadata provide sufficient separation

Named graphs are definitely the answer.

I think we should all use them much more,
maybe even stop "polluting" the default graph,
and publish everything, data and metadata alike, in separate graphs.

Right now, we often still use identifiers to distinguish sources,
as in "this subject is the DBpedia URI of X, so the data comes from DBpedia",
whereas another dataset would use other URIs for X as a kind of provenance,
and then just connect them with owl:sameAs later.
If we all publish in graphs, such practices can go away.
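A minimal sketch of that idea. It assumes quads are modeled as plain Python tuples (subject, predicate, object, graph), and all example.org IRIs are hypothetical. Both publishers describe X with the same URI, and the source of each statement is read from the named graph it was published in, not inferred from the subject URI.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object
    g: str  # named graph the triple was published in


# Hypothetical IRIs: one shared URI for X, one named graph per publisher.
X = "http://example.org/resource/X"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GRAPH_A = "http://example.org/graphs/publisher-a"
GRAPH_B = "http://example.org/graphs/publisher-b"

quads = [
    Quad(X, LABEL, '"X"@en', GRAPH_A),
    Quad(X, LABEL, '"X"@nl', GRAPH_B),
]


def sources_of(quads, subject):
    """Return, sorted, the named graphs that contain statements about `subject`."""
    return sorted({q.g for q in quads if q.s == subject})


def statements_from(quads, graph):
    """Return the triples published in one named graph, whatever URIs they use."""
    return [(q.s, q.p, q.o) for q in quads if q.g == graph]
```

Here `sources_of(quads, X)` yields both graphs, so no per-source URI and no later owl:sameAs is needed to tell the publishers apart.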

This blog post [1] elaborates on the case for
putting metadata (and controls) in a separate graph,
with an explicit link (!) from the metadata to the data.

> The issue regarding the number of triples would then be satisfied by stating the number of triples in the dataset graph

We actively use this principle in the Triple Pattern Fragments spec [2],
where the separate graph is necessary for clients
to give the correct answer to SELECT * { ?s ?p ?o } queries.
(Without separation, the answer would also include the metadata triples.)
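To make the SELECT * point concrete, here is a toy in-memory sketch. It is not the TPF client algorithm. It assumes quads are plain Python tuples and uses hypothetical example.org graph names. Choosing void:triples for the count and foaf:primaryTopic for the explicit link from metadata to data is also an assumption of this sketch.

```python
from typing import List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # graph the triple lives in


Triple = Tuple[str, str, str]

# Assumed vocabulary for the count and for the metadata -> data link.
VOID_TRIPLES = "http://rdfs.org/ns/void#triples"
PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"

# Hypothetical graph names.
DATA = "http://example.org/fragment"
META = "http://example.org/fragment#metadata"

data_quads = [
    Quad("http://example.org/a", "http://example.org/p", "http://example.org/b", DATA),
    Quad("http://example.org/b", "http://example.org/p", "http://example.org/c", DATA),
]
meta_quads = [
    Quad(META, PRIMARY_TOPIC, DATA, META),  # explicit link from metadata to data
    Quad(DATA, VOID_TRIPLES, "2", META),    # number of triples in the data graph
]


def select_all(quads: List[Quad], graph: Optional[str] = None) -> List[Triple]:
    """Answer SELECT * { ?s ?p ?o }: triples in `graph`, or in every graph if None."""
    return [(q.s, q.p, q.o) for q in quads if graph is None or q.g == graph]


def data_graph_of(quads: List[Quad], metadata_graph: str) -> str:
    """Follow the explicit link from the metadata graph to the data graph."""
    for q in quads:
        if q.g == metadata_graph and q.s == metadata_graph and q.p == PRIMARY_TOPIC:
            return q.o
    raise LookupError(f"no primaryTopic link in {metadata_graph}")


def stated_count(quads: List[Quad], metadata_graph: str) -> int:
    """Read the triple count that the metadata graph states for the data graph."""
    data_graph = data_graph_of(quads, metadata_graph)
    for q in quads:
        if q.g == metadata_graph and q.s == data_graph and q.p == VOID_TRIPLES:
            return int(q.o)
    raise LookupError(f"no triple count in {metadata_graph}")


store = data_quads + meta_quads
naive = select_all(store)                               # 4 triples: data plus metadata
answer = select_all(store, data_graph_of(store, META))  # 2 triples: data only
assert len(answer) == stated_count(store, META)
```

Matching the pattern against the whole store returns the two metadata triples along with the data. Restricting it to the graph that the metadata links to returns exactly the data triples, and their number agrees with the stated count.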

For an example, run:
    curl -H "Accept: application/trig" "http://fragments.dbpedia.org/2016-04/en"

Best,

Ruben

[1] https://ruben.verborgh.org/blog/2015/10/06/turtles-all-the-way-down/#graphs-let-us-combine-data-context-and-controls-neatly
[2] https://www.hydra-cg.com/spec/latest/triple-pattern-fragments/

Received on Wednesday, 5 April 2017 15:40:21 UTC