Radio station metadata use case from Steve Harris on 2005-04-12 (public-rdf-dawg@w3.org from April to June 2005)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Tue, 12 Apr 2005 17:04:01 +0100
To: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <20050412160401.GA32249@login.ecs.soton.ac.uk>

Based on my experience of implementing the current editors' WD and helping
build an application.

The local student radio station (http://www.surgeradio.co.uk) uses RDF to
describe its playlists, handle requests and so on. They use the
Musicbrainz RDF (split into files per artist and disk to make applying
updates more efficient) to talk about released CDs, and some locally
created data (split the same way) to talk about white label singles they
have received through the post.

If/when the white label stuff gets released they send it to Musicbrainz
and remove the local copy.

All the data is "trusted" and so exists in the background graph, but it's
also kept in named graphs to allow provenance queries (MB vs. local, who
wrote it etc.) to be answered.

My first thought about how to handle this case was to flag the graphs as
being in the background/named/both graph sets, which allows me to store
this efficiently, but it makes queries too expensive, and in my current
implementation at least bNodes get shared between the background and named
graphs, which only matters in corner cases, but does change the
semantics.

My final implementation was a naive implementation of what's in the spec,
as I understand it. I used a distinguished graph identifier to
distinguish things in the background graph. I think assertion performance
is bad, but I've not worked on it.

However, using this implementation I then couldn't remove subsets of the
background graph (e.g. locally created graphs that are now redundant). The
named part of the data can be removed easily, by using its graph
identifier, but triples in the background graph can't be distinguished by
source graph in my implementation.
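
The removal problem above can be sketched roughly as follows. This is only
an illustration, not the actual implementation: the NaiveStore class, the
BACKGROUND identifier, the file: graph names and the prefixed terms are all
hypothetical, and the store is reduced to a Python set of quads.

```python
# Hypothetical quad store: each entry is (graph_id, subject, predicate, object).
BACKGROUND = "urn:x-local:background"  # assumed distinguished graph identifier


class NaiveStore:
    def __init__(self):
        self.quads = set()

    def assert_graph(self, graph_id, triples, trusted=True):
        for s, p, o in triples:
            # Named copy, findable later by its graph identifier.
            self.quads.add((graph_id, s, p, o))
            if trusted:
                # Background copy; nothing records which graph it came from.
                self.quads.add((BACKGROUND, s, p, o))

    def remove_graph(self, graph_id):
        # Only quads carrying this identifier can be located and dropped.
        self.quads = {q for q in self.quads if q[0] != graph_id}

    def background(self):
        return {q[1:] for q in self.quads if q[0] == BACKGROUND}


store = NaiveStore()
store.assert_graph("file:local/whitelabel1.rdf",
                   [("ex:single1", "dc:title", "Demo")])
store.assert_graph("file:mb/album1.rdf",
                   [("ex:album1", "dc:title", "Album")])

# The white label gets released, so the local graph is removed...
store.remove_graph("file:local/whitelabel1.rdf")

# ...but its triple is still sitting in the background graph.
leftover = ("ex:single1", "dc:title", "Demo") in store.background()
```

After the removal, leftover is true: the named copy is gone, but the
background copy cannot be traced back to the graph that asserted it.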

It would be possible to subidentify the triples in the background graph in
some way, but that identification can't be discovered from SPARQL, which
would make extending it to support INSERT/UPDATE in the future painful, and
would complicate the data storage.

Another option I considered was to keep a copy of the graph as asserted,
and remove it when requested, but it gets a bit complicated as I have to
keep a count of the number of times any particular statement has been
asserted in the background graph, and I'm concerned about synchronisation
issues.
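
A minimal sketch of that counting scheme, assuming an in-memory store (the
CountingStore class and the example identifiers are hypothetical). It shows
why the count is needed: a statement asserted by two graphs has to survive
the removal of one of them. The synchronisation concern is not addressed
here; the two structures are simply updated one after the other.

```python
from collections import Counter


class CountingStore:
    def __init__(self):
        self.graphs = {}            # graph_id -> set of triples as asserted
        self.bg_counts = Counter()  # triple -> number of graphs asserting it

    def assert_graph(self, graph_id, triples):
        triples = set(triples)
        self.remove_graph(graph_id)   # re-asserting replaces the old copy
        self.graphs[graph_id] = triples
        self.bg_counts.update(triples)

    def remove_graph(self, graph_id):
        # Use the kept copy to find which background statements to decrement.
        for t in self.graphs.pop(graph_id, set()):
            self.bg_counts[t] -= 1
            if self.bg_counts[t] == 0:
                del self.bg_counts[t]

    def background(self):
        return set(self.bg_counts)


store = CountingStore()
shared = ("ex:artist1", "foaf:name", "Some Artist")
store.assert_graph("file:mb/artist1.rdf", [shared])
store.assert_graph("file:local/whitelabel1.rdf",
                   [shared, ("ex:single1", "dc:title", "Demo")])

store.remove_graph("file:local/whitelabel1.rdf")
remaining = store.background()  # the shared statement is still asserted
```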

The design I posted earlier
(http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0440.html)
turns out not to have this problem (though that wasn't what motivated the
design). As all graphs are named the application can do management on
data about individual disks in the background graph.
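
The property described here can be illustrated with a short sketch. It does
not reproduce the design in the linked message; it only assumes that every
triple lives in some named graph and that the background graph is derived
from the graphs marked as trusted. The NamedOnlyStore class and the example
identifiers are hypothetical.

```python
class NamedOnlyStore:
    def __init__(self):
        self.graphs = {}      # graph_id -> set of triples
        self.trusted = set()  # graph_ids that contribute to the background

    def assert_graph(self, graph_id, triples, trusted=True):
        self.graphs[graph_id] = set(triples)
        if trusted:
            self.trusted.add(graph_id)
        else:
            self.trusted.discard(graph_id)

    def remove_graph(self, graph_id):
        # Removing the named graph also removes its share of the background.
        self.graphs.pop(graph_id, None)
        self.trusted.discard(graph_id)

    def background(self):
        # Assumed here: the background is the union of the trusted graphs.
        return set().union(*(self.graphs[g] for g in self.trusted))


store = NamedOnlyStore()
mb = ("ex:album1", "dc:title", "Album")
local = ("ex:single1", "dc:title", "Demo")
store.assert_graph("file:mb/album1.rdf", [mb])
store.assert_graph("file:local/whitelabel1.rdf", [local])

store.remove_graph("file:local/whitelabel1.rdf")
after_removal = store.background()  # only the Musicbrainz triple is left
```

No per-statement counts are needed in this arrangement, because a statement
is in the background only for as long as some trusted named graph holds it.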

- Steve

Received on Tuesday, 12 April 2005 16:04:05 UTC