Re: Radio station metadata use case from Seaborne, Andy on 2005-04-18 (public-rdf-dawg@w3.org from April to June 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 18 Apr 2005 18:37:11 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <4263F047.6060406@hp.com>
Steve Harris wrote:
> Based on my experience of implementing the current editors WD and helping
> build an application.
> 
> The local student radio station (http://www.surgeradio.co.uk) uses RDF to
> describe its playlists, handle requests and so on. They use the
> Musicbrainz RDF (split into files per artist and disk to make applying
> updates more efficient) to talk about released CDs, and some locally
> created data (split same way) to talk about white label singles they have
> received through the post.
> 
> If/when the white label stuff gets released they send it to musicbrainz
> and remove the local copy.
> 
> All the data is "trusted" and so exists in the background graph, but it's
> also kept in named graphs to allow provenance queries (MB vs. local, who
> wrote it etc.) to be answered.
> 
> My first thought about how to handle this case was to flag the graphs as
> being in the background/named/both graph sets which allows me to store
> this efficiently, but it makes queries too expensive, and in my current
> implementation at least bNodes get shared between the background and named
> graphs, which only matters in corner cases, but does change the
> semantics.

Steve - I don't see what makes the query any more expensive.  Why does storing 
quads (with or without a trusted flag) make this mapping fail:

{ ?x ?y ?z } =>             (?x ?y ?z *)     with * for any
GRAPH <u> { ?x ?y ?z } =>   (?x ?y ?z <u>)

The treatment of bNodes is a free choice.  RDF does not say whether they are the 
same or different across graphs.  More on this below.
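
The quad mapping above can be sketched roughly as follows. This is only an illustration, assuming an in-memory list of quads; the `Quad` type, the `match`, `background` and `graph` helpers and the `urn:example:` graph names are all hypothetical and do not come from any implementation discussed in this thread.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # name of the graph the triple was asserted in


def match(quads: Iterable[Quad],
          s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None, g: Optional[str] = None) -> Iterator[Quad]:
    """Yield quads matching the pattern; None acts as a wildcard slot."""
    pattern = (s, p, o, g)
    for q in quads:
        if all(want is None or want == have for want, have in zip(pattern, q)):
            yield q


def background(quads: Iterable[Quad]) -> Set[Triple]:
    """{ ?x ?y ?z }  =>  (?x ?y ?z *): the graph slot is left unconstrained."""
    return {(q.s, q.p, q.o) for q in match(quads)}


def graph(quads: Iterable[Quad], uri: str) -> Set[Triple]:
    """GRAPH <u> { ?x ?y ?z }  =>  (?x ?y ?z <u>): the graph slot is fixed."""
    return {(q.s, q.p, q.o) for q in match(quads, g=uri)}


# Hypothetical sample data: one triple asserted in both graphs, one only locally.
quads = [
    Quad("ex:a", "ex:p", "ex:b", "urn:example:mb"),
    Quad("ex:a", "ex:p", "ex:b", "urn:example:local"),
    Quad("ex:c", "ex:p", "ex:d", "urn:example:local"),
]
```

Both query forms are a single scan (or index lookup) over the same quads; the only difference is whether the fourth slot is constrained.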

> 
> My final implementation was a naive implementation of what's in the spec,
> as I understand it. I used a distinguished graph identifier to
> distinguish things in the background graph. I think assertion performance
> is bad, but I've not worked on it.
> 
> However, using this implementation I then couldn't remove subsets of the
> background graph (eg. locally created graphs that are now redundant).

If this (removing named graphs affecting the background graph) is a requirement, 
then the background graph must share bNodes (or keep a mapping) with the named 
graphs, surely?  This seems to be true regardless of which scheme we are considering.


> The
> named part of the data can be removed easily, by using its graph
> identifier, but all triples in the background graph can't be distinguished
> in my implementation.

Interesting - so if the schema separates the concerns for data management from 
the concerns for query then storing 5-tuples (you'd want to normalize as well):

   (<s>  <p>  <o>  URI-or-null   original-named-graph)

and doing data management based on slot 5, and query based on slot 4, might work, 
bNodes decision permitting.

To keep bNodes separate, insert a new 5-slot "triple" that keeps the 
original-named-graph indicator so it can be mass-removed.
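
A minimal sketch of the 5-slot idea, under stated assumptions: slot 4 is read here as "the graph URI visible to queries, or null for the background graph", and slot 5 as the graph the triple was originally asserted from. The `Row` and `Store` names and the `trusted` flag are hypothetical, and bNode handling and normalization are left out.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Row(NamedTuple):
    s: str
    p: str
    o: str
    query_graph: Optional[str]  # slot 4: URI seen by GRAPH, None = background
    source_graph: str           # slot 5: graph the triple was asserted from


class Store:
    def __init__(self) -> None:
        self.rows: Set[Row] = set()

    def assert_graph(self, source: str, triples: Iterable[Triple],
                     trusted: bool = True) -> None:
        """Add triples under their graph name and, if trusted, to the background."""
        for s, p, o in triples:
            self.rows.add(Row(s, p, o, source, source))
            if trusted:
                self.rows.add(Row(s, p, o, None, source))

    def remove_graph(self, source: str) -> None:
        """Data management uses slot 5 only: drop every row from this source."""
        self.rows = {r for r in self.rows if r.source_graph != source}

    def background(self) -> Set[Triple]:
        """Query uses slot 4 only: rows whose query graph is null."""
        return {(r.s, r.p, r.o) for r in self.rows if r.query_graph is None}

    def named(self, uri: str) -> Set[Triple]:
        """Query uses slot 4 only: rows whose query graph is the given URI."""
        return {(r.s, r.p, r.o) for r in self.rows if r.query_graph == uri}


# Hypothetical usage: the same triple arrives from two sources.
store = Store()
store.assert_graph("urn:example:mb", [("ex:a", "ex:p", "ex:b")])
store.assert_graph("urn:example:local",
                   [("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")])
store.remove_graph("urn:example:local")
```

Because each background row remembers its source, a triple asserted by two graphs survives in the background when only one of them is removed, without keeping an explicit assertion count.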

> 
> It would be possible to subidentify the triples in the background graph in
> some way, but that identification can't be discovered from SPARQL which
> would make extending it to be INSERT/UPDATE in the future painful, and
> would complicate the data storage.

Seems to me that data management and presented information aren't necessarily 
identical, so using the same information for both is likely to lead to trouble 
somewhere.  This makes INSERT/UPDATE orthogonal to query.

> 
> Another option I considered was to keep a copy of the graph as asserted,
> and remove it when requested, but it gets a bit complicated as I have to
> keep a count on the number of times any particular statement has been
> asserted in the background graph, and I'm concerned about synchronisation
> issues.
> 
> The design I posted earlier
> (http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0440.html)
> turns out not to have this problem (though that wasn't what motivated the
> design). As all graphs are named the application can do management on
> data about individual disks in the background graph.

I'm not clear anymore on this - is the distinguished named graph the RDF merge 
of some other graphs or not?  This seems to say it is not a copy so, with shared 
bNodes, it is the same as your first thought except there the distinguished 
graph has a hidden name (not visible to the query).

	Andy

> 
> - Steve
>
Received on Monday, 18 April 2005 17:37:55 UTC