Re: Radio station metadata use case from Steve Harris on 2005-04-19 (public-rdf-dawg@w3.org from April to June 2005)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Tue, 19 Apr 2005 10:36:10 +0100
To: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <20050419093610.GB9301@login.ecs.soton.ac.uk>
On Mon, Apr 18, 2005 at 06:37:11PM +0100, Andy Seaborne wrote:
> 
> 
> Steve Harris wrote:
> >Based on my experience of implementing the current editors' WD and helping
> >build an application.
> >
> >The local student radio station (http://www.surgeradio.co.uk) uses RDF to
> >describe its playlists, handle requests and so on. They use the
> >Musicbrainz RDF (split into files per artist and disk to make applying
> >updates more efficient) to talk about released CDs, and some locally
> >created data (split same way) to talk about white label singles they have
> >received through the post.
> >
> >If/when the white label stuff gets released they send it to musicbrainz
> >and remove the local copy.
> >
> >All the data is "trusted" and so exists in the background graph, but it's
> >also kept in named graphs to allow provenance queries (MB vs. local, who
> >wrote it etc.) to be answered.
> >
> >My first thought about how to handle this case was to flag the graphs as
> >being in the background/named/both graph sets which allows me to store
> >this efficiently, but it makes queries too expensive, and in my current
> >implementation at least bNodes get shared between the background and named
> >graphs, which only matters in corner cases, but does change the
> >semantics.
> 
> Steve - I don't see what makes the query any more expensive.  Why does 
> sorting quads (with or without a trusted flag) make this mapping fail:
> 
> { ?x ?y ?z } =>             (?x ?y ?z *)     with * for any
> GRAPH <u> { ?x ?y ?z } =>   (?x ?y ?z <u>)

This is not an implementation of the specification - it does not allow
statements to exist in named graphs but not the background graph.
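
To make the objection concrete, here is a minimal Python sketch of that mapping over a set of quads. The `match` helper, the `ex:` names and the sample data are all hypothetical; this is not anyone's actual store, only an illustration of why the wildcard mapping cannot hide a named-graph statement from the background graph.

```python
# Quads are (subject, predicate, object, graph) tuples.
# In a pattern, None plays the role of the "*" wildcard.
# All names and data below are made up for illustration.
QUADS = {
    ("ex:single1", "ex:title", "White Label", "ex:local"),
    ("ex:cd1", "ex:title", "Album", "ex:musicbrainz"),
}


def match(quads, s=None, p=None, o=None, g=None):
    """Return the (s, p, o) triples of every quad matching the pattern."""
    pattern = (s, p, o, g)
    return {
        quad[:3]
        for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    }


# { ?x ?y ?z }            =>  (?x ?y ?z *)
background = match(QUADS)
# GRAPH <u> { ?x ?y ?z }  =>  (?x ?y ?z <u>)
local = match(QUADS, g="ex:local")

# Triples visible in the named graph but not in the background graph.
named_only = local - background
```

Under this mapping `named_only` is always empty, whatever the data: anything stored under a graph name is also matched by the wildcard.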
 
> The bNodes is a free choice.  RDF does not say whether they are same or 
> different across graphs.  More on this below.

Uggh. That makes my head hurt. If it is a free choice in RDF it better not
be in SPARQL otherwise we have the potential for some really confusing
results when graphs are loaded into both the background and named graphs,
eg. in lists:

SELECT ?cdr
WHERE {
	GRAPH <http://example.com/data.rdf> { :foo rdf:first ?car . }
	?car rdf:rest ?cdr .
}

(with apologies if I've forgotten the rdf list syntax)
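
A rough Python sketch of why this matters, assuming blank nodes are plain strings starting with `_:` and that a loader may or may not rename them per graph. The `load` and `cdrs` helpers and the two-triple document are hypothetical, chosen only to mirror the shape of the query above.

```python
def load(triples, suffix):
    """Copy triples, renaming blank nodes (terms starting with '_:') with suffix."""
    def rename(term):
        return term + suffix if term.startswith("_:") else term

    return {tuple(rename(term) for term in triple) for triple in triples}


# Hypothetical document: the value bound to ?car is a blank node.
DOC = {
    (":foo", "rdf:first", "_:b1"),
    ("_:b1", "rdf:rest", "rdf:nil"),
}


def cdrs(named, background):
    """Bind ?car in the named graph, then follow rdf:rest in the background."""
    cars = {o for (s, p, o) in named if s == ":foo" and p == "rdf:first"}
    return {o for (s, p, o) in background if s in cars and p == "rdf:rest"}


# Same blank node labels in both graphs: the join across graphs succeeds.
shared = cdrs(load(DOC, ""), load(DOC, ""))
# Fresh labels per graph: the same query on the same data finds nothing.
separate = cdrs(load(DOC, "-named"), load(DOC, "-bg"))
```

Here `shared` contains `rdf:nil` while `separate` is empty, so the answer to one query depends entirely on a choice the query author cannot see.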
 
> >My final implementation was a naive implementation of what's in the spec,
> >as I understand it. I used a distinguished graph identifier to
> >distinguish things in the background graph. I think assertion performance
> >is bad, but I've not worked on it.
> >
> >However, using this implementation I then couldn't remove subsets of the
> >background graph (eg. locally created graphs that are now redundant).
> 
> If this (removing named graphs affecting the background graph) is a 
> requirement, then the background graph must share bNodes (or keep a 
> mapping) with the named graphs surely?  This seems to be true regardless of 
> which scheme we are considering.

Yes, the choice is whether the graph that is used for default answering is
the same as the one that is used for GRAPH answering or not.
 
> >The
> >named part of the data can be removed easily, by using its graph
> >identifier, but all triples in the background graph can't be distinguished
> >in my implementation.
> 
> Interesting - so if the schema separates the concerns for data management 
> from the concerns for query then storing 5-tuples (you'd want to normalize 
> as well):
> 
>   (<s>  <p>  <o>  URI-or-null   original-named-graph)
> 
> and doing data management based on slot 5, and query based on slot 4 might 
> work. BNodes decision permitting.
> 
> To separate bNodes, then insert a new 5-slot "triple" keeping the 
> original-named-graph indicator so it can be mass-removed.

This means we need to go beyond terms in SPARQL to do data management, which
I don't want to do.
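
For reference, a minimal Python sketch of the 5-slot scheme as I read it, with `None` standing in for the null in slot 4. The row layout follows the quoted tuple; the helper names and the `ex:` data are hypothetical.

```python
# Rows are (s, p, o, query_graph_or_None, original_named_graph).
# Slot 4 is what queries see; slot 5 is only used for data management.
# All names and data are made up for illustration.
rows = [
    ("ex:single1", "ex:title", "White Label", None, "ex:local"),
    ("ex:single1", "ex:title", "White Label", "ex:local", "ex:local"),
    ("ex:cd1", "ex:title", "Album", None, "ex:musicbrainz"),
    ("ex:cd1", "ex:title", "Album", "ex:musicbrainz", "ex:musicbrainz"),
]


def background(rows):
    """Query side: triples whose slot 4 is null form the background graph."""
    return {row[:3] for row in rows if row[3] is None}


def remove_origin(rows, origin):
    """Management side: mass-remove every row that came from one named graph."""
    return [row for row in rows if row[4] != origin]


before = background(rows)
after = background(remove_origin(rows, "ex:local"))
```

The removal works, but `remove_origin` keys on slot 5, which has no counterpart in a SPARQL graph pattern, so the management operation cannot be expressed in query terms.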
 
> >It would be possible to subidentify the triples in the background graph in
> >some way, but that identification can't be discovered from SPARQL which
> >would make extending it to be INSERT/UPDATE in the future painful, and
> >would complicate the data storage.
> 
> Seems to me that data management and presented information aren't necessarily 
> identical so using the same information is likely to lead to trouble 
> somewhere.  This makes INSERT/UPDATE orthogonal to query.

They're not necessarily identical, sure, but I would find it mighty
surprising if they're not.
 
> >Another option I considered was to keep a copy of the graph as asserted,
> >and remove it when requested, but it gets a bit complicated as I have to
> >keep a count on the number of times any particular statement has been
> >asserted in the background graph, and I'm concerned about synchronisation
> >issues.
> >
> >The design I posted earlier
> >(http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0440.html)
> >turns out not to have this problem (though that wasn't what motivated the
> >design). As all graphs are named the application can do management on
> >data about individual disks in the background graph.
> 
> I'm not clear anymore on this - is the distinguished named graph the RDF 
> merge of some other graphs or not?  This seems to say it is not a copy so, 
> with shared bNodes, it is the same as your first thought except there the 
> distinguished graph has a hidden name (not visible to the query).

There is no distinguished graph per se (there was in my rq23
implementation, but that's another issue). All there is is a set of named
graphs, of which a subset is used to match triple patterns that don't
use the GRAPH keyword.
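
A small Python sketch of that arrangement, under the assumption that the store is just a mapping from graph name to a set of triples. The function names, the `ex:` graph names and the data are hypothetical and do not reflect my actual storage layout.

```python
# Every triple lives in exactly one named graph; nothing is copied.
# All names and data are made up for illustration.
store = {
    "ex:musicbrainz": {("ex:cd1", "ex:title", "Album")},
    "ex:local": {("ex:single1", "ex:title", "White Label")},
    "ex:untrusted": {("ex:cd1", "ex:title", "Wrong Title")},
}
# The subset of graph names used for patterns without the GRAPH keyword.
default_set = {"ex:musicbrainz", "ex:local"}


def query_default(store, default_set):
    """Match { ?x ?y ?z } against the union of the graphs in default_set."""
    result = set()
    for name in default_set:
        result |= store.get(name, set())
    return result


def query_graph(store, name):
    """Match GRAPH <name> { ?x ?y ?z } against a single named graph."""
    return store.get(name, set())


def drop_graph(store, default_set, name):
    """Remove one named graph; its triples leave the default answers too."""
    store.pop(name, None)
    default_set.discard(name)


before = query_default(store, default_set)
drop_graph(store, default_set, "ex:local")
after = query_default(store, default_set)
```

Dropping `ex:local` needs no per-statement counts and no hidden identifiers, and `ex:untrusted` stays queryable through GRAPH without ever appearing in the default answers.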

- Steve
Received on Tuesday, 19 April 2005 09:36:14 UTC