Primer - quick notes and questions from Harry Halpin on 2006-11-02 (public-grddl-wg@w3.org from November 2006)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Thu, 02 Nov 2006 18:43:17 +0000
To: public-grddl-wg <public-grddl-wg@w3.org>
Message-ID: <454A3C45.9080502@ibiblio.org>
I spent the last hour or so rewriting the primer (although I haven't
checked this in due to the problems I've encountered so far) so that
users can download and run the examples.

Currently, the examples are about finding a date when everyone can meet.

First, I've noticed that the embeddedRDF transform uses a different namespace
than glean-hcal.xsl.

RDF produced by glean-hcal.xsl uses:
http://www.w3.org/2002/12/cal/icaltzd#

But embeddedRDF produces:
http://www.w3.org/2002/12/cal#

So we need to fix one of them so that both use the same namespace, which I'm
assuming should be the slightly more cryptic:
http://www.w3.org/2002/12/cal/icaltzd#
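
The real fix belongs in one of the stylesheets, but as a stopgap the output
could be post-processed. A minimal sketch, assuming triples are plain
(subject, predicate, object) tuples and that the local names (summary,
dtstart, ...) are the same in both vocabularies, which I haven't checked.
The sample triples are hypothetical:

```python
OLD_NS = "http://www.w3.org/2002/12/cal#"          # used by embeddedRDF
NEW_NS = "http://www.w3.org/2002/12/cal/icaltzd#"  # used by glean-hcal.xsl

def normalize_namespace(triples, old=OLD_NS, new=NEW_NS):
    """Return a copy of `triples` with terms in the `old` namespace moved to `new`.

    Assumes local names are identical in both vocabularies (unverified).
    """
    def fix(term):
        # Rewrite only string terms that begin with the old namespace URI.
        if isinstance(term, str) and term.startswith(old):
            return new + term[len(old):]
        return term
    return [(fix(s), fix(p), fix(o)) for s, p, o in triples]

# Hypothetical triples standing in for embeddedRDF output.
embedded = [
    ("_:e1", OLD_NS + "summary", "GRDDL meeting"),
    ("_:e1", OLD_NS + "dtstart", "2006-11-06"),
]
for triple in normalize_namespace(embedded):
    print(triple[1])
# http://www.w3.org/2002/12/cal/icaltzd#summary
# http://www.w3.org/2002/12/cal/icaltzd#dtstart
```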

Second, finding out what dates they have in common is not trivial. If
we had three users, each with their calendar converted to RDF and merged,
what SPARQL query would we use?

I'm no SPARQL expert, but I think it would be something like:

  PREFIX c: <http://www.w3.org/2002/12/cal/icaltzd#>
  SELECT ?summary ?location ?ymd
  FROM <http://www.example.org/exampleData>
  WHERE { ?event c:summary ?summary ;
                 c:location ?location ;
                 c:dtstart ?ymd .
          FILTER (??)    # the condition on ?location I can't work out
          FILTER (???)   # the condition on ?ymd
        }

Notice the ?? placeholders. I'd like to keep this simple if at all
possible: how do we FILTER on variables (i.e. the date and location) so
that the result only returns those dates and locations that share the
same object? In other words, in the result graph the dates are the same
and the locations are the same, i.e. the triple is repeated. I'm pretty
sure this could be done with a series of SPARQL queries, i.e. one that
selects all the locations, and then one query per location to see if
that location is repeated...
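
For what it's worth, the grouping logic itself is simple outside SPARQL.
I believe the SPARQL equivalent is a self-join: two event patterns that
share ?location and ?ymd, plus FILTER (?e1 != ?e2). A minimal Python
sketch of the same idea, assuming the merged graph is a list of
(subject, predicate, object) tuples; the events and "Boston" are
hypothetical:

```python
from collections import defaultdict

C = "http://www.w3.org/2002/12/cal/icaltzd#"

def shared_slots(triples):
    """Return {(location, dtstart): [events]} for slots used by 2+ events."""
    loc, start = {}, {}
    # Index each event's location and start date (one value each is assumed;
    # if an event has several, the last one seen wins).
    for s, p, o in triples:
        if p == C + "location":
            loc[s] = o
        elif p == C + "dtstart":
            start[s] = o
    slots = defaultdict(set)
    for event in loc.keys() & start.keys():
        slots[(loc[event], start[event])].add(event)
    # A slot is "repeated" when more than one event points at it.
    return {slot: sorted(evs) for slot, evs in slots.items() if len(evs) > 1}

# Hypothetical merged data: two events in LA on the same day, one elsewhere.
merged = [
    ("_:e1", C + "location", "LA"), ("_:e1", C + "dtstart", "2006-11-06"),
    ("_:e2", C + "location", "LA"), ("_:e2", C + "dtstart", "2006-11-06"),
    ("_:e3", C + "location", "Boston"), ("_:e3", C + "dtstart", "2006-11-06"),
]
print(shared_slots(merged))  # {('LA', '2006-11-06'): ['_:e1', '_:e2']}
```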

There are also two tricky things in this simple date-booking example.
First, we just merged the calendar data into one "super-calendar" and
lost all the author information, i.e. whether it was Jane, Robin, or
David who was at a particular event. Ideally, our query would return
triples with the same date/location but with different people.

Should we extend the primer to deal with this? That means
manufacturing vCard RDF for Jane, Robin, and David, merging the
vCard data with the RDFCal data, and changing the query to also return
"who" will be at each event, making sure we return an event that all
three of them will be at.
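
To show the logic the extended query would need, here is a minimal
sketch. Instead of vCard and RDFCal triples it assumes a hypothetical
simplification: each person's calendar is kept separate as a set of
(location, date) pairs, so authorship isn't lost in the merge. The
slots are made up:

```python
from collections import defaultdict

def attendees_by_slot(calendars):
    """Invert {person: slots} into {slot: sorted list of people}."""
    who = defaultdict(set)
    for person, slots in calendars.items():
        for slot in slots:
            who[slot].add(person)
    return {slot: sorted(people) for slot, people in who.items()}

def slots_for_everyone(calendars):
    """Sorted list of slots that every person in `calendars` will attend."""
    return sorted(slot for slot, people in attendees_by_slot(calendars).items()
                  if len(people) == len(calendars))

calendars = {  # hypothetical per-person data
    "Jane":  {("LA", "2006-11-06"), ("LA", "2006-11-07")},
    "Robin": {("LA", "2006-11-06"), ("LA", "2006-11-07"), ("LA", "2006-11-08")},
    "David": {("LA", "2006-11-07")},
}
print(slots_for_everyone(calendars))  # [('LA', '2006-11-07')]
```

attendees_by_slot() answers the "who" question; slots_for_everyone()
answers "where can all three meet".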

Second, right now the query only matches on DTSTART and assumes
everyone is at the same event. But what if Robin is in LA for one event
on Nov 6-9th and David is in LA for another event on Nov 3-7th (which
the current primer has)? We want to match Nov 6-7 as the days they can
meet. Now the SPARQL has become even more complicated.
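
The core test is the usual interval-overlap check: two inclusive date
ranges meet if and only if the later start is on or before the earlier
end. A minimal Python sketch with the ranges from the example
hard-coded (the year 2006 is assumed from the message date):

```python
from datetime import date, timedelta

def overlap(a_start, a_end, b_start, b_end):
    """Inclusive intersection of two date ranges, or None if they don't meet."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start <= end else None

def days(start, end):
    """Every date from start to end, inclusive."""
    return [start + timedelta(n) for n in range((end - start).days + 1)]

robin = (date(2006, 11, 6), date(2006, 11, 9))  # Robin in LA, Nov 6-9
david = (date(2006, 11, 3), date(2006, 11, 7))  # David in LA, Nov 3-7
common = overlap(*robin, *david)
print([d.isoformat() for d in days(*common)])  # ['2006-11-06', '2006-11-07']
```

One caveat: iCalendar treats DTEND on a VEVENT as the non-inclusive end,
so real calendar data would need a day subtracted from each end date
before this inclusive check is applied.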

Any pointers?

P.S.: Also, it appears that all the SPARQL and triplestore products
require a programming language, e.g. Jena in Java. Are there any
relatively easy-to-use command-line tools that can run SPARQL queries,
or will the tutorial users simply have to bite the bullet and install Jena?

I believe we should use cwm for graph merging, in which case I'm tempted
to add even simple cwm commands for non-RDF people to use, à la "cwm
--rdf janeschedule.rdf robinschedule.rdf > combinedschedule.rdf". What
do people think?
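
For newcomers it may help to spell out what "merging" means: an RDF
merge is essentially a set union of triples, except that blank nodes
from different files must be kept apart even when they share a label.
cwm handles this for you; the sketch below only illustrates the idea. It
assumes triples are tuples, blank nodes are strings starting with "_:"
(and no literal is), and the example URI is hypothetical:

```python
C = "http://www.w3.org/2002/12/cal/icaltzd#"

def _rename_bnode(term, graph_index):
    # Give each source graph's blank nodes a distinct prefix.
    if isinstance(term, str) and term.startswith("_:"):
        return f"_:g{graph_index}_{term[2:]}"
    return term

def merge_graphs(*graphs):
    """Union of graphs, with blank nodes standardized apart per graph."""
    merged = set()
    for i, graph in enumerate(graphs):
        merged |= {(_rename_bnode(s, i), p, _rename_bnode(o, i)) for s, p, o in graph}
    return merged

shared = ("http://example.org/events/la-trip", C + "location", "LA")  # hypothetical
jane = {("_:e1", C + "dtstart", "2006-11-06"), shared}
robin = {("_:e1", C + "dtstart", "2006-11-07"), shared}
combined = merge_graphs(jane, robin)
print(len(combined))  # 3: the shared triple once, the two _:e1 nodes kept distinct
```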

Again, I would like a *complete* newbie to RDF to be able to run this
tutorial and see how neat RDF and GRDDL are, which is hard if all the work
is left implicit and we don't provide runnable sample code.

-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Thursday, 2 November 2006 18:44:35 UTC