[linkedlifew3cnote] Re: Submitted "Emerging best practices for mapping life sciences data to RDF - a case series"

On Fri, Jun 10, 2011 at 3:50 PM, Claus Stie Kallesøe
<clausstiekallesoe@gmail.com> wrote:
>> Yes, the current plan is (still) to make the steps associated with
>> Figure 1 the core of the W3C note. Would one of the doc editors please
>> add that material to the Google Doc at the link below?

Just a note about the Google Doc to the curious - there isn't much to
look at yet. I would be happy to add people to the document who intend
to take part in the W3C note but please state if that is your
intention.

If you are interested in the submitted draft article we are referring
to at the moment (in review), please send me a note to that effect.
Also, please note that there are other recently published articles
from LODD, such as this one: http://www.jcheminf.com/content/3/1/19, and
another that will be published soon (when, Matthias?).

>> > So I still need to work on this part to map the right concepts.
>> > But at least I have two SPARQL endpoints via D2R on top of two of our
>> > Oracle databases.
>> > Next step? Well, I can write SPARQL against each one of them - but then I
>> > might as well just use SQL, I think?
>>
>> And you refer to SWObjects as a hacker's tool? ;)
>
> well, not in the sense that it's hard to use. Maybe more in the sense that I
> am not sure that it's ready to be run in full production with 500 users
> running SPARQL queries across multiple data sources? It seems more like a
> very nice tool to test/explore datasets.

I don't know of any SPARQL query federation that has been taken into
full production with even 50 simultaneous users. If you know of such a
thing, I would be curious how it works. In any case, all systems have
their limits, and the limits of scale often depend more on the
combination of query and data (and the presence of certain types of
optimization) than on the software translating and sending the query
(SWObjects). As I'm sure you know, some types of queries will not
return results at interactive speed, even in a tuned relational
setting. So, I would expect the performance bottleneck to be at the
back end rather than the front end. Having said that, I wouldn't be
surprised if the HTTP service that Eric cooked up in a few hours had a
hiccup or two while scaling up. But that is easily fixed by writing a
proper Apache service to do the job right, non?

>> [Note to Eric - you don't link out to the tutorial from the wiki yet!]
>> http://sourceforge.net/apps/mediawiki/swobjects/index.php?title=Main_Page
>>
>> SWObjects doesn't have all the bells and whistles of D2R and thus
>> requires thorough knowledge of the target queries (in either SPARQL or
>> SQL) as well as the desired mapping - so you have to decide on the
>> desired semantics in one go. This makes it much more complicated to
>> use than D2R (this is probably what you mean by hacker's tool). So,
>> for a mapping to a relational database, you must know: your desired
>> target SQL query and how you want it to look in SPARQL in order to
>> create the SPARQL Construct(s).
>
> And I aim for a GUI where scientists can answer all sorts of questions about
> our data (and experience shows that they do if possible!) and we then need
> to turn it into SPARQL and execute it. So I don't know the query and needed
> construct before the user enters their question. Then I will build the
> SPARQL automatically
>
> So I would think that one needs to build a new federated graph based on the
> underlying data and hold that somewhere somehow

Just to be clear, I don't mean "constructing a SPARQL query". What I
mean by SPARQL Construct, and how it is used by SWObjects: a SPARQL
Construct is a way to express a rule or mapping but, in this case, it
is used by *SWObjects*, not the end user. The SPARQL Construct emits a
graph segment in an RDF target vocabulary when its WHERE clause is
matched. That graph segment is the thing being matched by end-user
queries. Generally, you set up these mappings for the attributes and
values that are likely to be requested by end users. You don't have to
store a new federated graph - the federated graph view is created
on-the-fly by SWObjects' query rewriting. You also don't need to
map all the artifacts found in your SQL tablespace. But you *do* need
to choose which parts of your tablespace you want to expose as RDF and
any necessary mappings or transforms.
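To make that concrete, here is a sketch of what such a mapping rule
could look like (all table, column, and vocabulary names below are
hypothetical, just to show the shape of the rule):

```sparql
PREFIX ex: <http://example.org/vocab#>
PREFIX db: <http://example.org/db#>

# Hypothetical mapping rule: when the relational side exposes rows of
# a "compound" table, emit triples in the target vocabulary. The
# CONSTRUCT template is what end-user queries will match against; the
# WHERE clause is what gets rewritten toward the SQL back end.
CONSTRUCT {
  ?compound a ex:Compound ;
            ex:smiles ?smiles .
}
WHERE {
  ?row db:compound_id ?id ;
       db:compound_smiles ?smiles .
  BIND (IRI(CONCAT("http://example.org/compound/", STR(?id))) AS ?compound)
}
```

The point is that the rule is written once, by whoever does the
mapping - the end user only ever queries the ex: vocabulary.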

The SWObjects approach effectively creates a federated view of the
mapped graphs. The view is created from the rules expressed in SPARQL
Construct.  Those rules can be "chained", i.e. the result of one rule
production can trigger another rule or mapping. The rule language is part
of SPARQL itself, which is a nice feature. Other mapping languages (SPIN?) could
eventually be supported with a little help from our friends.
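A tiny sketch of what chaining means in practice (again, all names
hypothetical): the graph segment produced by one rule's CONSTRUCT
template can be matched by the WHERE clause of the next rule.

```sparql
PREFIX ex: <http://example.org/vocab#>
PREFIX db: <http://example.org/db#>

# Rule 1: lift a relational column into an intermediate term.
CONSTRUCT { ?s ex:hasMeasurement ?m }
WHERE     { ?s db:assay_result ?m }

# Rule 2: triggered by Rule 1's output, map the intermediate term
# into the final target vocabulary.
CONSTRUCT { ?s ex:ic50 ?m }
WHERE     { ?s ex:hasMeasurement ?m }
```

A query against ex:ic50 is rewritten through Rule 2, then Rule 1,
down to the relational pattern - all within SPARQL syntax.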

Cheers,
Scott

-- 
M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
http://staff.science.uva.nl/~marshall

Received on Friday, 10 June 2011 15:06:12 UTC