- From: M. Scott Marshall <mscottmarshall@gmail.com>
- Date: Fri, 10 Jun 2011 14:50:48 +0200
- To: linkedlifedatapracticesnote@googlegroups.com, Claus Stie Kallesøe <clausstiekallesoe@gmail.com>, Philip.Ashworth@ucb.com, HCLS <public-semweb-lifesci@w3.org>
[With Claus's cautious permission I'm CC'ing HCLS. I think that these questions and the answers to follow are generally valuable.]

Hi Claus,

Thanks very much for your feedback. This is exactly the sort of feedback that will help us write a truly useful W3C note.

Just to be clear from the start: your questions touch on implementation issues that were not yet quite in scope for the note, namely user interfaces built on SPARQL and federation of SPARQL endpoints. I think it is worth considering to what extent we should include those topics.

> I am happy to help on the W3C note, but if it's just going to be copy/paste
> from a paper that I didn't write I am not sure I can add so much value?
> But the reason I didn't participate in the paper was that I didn't feel I
> had the knowledge to write a best practices paper on RDF. Since then I have
> done some work, and used the paper, so at least I have some feedback on the
> paper and how it is to use for a beginner in the field of actually doing
> something and the issues I am having:

You're not mentioning the birth of a new child that happened around that time. :) User requirements from domain experts such as yourself are a valuable and essential contribution! Besides, from your questions I can see that you have accrued a significant amount of experience. Again, thanks for sharing your observations with us.

> I think Figure 1 is good as it gives a good overview of the steps one
> needs to go through.

Yes, the current plan is (still) to make the steps associated with Figure 1 the core of the W3C note. Would one of the doc editors please add that material to the Google Doc at the link below?
https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US

> But I still have some missing links in my understanding
> of how to actually get from relational data (which I have a lot of) to a web
> front end that takes input from a user, converts the input to a SPARQL
> query, performs the query across multiple data sources and displays the
> results in a nice format for the user.

We haven't included material on user interface building, i.e. converting input to a SPARQL query and displaying results in a format that is nice for the user. It is indeed a confounding factor for users/developers who want to use SPARQL. Perhaps we could mention a few possible approaches; otherwise, if it seems too ambitious, we should declare it out of scope. In general, we've been trying to address the question of setting up SPARQL access to data that would otherwise require an additional API to integrate into a given application. Here are a few related ideas:

Input -> SPARQL query

Mapping the string labels used in a GUI to the identifiers used in a SPARQL query can be a matter of using the rdfs:label values directly from the RDF. However, converting "input" to a SPARQL query is not always straightforward, and I expect it will remain an area of research for some years. Example: natural language input that is mapped to the best SPARQL query using Bayesian probability. There are, however, situations where the mapping to the query is more straightforward, such as faceted browsing.

SPARQL query results -> formatted for end-user consumption

There are a few nice approaches to this. I think first of spatialization, which I recommended for the HCLS KB demo query results. In work with the HCLS KB, Alan Ruttenberg used a Google Maps coordinate API to map search results to images. BIRN adopted that approach wholesale. Spatialization works well when you have an attribute that can be mapped to a coordinate space. See also the SIMILE demonstrations (2007?)
of mapping zip codes onto geographical maps. I've always said that some people in visualization get their coordinate system for "free". ;) However, when you don't have a spatialization other than what a PCA will give you, another approach is to provide a list of facets/attributes of interest, such as disease, genes, pathways, etc. This was done in 2007 with museum data in a faceted browser called /facet ("slash facet"). I will let others fill in relevant examples here.

> So I have used D2R to map two of our in-house data sources. Easy to use, gives
> a good start, the front end of D2R Server on the mapped data gives an idea about
> what it looks like so one can perform some edits manually by hand. Easily
> ends up being a 1:1 table:class mapping and that might not be the right
> thing. Sort of keeps you in the relational world while trying to go
> semantic.

Perhaps somebody can provide some tips for creating better mappings with D2R?

> So I still need to work on this part to map the right concepts.
> But at least I have two SPARQL endpoints via D2R on top of two of our
> Oracle databases.
> Next step? Well, I can write SPARQL against each one of them, but then I
> might as well just use SQL, I think?

And you refer to SWObjects as a hacker's tool? ;)

> So I would like to link them so I can
> somehow do a federated query across both sources at the same time. Makes
> sense, right?
> I have used SWObjects to do that and it works. But that to me is more a
> hacker tool. Maybe not the right way to go if one wants a stable, scalable
> solution where we can send all kinds of SPARQL queries?

I'm curious whether you've followed the tutorial here: http://tinyurl.com/swobjects-swat4ls, or rather, here: http://www.w3.org/2010/Talks/1208-egp-swobjects/ [Note to Eric: you don't link out to the tutorial from the wiki yet!]
http://sourceforge.net/apps/mediawiki/swobjects/index.php?title=Main_Page

SWObjects doesn't have all the bells and whistles of D2R and thus requires thorough knowledge of the target queries (in either SPARQL or SQL) as well as the desired mapping, so you have to decide on the desired semantics in one go. This makes it much more complicated to use than D2R (this is probably what you mean by "hacker's tool"). So, for a mapping to a relational database, you must know both your desired target SQL query and how you want it to look in SPARQL in order to create the SPARQL Construct(s). However, you've already pointed out the problem with automatic map generation above: you end up with a 1:1 table:class mapping with no semantics, which essentially postpones the mapping choice described above.

One nice thing about SWObjects is that once you've expressed your mapping rules as SPARQL Constructs, query federation is done for you automatically: your SPARQL query is decomposed, mapped and dispatched to the appropriate GRAPH services.

Scalability: I take scalability to refer to federation, in which case SWObjects scales nicely. The engine is written in C++, so it should be fast (with, hopefully, no memory leaks!). Of course, feature requests should come out of new work with SWObjects. If by scaling up we mean setting up a federation across more than a handful of endpoints, that could be an issue. I would like to see a DBVisualizer-style interface that can generate the SPARQL Constructs more easily than the current approach, which demands writing the SPARQL by hand. I think such an interface would make SWObjects a lot more useful.

Stability: If it has crashed, would you please file a bug report on the SWObjects mailing list? Otherwise, the biggest gap that we are attempting to address at the moment is Oracle drivers.
Eric doesn't have the bandwidth to write the drivers himself, and we are still hoping that Oracle will help us write them in order to make SWObjects a viable choice for some of their interested clients (ongoing...). Another point for improvement that I believe has been noted is better integration with Apache in place of the current HTTP service, which was thrown together in a few hours. Volunteers?

> My mate Phil from UCB has mapped their internal data sources via D2R mapping
> and then done some integration work by making a dataset ontology via VoID
> (linksets) and a concept ontology via SKOS (narrowMatch between general
> concepts and the classes in the different sources).
> In order to do the same I first need to find an ontology tool, understand
> VoID and SKOS, and understand how to use these things correctly together.
> That is not a quick and simple thing, at least for me! A lot of
> questions/unknowns here for beginners.
> But I tried and have (maybe) an ontology that describes some Lundbeck
> concepts and how they are linked to classes in some of our data sources. Then
> what?

I am CC'ing Philip Ashworth so that he can answer you once he has rested up from SemTech in San Francisco, California, where he presented his approach to federation.

> Then I need a tool that can use my new "linking ontology" to create a
> common/federated SPARQL endpoint so my web app can go there to ask
> questions, right? What tool would that be?
> I think Phil from UCB uses a tool from TopBraid that I do not have yet. So
> maybe it's easy and straightforward if I get that?

I understood from Phil Brooks's tutorial at the EBI SemWeb Industry workshop that D2R is nicely integrated into TopBraid Composer, although I haven't tried it myself. And yes, TopBraid is widely touted, and they have a free version, by the way.

> I then discovered Silk and was thinking that maybe that would be able to help
> me link my two data sources? When I read about Silk it seems like that's what
> it can do?
> But I never got started until Anja et al. launched their workbench.
> So I now have a link description made by/in the Silk Workbench. Fairly easy and
> straightforward to do. And then what?

Looks like a good question for Anja.

> As I asked yesterday on the call, and linked to the above situation: I now
> have a link description that knows about my data sources and how/where they
> link. Now I again need some tool that can use that to display a SPARQL
> endpoint for me to point my searches at. Or am I completely off here? My thinking
> is that with a nice link description like that, "some tool" must be able to
> find data in the right places; if not, what is the point of Silk?

I'll leave that for Anja as well. I should mention, though, that if you've created the SPARQL Construct mappings for SWObjects, you *started* with the knowledge of where your query would be answered: you actually include named graph references (GRAPH services) in the SPARQL Construct (again, see the tutorial). Once those mappings are created, you issue your SPARQL query as if all the data were in one place, using your own terms/URIs.

BTW, yet another approach to automatic resource discovery worth considering (not covered in the emerging practices paper) is SADI/SHARE.

> So that's where I am now. I am sorry if I have offended anyone on the way;
> that surely isn't my intention. I just wanted to show you, the academic
> experts, what a relational-centric pharma informatics person goes through in
> order to get going with semantic technologies and linked data. And I
> hope it could be of use when we write a W3C note that should help others
> get started?

Very useful! No offence taken by anyone at honest questions, I'm sure. I hope that we can help get you back on track shortly, as well as pave the way for the next person.

Cheers,
Scott

> On 8 June 2011 16:32, M. Scott Marshall <mscottmarshall@gmail.com> wrote:
>>
>> I haven't received an answer yet, but the excerpts from the copyright
>> URL below state fairly clearly that we are within our rights to use
>> the same material in the W3C note.
>>
>> -Scott
>>
>> ---------- Forwarded message ----------
>> From: M. Scott Marshall <mscottmarshall@gmail.com>
>> Date: Wed, Jun 8, 2011 at 10:56 AM
>> Subject: Re: Submitted "Emerging best practices for mapping life
>> sciences data to RDF - a case series"
>> To: k.s.schlobach@vu.nl
>>
>>
>> Dear Stefan,
>>
>> As part of the current HCLS charter, we plan to create a W3C note on
>> the same topic as the submitted article in the next few months. It
>> will be a 'derived work', based on overlapping material. I looked at
>> the journal policies about such things and it seems to be allowed.
>>
>>
>> From http://www.elsevier.com/wps/find/authorshome.authors/copyright#rights :
>>
>> * the right to post a pre-print version of the journal article on
>> Internet web sites including electronic pre-print servers, and to
>> retain indefinitely such version on such servers or sites for
>> scholarly purposes* (with some exceptions such as The Lancet and Cell
>> Press. See also our information on electronic preprints for a more
>> detailed discussion on these points)*;
>>
>> * the right to prepare other derivative works, to extend the journal
>> article into book-length form, or to otherwise re-use portions or
>> excerpts in other works, with full acknowledgement of its original
>> publication in the journal.
>>
>> Please let me know if you think this would pose a problem. My
>> expectation is that the W3C note will be accessed by a different
>> audience and, although relatively obscure, could act as an
>> advertisement for the journal article (whose formatting would appeal
>> to more readers) if we refer to the anticipated publication in the W3C
>> note.
>>
>> -Scott
>>
Received on Friday, 10 June 2011 12:51:16 UTC