Re: Owning URIs (Was: Yet Another LOD cloud browser) from Daniel Schwabe on 2009-06-02 (semantic-web@w3.org from June 2009)

From: Daniel Schwabe <dschwabe@inf.puc-rio.br>
Date: Tue, 02 Jun 2009 12:33:41 -0300
To: Sherman Monroe <sdmonroe@gmail.com>
CC: Kingsley Idehen <kidehen@openlinksw.com>, Samur Araujo <samuraraujo@gmail.com>, Linked Data community <public-lod@w3.org>, semantic-web@w3.org
Message-ID: <4A254655.70602@inf.puc-rio.br>

Sherman Monroe wrote:
> Daniel,
>
> I see some interesting concepts worth exploring here, e.g. using 
> windows (with paging inside the window). But as I refine my query, 
> there isn't any apparent context that orients me in the data. E.g. how 
> does one box/set relate to the others.
The dependency between the boxes is recorded, but it is not a simple
matter to expose it to the user in a simple way. Each box (set) really
depends on a chain of previous operations, so in general it may be a
very long series of function compositions.
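One way to picture that dependency is to have each box keep references
to the boxes it was computed from. The sketch below assumes exactly
that; the `Box` class, `derive`, `lineage` and the placeholder items are
hypothetical illustrations, not Explorator's actual data model. It also
shows why the chain grows: every derived box nests the full lineage of
its inputs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Set operations a hypothetical derived box can be built with.
OPS = {
    "&": frozenset.intersection,
    "|": frozenset.union,
    "-": frozenset.difference,
}


@dataclass(frozen=True)
class Box:
    """A set of resources plus a record of the operation that produced it."""
    name: str
    items: FrozenSet[str]
    op: str = ""                      # empty for a base set (e.g. a search result)
    inputs: Tuple["Box", ...] = ()    # the boxes this one was computed from

    def derive(self, op: str, other: "Box", name: str) -> "Box":
        """Combine this box with another and remember both inputs."""
        return Box(name, OPS[op](self.items, other.items), op, (self, other))

    def lineage(self) -> str:
        """Expand the full chain of operations behind this box."""
        if not self.inputs:
            return self.name
        left, right = self.inputs
        return f"({left.lineage()} {self.op} {right.lineage()})"


A = Box("A", frozenset({"d1", "d2", "d3"}))
B = Box("B", frozenset({"d2", "d4"}))
C = A.derive("&", B, "C")
D = A.derive("-", C, "D")
print(D.lineage())  # (A - (A & B))
```

Each further operation wraps the previous expression once more, so
showing the full lineage to a user quickly becomes unwieldy.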
I think the biggest contribution is not so much the interface aspects
that you refer to, but the way you can form the various sets (boxes)
through various operations: the SPO operation, which allows you to do
arbitrary matches for <s,p,o> triples, plus union/intersection/difference,
plus de-referencing, plus a faceted interface on either an arbitrary set
of chosen properties (applied to any set) or automatically generated
facets.
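To make the set-building idea concrete, here is a minimal in-memory
sketch, assuming triples are plain Python tuples and a "set" is the set
of subjects matched by an <s,p,o> pattern. The `spo` and `subjects`
helpers and the `ex:` identifiers are hypothetical; Explorator itself
works against SPARQL repositories, not a Python set.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def spo(triples: Set[Triple], s: Optional[str] = None,
        p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples matching <s, p, o>; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }


def subjects(triples: Set[Triple]) -> Set[str]:
    """Project matched triples onto their subjects, giving a set of resources."""
    return {ts for (ts, _, _) in triples}


# Hypothetical toy data; the ex: names are placeholders, not DrugBank terms.
data = {
    ("ex:d1", "ex:indication", "ex:condX"),
    ("ex:d2", "ex:indication", "ex:condX"),
    ("ex:d2", "ex:warning", "ex:warnY"),
    ("ex:d3", "ex:warning", "ex:warnY"),
}

A = subjects(spo(data, p="ex:indication", o="ex:condX"))
B = subjects(spo(data, p="ex:warning", o="ex:warnY"))
print(sorted(A | B), sorted(A & B), sorted(A - B))
# ['ex:d1', 'ex:d2', 'ex:d3'] ['ex:d2'] ['ex:d1']
```

Because every pattern match yields an ordinary set, union, intersection
and difference compose freely with it.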
Here is a simple but interesting scenario:
 Find a drug for hypoglycemia that can be prescribed to a known alcohol
abuser.

Click on menu->repositories and add the DrugBank SPARQL endpoint
(http://www4.wiwiss.fu-berlin.de/drugbank/sparql) with limit 50 (we
have sometimes been getting timeouts; just try again and eventually it
works. We have a locally loaded version of these repositories, but we
haven't finished building the index for full-text search yet; we are
still figuring out how to build this index in Virtuoso).
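The "just try again" advice can be automated on the client side. This
is a minimal sketch, assuming the function that sends a query raises
Python's built-in `TimeoutError` on a timeout; `with_retries` is a
hypothetical helper, not part of Explorator, Sesame or Virtuoso.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, delay: float = 1.0,
                 retry_on: tuple = (TimeoutError,)) -> T:
    """Call `call()`, retrying on the listed exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            time.sleep(delay)
            delay *= 2  # wait twice as long before the next try
    raise AssertionError("unreachable")
```

Usage: `with_retries(lambda: run_query(q), attempts=5)`, where the
hypothetical `run_query` stands for whatever function actually sends
the query to the endpoint.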

Search for "hypoglycemic" (call it Set A).
Search for "avoid alcohol" (call it Set B).
Click on A, click on the intersection symbol, click on Set B, and click
on "=" (call it Set C).
Click on A, click on C, click on "-".
You've computed the set of drugs associated with "hypoglycemic",
intersected it with the set of drugs that should not be taken with
alcohol, and then computed the difference between the set of drugs
associated with "hypoglycemic" and this intersection, resulting in the
hypoglycemic drugs that may be taken with alcohol.
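The set algebra behind those clicks can be written out directly. A
minimal sketch with hypothetical placeholder IDs (not actual DrugBank
results); `may_take_with_alcohol` is an illustrative name. Note that
A - (A ∩ B) is the same set as A - B, so the intersection step mainly
makes the intermediate result visible.

```python
def may_take_with_alcohol(indicated: set, avoid_alcohol: set) -> set:
    """Drugs from `indicated` that are not in the avoid-alcohol set."""
    both = indicated & avoid_alcohol   # Set C: A intersected with B
    return indicated - both            # A minus C


# Hypothetical placeholder IDs standing in for search results.
A = {"drug1", "drug2", "drug3"}   # "hypoglycemic"
B = {"drug2", "drug4"}            # "avoid alcohol"
result = may_take_with_alcohol(A, B)
print(sorted(result))             # ['drug1', 'drug3']
assert result == A - B            # A - (A & B) is the same set as A - B
```

The same function applies unchanged to the "antidepressant" variant:
pass that search's result as `indicated`.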

If you make the scenario a bit more sophisticated, you can repeat the
same reasoning for "antidepressant" to get the set of drugs that are
antidepressants and may be taken with alcohol.
Going further (though here I don't have the medical knowledge to
formulate it properly), I could try to determine which diabetes and
antidepressant drugs could be prescribed together (I'd need to determine
dangerous interactions between the candidates obtained in the previous
steps), and so on...
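Structurally, that last step is a filtered cross product of the two
candidate sets. A minimal sketch, assuming the dangerous interactions
can be retrieved as unordered pairs of drug IDs; `compatible_pairs` and
all IDs below are hypothetical, and nothing here encodes real medical
knowledge.

```python
from itertools import product
from typing import FrozenSet, Iterable, Set, Tuple


def compatible_pairs(xs: Iterable[str], ys: Iterable[str],
                     interactions: Set[FrozenSet[str]]) -> Set[Tuple[str, str]]:
    """All (x, y) pairs with no recorded dangerous interaction.

    Interactions are unordered, so each one is stored as a frozenset.
    """
    return {(x, y) for x, y in product(xs, ys)
            if frozenset((x, y)) not in interactions}


# Hypothetical placeholders: candidates from the two previous searches.
diabetes = {"d1", "d3"}
antidepressants = {"a1", "a2"}
interactions = {frozenset(("d1", "a2"))}
print(sorted(compatible_pairs(diabetes, antidepressants, interactions)))
```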
>
> I notice you're using Sesame, do you think it can scale? I tried 
> selecting several repositories at once, but the system seems to hang 
> awhile (couple of minutes)  before returning results.
We use both Sesame (through its Java interface) and Virtuoso (regular
HTTP SPARQL interface), depending on the size of the dataset (e.g.,
DBpedia is on Virtuoso). You may also have noticed that you can add any
external endpoint as well.
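For an external endpoint, a query over the regular HTTP interface
follows the SPARQL Protocol: the query text goes in the `query` URL
parameter of a GET request. A minimal sketch using only the Python
standard library; `sparql_get_url` and `run_select` are hypothetical
names, and whether a given endpoint returns the JSON results format is
an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sparql_get_url(endpoint: str, query: str) -> str:
    """URL for a SPARQL Protocol query sent via HTTP GET (`query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send a SELECT query and parse the JSON results (makes a network call)."""
    req = Request(sparql_get_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 50"
print(sparql_get_url("http://www4.wiwiss.fu-berlin.de/drugbank/sparql", QUERY))
```

Only `run_select` touches the network; building the URL separately makes
it easy to paste the same request into a browser when checking response
times.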
The problems you report are not really due to Explorator, but rather to
the engines themselves and the particular repositories. If you issue
the same queries directly (note that many queries are needed to present
the information in the form it appears on the screen), you will see
that they also take a while to respond. In fact, we'd be very
interested in seeing how to optimize such queries. Samur, my former
student, will elaborate on this in a separate message, for those
interested.
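Replaying the queries behind one screen and timing each one is a simple
way to see where the time goes. A minimal sketch, assuming a
hypothetical `run_query` function that sends one query to the engine;
`time_queries` and `report` are illustrative helpers, not part of any
of the tools mentioned.

```python
import time
from typing import Callable, List, Tuple


def time_queries(run_query: Callable[[str], object],
                 queries: List[str]) -> List[Tuple[str, float]]:
    """Run each query in order and record its wall-clock time in seconds."""
    timings = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        timings.append((q, time.perf_counter() - start))
    return timings


def report(timings: List[Tuple[str, float]]) -> str:
    """Summarize total time and the slowest query (expects a non-empty list)."""
    total = sum(t for _, t in timings)
    slowest_q, slowest_t = max(timings, key=lambda qt: qt[1])
    return (f"{len(timings)} queries, {total:.2f}s total; "
            f"slowest {slowest_t:.2f}s: {slowest_q}")


# Stand-in for a real query function; replace with one that hits the endpoint.
timings = time_queries(lambda q: time.sleep(0.01), ["query 1", "query 2"])
print(report(timings))
```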
(We might take this offline if it becomes too specific, although I feel
the problems we face are the same ones anyone who wishes to build "user
friendly" interfaces to RDF data would face...)

Cheers
D

Received on Tuesday, 2 June 2009 15:34:17 UTC