- From: Geoff Chappell <gchappell@intellidimension.com>
- Date: Mon, 4 Jul 2005 20:26:08 -0400
- To: "'Eric Jain'" <Eric.Jain@isb-sib.ch>
- Cc: <public-semweb-lifesci@w3.org>
> -----Original Message-----
> From: Eric Jain [mailto:Eric.Jain@isb-sib.ch]
> Sent: Monday, July 04, 2005 12:33 PM
> To: Geoff Chappell
> Cc: public-semweb-lifesci@w3.org
> Subject: Re: Uniprot RDF in RDF Gateway
>
> Geoff Chappell wrote:
> > I've added an experimental sparql interface - details at:
> >
> > http://labs.intellidimension.com/uniprot/query2.rsp
>
> Great, quite impressive!
>
> Is it possible (and efficient) to use this system to retrieve large data
> sets (thousands to millions of triples)?

To some degree... query results and intermediate products are currently
in-memory only (something we're addressing in a fall release) - so you're
somewhat limited by the characteristics of your machine, the complexity of
your query and rules, and the amount of data you have.

That said, the scripting language gives you some flexibility in retrieving
massive datasets. For example, you could do something like this to obtain
concise bounded descriptions of all human proteins (reasonably efficiently):

    use UNIPROT;
    import "/std/ns.rql";
    import "/std/cbd.rql";

    session.namespaces["uni"] = "urn:lsid:uniprot.org:ontology:";

    rulebase trans
    {
        infer {[rdfs:subClassOf] ?a ?c} from
            {[rdfs:subClassOf] ?a ?b} and {[rdfs:subClassOf] ?b ?c};
    }

    var dsUni = datasource("uniprot");
    var rsSub = (select ?c using uniprot rulebase trans where
        {[rdfs:subClassOf] [urn:lsid:uniprot.org:taxonomy:9606] ?c}
        or ?c=[urn:lsid:uniprot.org:taxonomy:9606]);

    for (;!rsSub.EOF; rsSub.moveNext())
    {
        //for each superclass of human get a simple cursor
        //(just walks an index - no memory usage)
        var rs = dsUni.getCursor(resource("uni:organism"), null, rsSub[0]);
        for (;!rs.EOF;rs.moveNext())
        {
            //get a concise bounded description for the resource
            //(includes reifications about resource)
            var ds = datasource((select ?p ?s ?o using uniprot rulebase cbd
                where description(?p ?s ?o #(rs[2]))));
            var s = ds.format("application/ntriples");

            //append it to a file, write it out, etc.
            //...
        }
    }

> Many people are interested in obtaining subsets of our data (e.g. only
> human proteins), so that's another interesting use case.

See the example above. Of course, it'd probably make sense to set a query
governor, e.g.:

    Session.maxQueryComplexity = 5000000;

and let the original query rip as one - there's a good chance that, for this
example, it would be OK on a reasonable box.

Best,

Geoff
Received on Tuesday, 5 July 2005 01:05:11 UTC