- From: Simon Spero <sesuncedu@gmail.com>
- Date: Sat, 19 Jul 2014 15:52:20 -0400
- To: "Souza, Renan F. S." <renan123@missouristate.edu>
- Cc: Luca Matteis <lmatteis@gmail.com>, "semantic-web@w3.org Web" <semantic-web@w3.org>
- Message-ID: <CADE8KM79JzcHUEJwpak9_MBqiLxuxBm+fcXBgiUFDCwbOGHT_w@mail.gmail.com>
It's not entirely clear whether the system that is hitting resource limits is the server or the client.

If the result set to which "ORDER BY" will be applied is too big, the resource constraints will be hit before the "OFFSET" can be applied. The server may already be using external storage for the temporary result set.

If OFFSET and LIMIT are used for paging, it *may* be possible to use SPARQL UPDATE to create a temporary graph whose results do not need an ORDER BY across multiple calls; however, whether or not this gives useful results is implementation dependent <http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#modOffset>. Since most engines provide an implementation-specific way to dump a named graph to disk, there may be no need to use a SPARQL query to fetch the results at all; however, it is important to remove the graph once it has been processed (just as with non-"TEMPORARY" temporary tables in SQL).

If the resource limits are being exceeded on the client side, it is easy to use an API that processes results as they are produced. For example, Sesame queries generally provide a pair of methods: one builds a complete result object, and the other takes a handler object that is called for each set of bindings (or, for graph queries, each statement). For example, SPARQLTupleQuery::evaluate <http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/repository/sparql/query/SPARQLTupleQuery.html#evaluate(org.openrdf.query.TupleQueryResultHandler)> can be passed an instance of SPARQLResultsJSONWriter <http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/query/resultio/sparqljson/SPARQLResultsJSONWriter.html>, and SPARQLGraphQuery::evaluate <http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/repository/sparql/query/SPARQLGraphQuery.html#evaluate(org.openrdf.rio.RDFHandler)> can be passed an instance of NTriplesWriter <http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/rio/ntriples/NTriplesWriter.html>.
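The handler-based pattern can be illustrated without any RDF library. This is a minimal, stdlib-only Java sketch under stated assumptions: `RowHandler`, `TsvRowWriter`, `evaluate`, and `toTsv` are hypothetical names that only mimic the start/handle/end shape of a Sesame result handler (they are not the Sesame API), and every binding value is assumed to be a plain string. The engine pushes each solution to the handler, which writes it out immediately rather than accumulating a complete result object.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StreamingHandlerSketch {

    // Hypothetical callback interface, loosely modelled on the start/handle/end
    // shape of a tuple-result handler; NOT the real Sesame type.
    interface RowHandler {
        void start(List<String> bindingNames) throws IOException;
        void handleRow(Map<String, String> row) throws IOException;
        void end() throws IOException;
    }

    // Writes each row as one tab-separated line as soon as it arrives,
    // so memory use does not grow with the number of rows.
    static final class TsvRowWriter implements RowHandler {
        private final Writer out;
        private List<String> names = List.of();

        TsvRowWriter(Writer out) {
            this.out = out;
        }

        @Override
        public void start(List<String> bindingNames) throws IOException {
            names = bindingNames;
            out.write(String.join("\t", bindingNames) + "\n");
        }

        @Override
        public void handleRow(Map<String, String> row) throws IOException {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < names.size(); i++) {
                if (i > 0) line.append('\t');
                // An unbound variable becomes an empty cell.
                line.append(row.getOrDefault(names.get(i), ""));
            }
            out.write(line.append('\n').toString());
        }

        @Override
        public void end() throws IOException {
            out.flush();
        }
    }

    // Hypothetical stand-in for a query engine: pushes rows to the handler
    // one at a time instead of returning a complete result object.
    static void evaluate(List<String> names, Iterator<Map<String, String>> rows,
                         RowHandler handler) throws IOException {
        handler.start(names);
        while (rows.hasNext()) {
            handler.handleRow(rows.next());
        }
        handler.end();
    }

    // Convenience wrapper: streams rows into an in-memory writer and returns the text.
    static String toTsv(List<String> names, List<Map<String, String>> rows) {
        StringWriter sw = new StringWriter();
        try {
            evaluate(names, rows.iterator(), new TsvRowWriter(sw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("s", "<http://example.org/a>", "o", "\"1\""),
                Map.of("s", "<http://example.org/b>"));
        System.out.print(toTsv(List.of("s", "o"), rows));
    }
}
```

With a real Sesame query, the handler would typically be one of the writers mentioned above wrapped around a file output stream; the memory profile is the same, because each solution is written and then discarded.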
Similarly, Jena provides QueryExecution::execConstructTriples <http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/QueryExecution.html#execConstructTriples()>, which returns an instance of Iterator<Triple>; this need not build a complete result set. The QueryExecution object can be constructed by calling one of the various QueryExecutionFactory::sparqlService <http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/QueryExecutionFactory.html#sparqlService(java.lang.String, com.hp.hpl.jena.query.Query)> methods.

By avoiding repeated queries using ORDER BY, OFFSET, and LIMIT, the load on the server can be greatly reduced.

Simon

On Sat, Jul 19, 2014 at 1:48 PM, Souza, Renan F. S. <renan123@missouristate.edu> wrote:

> Not sure if triple store implementations allow you to do that directly.
> One thing you could try is to use LIMIT and OFFSET (with ORDER BY)
> modifiers so that the result would fit in memory, then you write the result
> in a file. Do that as many times as needed until you have no more results
> left. That would work if each query that uses LIMIT, OFFSET and ORDER BY
> does not take too long to run.
>
> You can use the COUNT modifier to check how many times you would need to
> do that.
> Of course, if the results are really that big, I would write a simple
> program to do the job.
>
> On Fri, Jul 18, 2014 at 6:57 PM, Luca Matteis <lmatteis@gmail.com> wrote:
>
>> Hello,
>>
>> I'm executing a SPARQL query against a large endpoint I've setup
>> locally. The problem is that the result of this query is too large to
>> be held in memory. Are there endpoints that allow me to stream the
>> results to disk? For example, if it's a CONSTRUCT query it could
>> stream the N-Triples line by line to disk.
>>
>> Thank you,
>> Luca
>
> --
> Thank you!
> Regards,
>
> Souza, Renan F. S.
> Bachelor of Computer Science
> Missouri State University, Springfield, MO
> Masters in Computer Systems Engineering
> Federal University of Rio de Janeiro, Brazil
>
> +55-21-99257-3934
> Personal email: renan-francisco@hotmail.com
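The Jena approach described in the reply, consuming an Iterator<Triple> and writing each triple out as soon as it is produced, can be sketched in plain Java. This is a minimal illustration under assumptions: the `Triple` record, the `syntheticTriples` generator, and the `writeNTriples` and `demo` methods are hypothetical and are not Jena classes, and each term is assumed to be already serialized in N-Triples syntax. With Jena itself, the iterator would instead come from execConstructTriples() on a QueryExecution obtained from one of the QueryExecutionFactory::sparqlService methods.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class NTriplesStreamSketch {

    // Hypothetical triple type; each term is assumed to already be in
    // N-Triples syntax, e.g. "<http://...>" or "\"literal\"".
    record Triple(String s, String p, String o) {}

    // Consumes the iterator one triple at a time and writes one N-Triples line
    // per triple; only the current triple is held in memory.
    static long writeNTriples(Iterator<Triple> triples, Writer out) throws IOException {
        long count = 0;
        while (triples.hasNext()) {
            Triple t = triples.next();
            out.write(t.s() + " " + t.p() + " " + t.o() + " .\n");
            count++;
        }
        out.flush();
        return count;
    }

    // Lazily produces n synthetic triples, standing in for a remote CONSTRUCT result.
    static Iterator<Triple> syntheticTriples(int n) {
        return new Iterator<>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public Triple next() {
                if (!hasNext()) throw new NoSuchElementException();
                int k = i++;
                return new Triple("<http://example.org/s" + k + ">",
                                  "<http://example.org/p>",
                                  "\"" + k + "\"");
            }
        };
    }

    // Streams n synthetic triples into an in-memory writer and returns the text.
    static String demo(int n) {
        StringWriter sw = new StringWriter();
        try {
            writeNTriples(syntheticTriples(n), sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo(2));
    }
}
```

Because the loop holds only the current triple, a file-backed Writer can receive far more output than fits in memory, which is what the original question asked for, and no repeated ORDER BY/OFFSET/LIMIT round trips are needed.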
Received on Saturday, 19 July 2014 19:52:47 UTC