Re: The truth about SPARQL Endpoint availability

I believe that people coming from a MySQL (well, MyISAM specifically) background would assume a global COUNT to be fast, since it's an O(1) operation on a MyISAM table with a primary key.
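
As a rough illustration of how cheap this could be made, and of the special-casing suggested in the message quoted below, here is a minimal Python sketch. Everything in it is made up for illustration: ToyStore, fast_count and the GLOBAL_COUNT pattern are hypothetical names, not any real store's API. The regex only recognises the single query shape quoted further down the thread, and it ignores named graphs entirely.

```python
import re

# Hypothetical pattern for the one query shape being special-cased:
#   SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }
# Keywords are matched case-insensitively; variable names are captured as written.
GLOBAL_COUNT = re.compile(
    r"^\s*SELECT\s*\(\s*COUNT\s*\(\s*\*\s*\)\s+AS\s+\?(\w+)\s*\)\s*"
    r"(?:WHERE\s*)?\{\s*\?(\w+)\s+\?(\w+)\s+\?(\w+)\s*\.?\s*\}\s*$",
    re.IGNORECASE,
)


class ToyStore:
    """In-memory stand-in for a triple store that tracks its own size."""

    def __init__(self):
        self._triples = set()
        self.size = 0  # maintained on every update, so reading it is O(1)

    def add(self, s, p, o):
        triple = (s, p, o)
        if triple not in self._triples:
            self._triples.add(triple)
            self.size += 1

    def remove(self, s, p, o):
        triple = (s, p, o)
        if triple in self._triples:
            self._triples.remove(triple)
            self.size -= 1


def fast_count(query, store):
    """Return the stored size if `query` is the global count, else None.

    None means "not the special case": hand the query to the normal engine.
    """
    match = GLOBAL_COUNT.match(query)
    if match is None:
        return None
    _alias, s, p, o = match.groups()
    # {?s ?p ?s} is a different (filtered) pattern, so require three distinct variables.
    if len({s, p, o}) != 3:
        return None
    return store.size
```

A server doing this would call fast_count first and fall back to its usual query path whenever it returns None.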

Another way to go would be to add a NOOP command to SPARQL, surely?
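
As far as I know SPARQL has no NOOP, so the nearest thing today would be a deliberately trivial query. Here is a sketch in Python, assuming the endpoint accepts queries over HTTP GET in a "query" parameter as the SPARQL Protocol describes. PROBE_QUERY, probe_url and is_alive are hypothetical names, and whether an ASK like this is actually cheap depends on the store.

```python
import urllib.parse
import urllib.request

# Assumption: a single-pattern ASK is about the cheapest real query available.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def probe_url(endpoint):
    """Build the GET URL that sends PROBE_QUERY to `endpoint`."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urllib.parse.urlencode({"query": PROBE_QUERY})


def is_alive(endpoint, timeout=10.0):
    """Return True if the endpoint answers the probe with HTTP 200."""
    request = urllib.request.Request(
        probe_url(endpoint),
        headers={"Accept": "application/sparql-results+xml"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts and HTTP error statuses.
        return False
```

This only checks that the endpoint responds at all; it does not look at the boolean in the result.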


Dan


On 6 Mar 2011, at 11:20, Tim Berners-Lee wrote:

> Maybe the count of triples should be special-cased in the SPARQL server code:
> spotted on input and the store size returned,
> if it is reasonable for the endpoint to keep track of the size of its store.
> (Do they anyway?)
> 
> Tim
> 
> On 2011-03-05, at 11:58, Bill Roberts wrote:
> 
>> Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard.
>> 
>> But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.
>> 
>> Bill
>> 
>> 
>> On 5 Mar 2011, at 15:14, Hugh Glaser wrote:
>> 
>>> Hi,
>>> On 5 Mar 2011, at 14:22, Andrea Splendiani wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I think it depends on the store. I've tried some (from the endpoint list): some return an answer pretty quickly, some don't, and some don't support count.
>>>> However, one could collect this information only for the stores that answer the count query; there is no need to try every time.
>>> I am happy for a store implementor or owner to disagree, but I find it very unlikely that the owner of a store with a decent chunk of data (> 1M triples, say) would be happy for someone to keep issuing such a query, even if they did decide to give enough resources to execute it.
>>> I would quickly blacklist such a site.
>>>> 
>>>> VoID:
>>>> is this a good query:
>>>> select * where {?s <http://rdfs.org/ns/void#numberOfTriples> ?o } 
>>> 
>>> I'm no SPARQL or voiD guru, but I think you need a bit more wrapping in the scovo stuff, so more like:
>>> 
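>>> PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
>>> PREFIX void:  <http://rdfs.org/ns/void#>
>>> PREFIX scovo: <http://purl.org/NET/scovo#>
>>>
>>> SELECT DISTINCT ?endpoint ?uri ?triples WHERE
>>>         { ?ds a void:Dataset .
>>>           ?ds void:sparqlEndpoint ?uri .
>>>           ?ds rdfs:label ?endpoint .
>>>           ?ds void:statItem [ scovo:dimension void:numberOfTriples ; rdf:value ?triples ] .
>>>         }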
>>> 
>>> Try it at
>>> http://kwijibo.talis.com/voiD/
>>> or
>>> http://void.rkbexplorer.com/
>>> 
>>> I guess Pierre-Yves might like to enhance his page by querying a voiD store to also give basic stats.
>>> Or someone might like to do a store reporter that uses (a) voiD endpoint(s) plus Pierre-Yves's data (he has a SPARQL endpoint), to do so.
>>> And maybe the CKAN endpoint would have extra useful data as well.
>>> A real Semantic Web application that queried more than one SPARQL endpoint - now that would be a novelty!
>>> Fancy the challenge? It is the weekend! :-)
>>> 
>>> ciao
>>> Hugh
>>> 
>>>> 
>>>> it doesn't seem viable if so.
>>>> 
>>>> ciao,
>>>> Andrea
>>>> 
>>>> 
>>>> On 5 Mar 2011, at 13:49, Hugh Glaser wrote:
>>>> 
>>>>> Nice idea, but... :-)
>>>>> 
>>>>> SELECT (count(*) as ?c) WHERE {?s ?p ?o}
>>>>> 
>>>>> is a pretty anti-social thing to do to a store.
>>>>> At best, a store of any size will spend a while thinking, and then quite rightly decide they have burnt enough resources, and return some sort of error.
>>>>> 
>>>>> For a properly maintained site, of course, the voiD description will give lots of similar information.
>>>>> Best
>>>>> Hugh
>>>>> 
>>>>> On 5 Mar 2011, at 13:06, Andrea Splendiani wrote:
>>>>> 
>>>>>> Hi, very nice!
>>>>>> I have a small suggestion:
>>>>>> 
>>>>>> Why don't you ask the endpoint "count(*) where {?s ?p ?o}"?
>>>>>> Or ask for the number of graphs?
>>>>>> Both pieces of information, the number of triples and the number of graphs, if logged and compared over time, can give a practical view of how lively the endpoint's content is.
>>>>>> 
>>>>>> best,
>>>>>> Andrea Splendiani
>>>>>> 
>>>>>> 
>>>>>> On 28 Feb 2011, at 18:55, Pierre-Yves Vandenbussche wrote:
>>>>>> 
>>>>>>> Hello all,
>>>>>>> 
>>>>>>> Have you already encountered problems with SPARQL endpoint accessibility?
>>>>>>> Do you feel frustrated that they are never available when you need them?
>>>>>>> Are you developing an application that uses these services but wondering whether it is reliable?
>>>>>>> 
>>>>>>> Here is a tool [1] that shows the availability of public SPARQL endpoints and lets you monitor them over the last hours/days.
>>>>>>> Stay informed of status changes for a particular endpoint (or all of them) through RSS feeds.
>>>>>>> All availability information generated by this tool is accessible through a SPARQL endpoint.
>>>>>>> 
>>>>>>> This tool fetches public SPARQL endpoints from CKAN [2] open data. From this list, it runs availability tests every hour.
>>>>>>> 
>>>>>>> [1] http://labs.mondeca.com/sparqlEndpointsStatus/index.html
>>>>>>> [2] http://ckan.net/
>>>>>>> 
>>>>>>> Pierre-Yves Vandenbussche.
>>>>>> 
>>>>>> Andrea Splendiani
>>>>>> Senior Bioinformatics Scientist
>>>>>> Centre for Mathematical and Computational Biology
>>>>>> +44(0)1582 763133 ext 2004
>>>>>> andrea.splendiani@bbsrc.ac.uk
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> Hugh Glaser,  
>>>>>          Intelligence, Agents, Multimedia
>>>>>          School of Electronics and Computer Science,
>>>>>          University of Southampton,
>>>>>          Southampton SO17 1BJ
>>>>> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
>>>>> Mobile: +44 78 9422 3822, Home: +44 23 8061 5652
>>>>> http://www.ecs.soton.ac.uk/~hg/
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 
> 

Received on Sunday, 6 March 2011 12:54:35 UTC