Re: How to publish SPARQL endpoint limits/metadata? from Kingsley Idehen on 2013-10-15 (public-lod@w3.org from October 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 15 Oct 2013 07:13:39 -0400
To: public-lod@w3.org
Message-ID: <525D2363.5080308@openlinksw.com>

On 10/15/13 5:05 AM, Frans Knibbe | Geodan wrote:
> On 2013-10-14 20:05, Kingsley Idehen wrote:
>>>>
>>>> Plus:
>>>>
>>>> 7. query timeout (in milliseconds)  -- which sets the 
>>>> processing-time threshold per query.
>>>>
>>>> Ideally, you want to use a combination of timeouts, result size 
>>>> (max. results per query), offset, and limit to enable paging through 
>>>> data.
>>> Yes, enabling paging is the main thing I was thinking about. For 
>>> that to work, one needs to know the maximum page size allowed.  
>>> SPARQL ORDER BY, OFFSET and LIMIT can be used to build requests for 
>>> pages. But how does the query timeout setting of the server come 
>>> into play?
>>
>> In our case (re. Virtuoso) it's the time taken to produce a solution, 
>> bearing in mind the LIMIT and OFFSET query values. If a complete 
>> query solution isn't produced, you get a partial solution, and the 
>> ability to retry with an extended timeout.
>
> Ah, I understand, thank you. I can see that this way of handling 
> timeouts has its merits. And now I see that this limit is much like 
> the maximum results per request: The client gets a response, but if it 
> is not aware that the response is partial then it could have 
> undesirable effects.
>
> Is this behaviour standardized? Or do different flavours of endpoints 
> handle timeouts differently? If the latter is the case, maybe it makes 
> sense to also publish something about how timeouts are handled in the 
> endpoint description.

Yes, we need to standardize how this is incorporated into SPARQL 
endpoint descriptions. Currently, we handle this in a Virtuoso-specific 
INI file, but nothing stops that information from making its way into the 
endpoint description, HTTP response metadata, etc.
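
To make the paging-plus-timeout pattern discussed in this thread concrete,
here is a minimal Python sketch. It is an illustration under assumptions,
not Virtuoso's actual interface: the `fetch` callable, its
(query, timeout_ms) -> (rows, is_partial) signature, and the doubling retry
policy are all hypothetical. How a real endpoint signals a partial result
(HTTP header, status code, or otherwise) is exactly what is not yet
standardized, so that detail is left to whoever supplies `fetch`.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]
# Hypothetical transport hook: runs one query with a timeout and reports
# whether the endpoint returned only a partial solution.
Fetch = Callable[[str, int], Tuple[List[Row], bool]]


def page_query(base_query: str, page_size: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that already has a stable ORDER BY."""
    return f"{base_query}\nLIMIT {page_size} OFFSET {offset}"


def fetch_all(base_query: str, fetch: Fetch, page_size: int,
              timeout_ms: int, max_timeout_ms: int) -> Iterator[Row]:
    """Page through results, retrying a page with a longer timeout
    whenever the endpoint reports a partial solution."""
    offset = 0
    while True:
        query = page_query(base_query, page_size, offset)
        current_timeout = timeout_ms
        rows, partial = fetch(query, current_timeout)
        # Assumed retry policy: double the timeout up to a client-side cap.
        while partial and current_timeout < max_timeout_ms:
            current_timeout = min(current_timeout * 2, max_timeout_ms)
            rows, partial = fetch(query, current_timeout)
        if partial:
            raise TimeoutError(f"page at offset {offset} is still partial")
        yield from rows
        if len(rows) < page_size:
            return  # a short page means the result set is exhausted
        offset += page_size
```

A client that knows the endpoint's maximum page size and default timeout
in advance (for example, from the endpoint description) can choose
`page_size` and `timeout_ms` sensibly instead of discovering them by trial
and error.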

Kingsley
>
> Regards,
> Frans
>
>>
>>> I assume that if a request for a page of data times out you would 
>>> get a timeout error code (522 probably). 
>>
>> Partial result. The idea is that it's like a quiz: either you 
>> provide an answer within the allotted time, or you attempt to answer 
>> in extended time, i.e., you request the extension or the question 
>> cycles back to you because the opponent couldn't provide an answer.
>>
>>> What would be gained with prior knowledge of the timeout setting?
>>
>> The timeout is the set time for producing a complete or partial query 
>> solution.
>>
>> What we need to do, which will help others, is get all the controls 
>> for paging properly returned via HTTP responses. This matter has 
>> been discussed in the past (on this list) and we are committed to 
>> getting the HTTP responses in line, as I've described.
>>
>> Action item for us: demonstrate what I am describing using RESTful 
>> interaction patterns via cURL. Once in place, we would have the 
>> foundation of a pattern that anyone could *optionally* incorporate.
>>
>> Kingsley
>>>
>>> Regards,
>>> Frans
>>>
>>>>
>>>> Kingsley
>>>>>
>>>>>
>>>>> On 8-10-2013 17:45, Leigh Dodds wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As others have suggested, extending service descriptions would be the
>>>>>> best way to do this. This might make a nice little community project.
>>>>>>
>>>>>> It would be useful to itemise a list of the type of limits that might
>>>>>> be faced, then look at how best to model them.
>>>>>>
>>>>>> Perhaps something we could do on the list?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> L.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 8, 2013 at 10:46 AM, Frans Knibbe | Geodan
>>>>>> <frans.knibbe@geodan.nl>  wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am experimenting with running SPARQL endpoints and I notice the need to
>>>>>>> impose some limits to prevent overloading/abuse. The easiest and I believe
>>>>>>> fairly common way to do that is to LIMIT the number of results that the
>>>>>>> endpoint will return for a single query.
>>>>>>>
>>>>>>> I now wonder how I can publish the fact that my SPARQL endpoint has a LIMIT
>>>>>>> and that it has a certain value.
>>>>>>>
>>>>>>> I have read the thread "Public SPARQL endpoints: managing (mis)-use and
>>>>>>> communicating limits to users", but that seemed to be about how to
>>>>>>> communicate limits during querying. I would like to know if there is a way
>>>>>>> to communicate limits before querying is started.
>>>>>>>
>>>>>>> It seems to me that a logical place to publish a limit would be in the
>>>>>>> metadata of the SPARQL endpoint. Those metadata could contain all limits
>>>>>>> imposed on the endpoint, and perhaps other things like a SLA or a
>>>>>>> maintenance schedule... data that could help in the proper use of the
>>>>>>> endpoint by both software agents and human users.
>>>>>>>
>>>>>>> So perhaps my enquiry really is about a standard for publishing SPARQL
>>>>>>> endpoint metadata, and how to access them.
>>>>>>>
>>>>>>> Greetings,
>>>>>>> Frans
>>>>>>>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Tuesday, 15 October 2013 11:14:01 UTC