Re: sameAs.org

Hi,
> On 13 Feb 2017, at 13:35, Miel Vander Sande <Miel.VanderSande@UGent.be> wrote:
> 
> Hi Hugh,
> 
>> On 13 Feb 2017, at 13:41, Hugh Glaser <hugh@glasers.org> wrote:
>> 
>> Hi Miel,
>>> On 13 Feb 2017, at 12:04, Miel Vander Sande <Miel.VanderSande@UGent.be> wrote:
>>> 
>>> Hi Hugh,
>>> 
>>> 
>>>> On 07 Feb 2017, at 13:36, Hugh Glaser <hugh@glasers.org> wrote:
>>>> 
>>>> Hi again,
>>>> I think I could do with some help here (or at least advice!).
>>>> (I'm keeping this on the list, as I think it is probably still of interest to some people - feel free to email me to tell us to take it off list.)
>>>> 
>>>> So, following your pages, there seem to be quite a lot of options for me.
>>>> 
>>>> The context:
>>>> I have REST-like services that query based on a single (urlencoded) symbol/URI.
>>>> For example
>>>> http://differentfrom.org
>>>> which can be invoked by
>>>> http://differentfrom.org/symbols/http%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fid%2F7000000000003822
>>>> and of course they do the conneg for rdf.
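
As a rough illustration of the invocation above: the lookup URL can be built by percent-encoding the whole symbol and appending it to the service's /symbols/ path. This is only a sketch. The helper names lookup_url and rdf_request are made up for the example, and the Accept value application/rdf+xml is an assumption about which RDF serialisation the conneg would return. The request is constructed but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def lookup_url(service_base: str, symbol: str) -> str:
    """Build the REST-like lookup URL for a single symbol/URI.

    safe="" makes quote() encode ':' and '/' as well, so the whole
    symbol travels as one path segment.
    """
    return service_base.rstrip("/") + "/symbols/" + quote(symbol, safe="")


def rdf_request(service_base: str, symbol: str) -> Request:
    """Prepare (but do not send) a GET that asks for RDF via conneg.

    The media type is an assumption; the service may prefer another one.
    """
    return Request(
        lookup_url(service_base, symbol),
        headers={"Accept": "application/rdf+xml"},
    )


url = lookup_url(
    "http://differentfrom.org",
    "http://data.ordnancesurvey.co.uk/id/7000000000003822",
)
print(url)
```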
>>>> 
>>>> So it seems to me that I need a simple proxy that wraps whatever I need around this.
>>> 
>>> I’m not entirely sure I follow. A proxy could translate such a request to the TPF server. This is actually what the DBpedia archive does now after they switched to TPF. For instance, the URL http://dbpedia.mementodepot.org/memento/20151001000000/http://dbpedia.org/page/Frederik_H._Kreuger queries the TPF server at http://fragments.mementodepot.org/ internally.
>> But isn't that the other way around?
>> I already serve Linked Data - it's the TPF server I don't have.
>> So the proxy you talk about is LD calling TPF.
>> I need TPF calling LD (I think).
> 
> Ah, I see. That can be arranged if we provide an API datasource for our server somehow, can’t be too hard.
Now, that's what I like to hear! :-)
> 
> However, SPARQL querying is highly likely to be less performant this way, for the following reason. A TPF response is typically paged (usually 100 triples) and provides an estimated cardinality to clients, so they can optimize the query execution.
> Is your system able to provide a quick estimate of the number of sameAs links a URI has? If the result is always smaller than the page size (< 100), this is not necessary. Otherwise, it could be a showstopper for reasonable performance.
> 
At the moment there are some with more than 100 (for reasons I won't go into), but I am unlikely to carry those URIs over to the new one.
More than 100 is rare at the moment - the huge majority have fewer than 10.
The server doesn't have a URI to give a count - it just gives all the URIs.
To be clear - there isn't a SPARQL endpoint - the data comes out of bespoke SQL DBs.
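
To make the count question concrete, here is a minimal sketch of how a datasource sitting in front of such a service might answer a (subject, sameAs, ?o) pattern. Since the service hands back every URI in one go, the count can simply be the length of that list, and a page is a slice of it. Everything named here (Fragment, same_as_fragment, the owl:sameAs predicate, the default page size of 100) is an assumption for illustration, not the actual TPF server API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed predicate for the links; the real service may use another one.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
PAGE_SIZE = 100  # the usual TPF page size mentioned in the thread

Triple = Tuple[str, str, str]


@dataclass
class Fragment:
    triples: List[Triple]  # the triples on this page
    total_count: int       # exact here, since the full list is known
    has_next: bool         # whether a further page exists


def same_as_fragment(
    subject: str,
    equivalents: Sequence[str],
    page: int = 1,
    page_size: int = PAGE_SIZE,
) -> Fragment:
    """Build one page of triples from the full list the service returned."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    triples = [(subject, OWL_SAME_AS, uri) for uri in equivalents]
    start = (page - 1) * page_size
    end = start + page_size
    return Fragment(
        triples=triples[start:end],
        total_count=len(triples),
        has_next=end < len(triples),
    )


# The common case in the thread: well under one page of results.
small = same_as_fragment("http://example.org/a", ["http://example.org/b"])
print(small.total_count, small.has_next)
```

For the rare URIs with more than 100 links, the same call with page=2 would return the remainder, at the cost of fetching the whole list again on each page request.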
> 
>>> 
>>>> (I'm pretty sure I don't want to cache anything, such as tdt, since the underlying service is doing all the caching it can manage, so if the proxy is running on the same server, that would be a poor thing to do.)
>>>> I can also invoke the service directly (in PHP), but that would simply be a call instead of the http GET.
>>> 
>>> Could you elaborate?
>> With respect to the caching: if the Linked Data -> TPF proxy is simple (and therefore lightweight) enough, then there should be no point in caching something which is making a call to get data that is already cached.
>> 
>> The point about the direct PHP service is simply that, if a TPF proxy is running on the same server as the Linked Data source, it can make a direct call to get the same data that the HTTP GET would return.
>> 
>> I hope that is clearer.
> 
> Makes sense.
> 
> I think I was under the impression you wanted to switch to another system entirely, so TPF software + HDT seemed like a perfect fit with extra benefits.
Ah, I see.

I would be willing to have a go at doing it, but I suspect it would take me rather a while, and I would still get lots wrong.
So if you would be able to (hack up?) "an API datasource for our server somehow, can't be too hard", we could at least get a sense of what it might all do.
And you might even find it useful for other datasources.
And I could have a look and see what can be improved for our services - I would be very happy to look at adding the TPF stuff as a standard interface for all services.

Best
Hugh
> 
> Best,
> 
> Miel
> 
> 
>> Cheers
>> Hugh
>>> 
>>> Cheers,
>>> 
>>> Miel
>>> 
>>>> It seems to me that there must be a really lightweight solution to this?
>>>> I guess I also need some Hydra document(s) somewhere that describe the service.
>>>> 
>>>> Is that roughly right?
>>>> And if so, what is best?
>>>> 
>>>> Cheers
>>>> 
>>>>> On 6 Feb 2017, at 20:14, Miel Vander Sande <miel.vandersande@ugent.be> wrote:
>>>>> 
>>>>> Hi Hugh,
>>>>> 
>>>>>> On 06 Feb 2017, at 19:10, Hugh Glaser <hugh@glasers.org> wrote:
>>>>>> 
>>>>>> I'm probably missing something, but don't I need a SPARQL endpoint for the TPF server?
>>>>> 
>>>>> You certainly don’t; but you can, if you want. The TPF server publishes a datasource through a Web API that only accepts triple patterns. What type of datasource you use is up to you.
>>>>> 
>>>>> We mostly use HDT (http://rdfhdt.org), because it is very compact and very performant for the TPF server. You can also use a SPARQL endpoint as datasource, but I wouldn’t recommend it.
>>>>> 
>>>>> SPARQL queries are possible, but they are executed by the client. Check out http://client.linkeddatafragments.org/ for instance. It queries http://fragments.dbpedia.org/en by default, but this could be any other TPF interface.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Miel

Received on Monday, 13 February 2017 13:53:04 UTC