Re: Discovering a query endpoint associated with a given Linked Data resource from Hugh Glaser on 2015-08-26 (public-lod@w3.org from August 2015)

From: Hugh Glaser <hugh@glasers.org>
Date: Wed, 26 Aug 2015 13:23:47 +0100
To: Nandana Mihindukulasooriya <nmihindu@fi.upm.es>
Cc: Miel Vander Sande <miel.vandersande@ugent.be>, Sarven Capadisli <info@csarven.ca>, Víctor Rodríguez Doncel <vrodriguez@fi.upm.es>, public-lod <public-lod@w3.org>, Heiko Paulheim <heiko@informatik.uni-mannheim.de>
Message-Id: <53EE8C02-C896-421F-85C4-73FB5FBB2CAA@glasers.org>
Great topic started.

tl;dr - it should be in RDF in the doc you get back, as a simple triple, a bit like seeAlso.

So, as I understand it, Nandana is addressing quite a common situation.
I have just resolved a URI I was given and got some RDF, and I have a suspicion that there is a SPARQL endpoint where I can find out more.
In fact, in many situations, the RDF I was given was the result of the server doing a CBD (Concise Bounded Description) of some kind to a SPARQL store, and the SPARQL store is also actually exposed as a SPARQL endpoint.
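
For anyone who has not met CBD before, here is a minimal sketch of the idea in Python. It is only an illustration, not how any particular server does it: I am assuming triples are plain tuples of strings, that blank nodes are written with the usual "_:" prefix, and I ignore the reification part of the CBD definition. The function name is made up for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumption: blank nodes are written as "_:label".
    return term.startswith("_:")


def concise_bounded_description(triples: Iterable[Triple], start: str) -> Set[Triple]:
    """Collect every triple whose subject is `start`, then do the same for
    each blank-node object reached along the way (reification is ignored)."""
    by_subject: Dict[str, List[Triple]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result: Set[Triple] = set()
    seen: Set[str] = set()
    todo = [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in by_subject.get(node, []):
            result.add(triple)
            # Only blank nodes are expanded; named resources end the walk.
            if is_blank(triple[2]):
                todo.append(triple[2])
    return result
```

The point is that the document you get back is just one slice of the store, so the store behind it can usually answer a lot more.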

To put it bluntly, pissing about with VoID and .well-known for something like this is ridiculous.
The RDF I get back should point me to such SPARQL endpoints.
I shouldn’t have to parse more files and do all sorts of pattern matching - the publisher knew where the SPARQL endpoint was, and should have told me - remember: make it easy for the consumer, not the publisher.
Another major reason is that the publisher may not have the rights to publish .well-known and its ilk.
And if it comes with the RDF we can be really confident of the provenance and trust of who has recommended it.
Also, it is a damn sight easier to maintain than rebuilding the VoID document every time something changes.

All it needs is a little property, similar to rdfs:seeAlso (it could have been a subProperty of rdfs:seeAlso, but it isn’t a Resource), that allows me to list the (many?) SPARQL endpoint(s) that I recommend a consumer might like to try in order to look up things about my URI.
So, if I wanted, I could publish something about http://sws.geonames.org/3405870/ , and say there is a SPARQL endpoint over at DBpedia where you can get more information.
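
To make that concrete, a rough sketch in Python of what a consumer would do with such a triple. The property URI is purely hypothetical (no such term has been agreed, I just put it under example.org), the function name is made up, and I am assuming the RDF from resolving the URI has already been parsed into plain (subject, predicate, object) tuples.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property: nothing like this is standardised, it is only
# a placeholder for "the little property" suggested above.
RECOMMENDED_ENDPOINT = "http://example.org/ns#recommendedSparqlEndpoint"


def recommended_endpoints(triples: Iterable[Triple], resource: str) -> List[str]:
    """Return the SPARQL endpoints the publisher recommends for `resource`,
    read straight from the RDF obtained by resolving its URI."""
    return [o for s, p, o in triples if s == resource and p == RECOMMENDED_ENDPOINT]


# Illustrative document: the geonames URI recommending the DBpedia endpoint.
doc = [
    ("http://sws.geonames.org/3405870/", RECOMMENDED_ENDPOINT, "http://dbpedia.org/sparql"),
]

for endpoint in recommended_endpoints(doc, "http://sws.geonames.org/3405870/"):
    print(endpoint)
```

No extra fetches, no pattern matching: one filter over the triples the consumer already has.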

[In fact, this is very similar to discovering associated sameAs stores (ones that the data publisher recommends).
We used to publish such a triple in our RDF from URI resolution that told you where to get the sameAs information, if any.
In that case we could make it a subProperty of seeAlso, since it returned RDF from a lookup, but it also meant you didn’t get the location of the actual sameAs store itself, which would have let you do similar lookups on it.
We stopped doing it on the new platform because we think no-one used it :-)]

Best
Hugh
> On 26 Aug 2015, at 12:30, Heiko Paulheim <heiko@informatik.uni-mannheim.de> wrote:
> 
> Hi Nandana,
> 
> no, we haven't investigated that further - for the "why", it is hard to examine that at scale (you could of course ask all data providers, but...). 
> 
> For the non-discoverable VoIDs, there is also a methodological problem - how would we know that they exist if we are not able to discover them?
> 
> Best,
> Heiko
> 
> 
> 
> Am 26.08.2015 um 12:40 schrieb Nandana Mihindukulasooriya:
>> Hi Heiko,
>> 
>> Thanks a lot for the pointer to the paper. 
>> 
>> In your experiment, were you able to get some insights on *why* data publishers are not providing VoID descriptions when it is applicable to do so (leaving out single FOAF documents etc.) ? 
>> 
>> [[Approaches using proposed methods such as VoID and the provenance vocabulary are scarcely in use (and sometimes not implemented according to the specification), they lead to a valid SPARQL endpoint in less than 1% of all cases.]]
>> 
>> Also, did you find many cases where a VoID description is actually available but is not discoverable according to the VoID spec (such as the case you mention of the description not being at the root level but at another level)? For instance, http://dbpedia.org/void/Dataset exists but is not in http://dbpedia.org/.well-known/void and the resources don't provide a back-link.
>> 
>> Best Regards,
>> Nandana
>> 
>> 
>> On Wed, Aug 26, 2015 at 12:05 PM, Heiko Paulheim <heiko@informatik.uni-mannheim.de> wrote:
>> Hi all,
>> 
>> two years ago, we conducted an empirical experiment to find out how promising the different approaches to discover SPARQL endpoints are. The results were rather disappointing, see [1]. 
>> 
>> Executive summary: rather than trying to find VoID descriptions (which rarely exist), querying catalogues like datahub seems more promising (higher recall at least, precision is lower).
>> 
>> Hth.
>> 
>> Best,
>> Heiko
>> 
>> [1] http://www.heikopaulheim.com/docs/iswc2013_poster.pdf
>> 
>> 
>> 
>> 
>> Am 26.08.2015 um 11:50 schrieb Nandana Mihindukulasooriya:
>>> Thanks all for the pointers. 
>>> 
>>> Yes, it seems it is quite rare in practice. I tried several hosts that provide Linked Data resources but couldn't find ones that provide a VoID description in .well-known/void. 
>>> 
>>> I guess there is a higher technical barrier in making that description available in the given location compared to providing that information in the response in most cases. So probably the pragmatic thing to do would be to include this information either in the content or as a Link relation header using the void properties when dereferenced. 
>>> 
>>> So I can use the void:inDataset back-link mechanism [1] and point to a VoID description that will have the necessary information about the query endpoints. 
>>> 
>>> -----
>>> @prefix void: <http://rdfs.org/ns/void#> .
>>> @prefix dbpedia: <http://dbpedia.org/resource/> .
>>> dbpedia:Sri_Lanka void:inDataset _:DBpedia .
>>> _:DBpedia a void:Dataset;
>>>     void:sparqlEndpoint <http://dbpedia.org/sparql>;
>>>     void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=> .
>>> ------
>>> or 
>>> 
>>> ----
>>> Link: <http://dbpedia.org/void/Dataset>; rel="http://rdfs.org/ns/void#inDataset"
>>> ----
>>> 
>>> Best Regards,
>>> Nandana
>>> 
>>> [1] http://www.w3.org/TR/void/#discovery-links
>>> 
>>> On Wed, Aug 26, 2015 at 11:05 AM, Miel Vander Sande <miel.vandersande@ugent.be> wrote:
>>> Hi Nandana,
>>> 
>>> I guess VoID would be the best fit
>>> 
>>> In case of LDF you could use
>>> 
>>> <...> void:uriLookupEndpoint <http://fragments.dbpedia.org/2014/en?subject=>
>>> 
>>> But whether these exist in practice? Probably not. I'd leave it up to the publisher of the dereferenceable resource to provide this triple in the response, rather than doing the .well-known thing.
>>> 
>>> Best,
>>> 
>>> Miel
>>> 
>>> On 26 Aug 2015, at 10:57, Víctor Rodríguez Doncel <vrodriguez@fi.upm.es> wrote:
>>> 
>>> >
>>> > Well, you might try to look in this folder location:
>>> > .well-known/void
>>> > And possibly find a "void:sparqlEndpoint".
>>> >
>>> > But this would be too good to be true.
>>> >
>>> > Regards,
>>> > Víctor
>>> >
>>> > El 26/08/2015 10:45, Nandana Mihindukulasooriya escribió:
>>> >> Hi,
>>> >>
>>> >> Is there a standard or widely used way of discovering a query endpoint (SPARQL/LDF) associated with a given Linked Data resource?
>>> >>
>>> >> I know that a client can use the "follow your nose" and related link traversal approaches such as [1], but I wonder if it is possible to have a hybrid approach in which dereferenceable Linked Data resources optionally advertise query endpoint(s) in a standard way, so that clients can perform queries on related data.
>>> >>
>>> >> To clarify the use case a bit, when a client dereferences a resource URI it gets a set of triples (an RDF graph) [2]. In some cases, the returned graph could be a subgraph of a named graph / default graph of an RDF dataset. The client wants to discover a query endpoint that exposes the relevant dataset, if one is available.
>>> >>
>>> >> For example, something like the following using the "search" link relation [3].
>>> >>
>>> >> ------
>>> >> HEAD /resource/Sri_Lanka
>>> >> Host: dbpedia.org
>>> >> ------
>>> >> 200 OK
>>> >> Link: <http://dbpedia.org/sparql>; rel="search"; type="sparql", <http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf"
>>> >> ... other headers ...
>>> >> ------
>>> >>
>>> >> Best Regards,
>>> >> Nandana
>>> >>
>>> >> [1] http://swsa.semanticweb.org/sites/g/files/g524521/f/201507/DissertationOlafHartig_0.pdf
>>> >> [2] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-rdf-graph
>>> >> [3] http://www.iana.org/assignments/link-relations/link-relations.xhtml
>>> >
>>> >
>>> > --
>>> > Víctor Rodríguez-Doncel
>>> > D3205 - Ontology Engineering Group (OEG)
>>> > Departamento de Inteligencia Artificial
>>> > Facultad de Informática
>>> > Universidad Politécnica de Madrid
>>> >
>>> > Campus de Montegancedo s/n
>>> > Boadilla del Monte-28660 Madrid, Spain
>>> > Tel. (+34) 91336 3672
>>> > Skype: vroddon3
>>> >
>>> >
>> 
>>  
>> -- 
>> Prof. Dr. Heiko Paulheim
>> Data and Web Science Group
>> University of Mannheim
>> Phone: +49 621 181 2646
>> B6, 26, Room C1.08
>> D-68159 Mannheim
>> 
>> Mail: heiko@informatik.uni-mannheim.de
>> Web: www.heikopaulheim.com
>> 
> 

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Received on Wednesday, 26 August 2015 12:24:18 UTC