Re: LODCloud SPARQL Endpoints Spreadsheet from Claus Stadler on 2020-03-17 (public-lod@w3.org from March 2020)

From: Claus Stadler <cstadler@informatik.uni-leipzig.de>
Date: Tue, 17 Mar 2020 21:05:25 +0100
To: public-lod@w3.org
Cc: kidehen@openlinksw.com
Message-ID: <41bee831-fccc-cdf7-8104-24429bec733d@informatik.uni-leipzig.de>
Hi Kingsley,


The RDF output is now:


<http://de.dbpedia.org/sparql#service>
         a            sd:Service ;
         sd:endpoint  <http://de.dbpedia.org/sparql> ;
         <https://schema.org/dateModified>
                 "2020-03-17T20:01:47.779+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
         <https://schema.org/serverStatus>
                 <https://schema.org/Online> .



(1) I noticed that your LODCloud_SPARQL_Endpoints.ttl dataset fails to parse because of a missing xsd prefix declaration; I made a PR for it :)
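
For illustration, a minimal sketch of this kind of parse error, assuming the file uses an xsd: prefixed datatype without declaring the prefix. The eg: names and the literal below are hypothetical and not taken from the dataset:

```turtle
@prefix eg:  <http://www.example.org/> .
# Without the next line, a Turtle parser rejects the xsd: prefixed name below.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

eg:endpoint1 eg:checkedAt "2020-03-17T20:01:47Z"^^xsd:dateTime .
```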

(2) Status reports vs Service status:

In principle, each status request could create an observation record, such as

[] a StatusReportObservation ; forService x ; atTime y ; withResult z

This is a nice model for temporal analysis so it should be created, and I think the HTTP response codes would fit there best.
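
A minimal sketch of such an observation record in Turtle. The eg: class and property names are hypothetical placeholders (no vocabulary has been chosen yet), and the 200 status code is an illustrative value, not a recorded result:

```turtle
@prefix eg:  <http://www.example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# One blank-node record per status request
[] a eg:StatusReportObservation ;
   eg:forService     <http://de.dbpedia.org/sparql#service> ;
   eg:atTime         "2020-03-17T20:01:47.779+00:00"^^xsd:dateTime ;
   eg:httpStatusCode 200 ;    # hypothetical value, for illustration only
   eg:withResult     <https://schema.org/Online> .
```

Because every check yields its own record, the history stays queryable, and the "latest status" can be derived as the observation with the greatest eg:atTime per service.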

(There is just a minor technical issue: we'd need to add a custom Jena SPARQL function that yields the HTTP headers.)

The "latest status", however, is a practical abstraction of the most recent observations, so the labels online/offline are fine.

As a best effort, I found https://schema.org/Online, https://schema.org/OfflineTemporarily and https://schema.org/dateModified. The former two are actually used to describe game servers, which is fine, as one can solve Sudokus with SPARQL *g*. dateModified is ambiguous - in our case it only means that the record was updated, but it does not imply that the server status itself changed - and worse, it would be ambiguous if another dataset used this property to denote the latest changes to the data.


(3) DCAT2: Services, Datasets and Distributions


I see that in the LODCloud_SPARQL_Endpoints.ttl dataset, the suffix is '#this'. The reason I used the suffix '#service' is due to DCAT2, which distinguishes between datasets, distributions and services. So my considerations were:

- http://dbpedia.org/sparql#service -> The SPARQL service itself

- http://dbpedia.org/sparql#dataset -> The DCAT dataset identifier that describes the abstract RDF graph / SPARQL dataset (depending on the engine) that is accessible through the service AND is related to a publishing authority. This identifier can later be owl:sameAs'd to a better dataset identifier (if one exists)

- http://dbpedia.org/sparql#distribution -> The #dataset as accessible through a specific #service
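
As a sketch, the three identifiers above could be wired together with DCAT2 terms roughly as follows. This is one possible reading of DCAT2, not what the monitoring workflow currently emits:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .

# The SPARQL service itself
<http://dbpedia.org/sparql#service>
    a sd:Service , dcat:DataService ;
    dcat:endpointURL   <http://dbpedia.org/sparql> ;
    dcat:servesDataset <http://dbpedia.org/sparql#dataset> .

# The abstract dataset exposed through the service
<http://dbpedia.org/sparql#dataset>
    a dcat:Dataset ;
    dcat:distribution <http://dbpedia.org/sparql#distribution> .

# The dataset as accessible through this specific service
<http://dbpedia.org/sparql#distribution>
    a dcat:Distribution ;
    dcat:accessURL     <http://dbpedia.org/sparql> ;
    dcat:accessService <http://dbpedia.org/sparql#service> .
```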


So from the perspective that there are different aspects to a data service, http://dbpedia.org/sparql#this seems somewhat sub-par.

I know that the naming is arbitrary, but in practice it leads to the issue that e.g. inverse functional properties and reasoners are needed to consolidate the data, where maybe a best practice / convention would ease things.
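
To illustrate the consolidation step, a minimal sketch assuming schema:url is treated as an inverse functional property, as suggested in the quoted message below. That declaration is an assumption made here for illustration and is not taken from the schema.org definition of the property:

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix schema: <http://schema.org/> .

# Assumption: the consumer declares the property inverse functional.
schema:url a owl:InverseFunctionalProperty .

# Two datasets that name the same service differently
<http://dbpedia.org/sparql#service> schema:url <http://dbpedia.org/sparql> .
<http://dbpedia.org/sparql#this>    schema:url <http://dbpedia.org/sparql> .

# An OWL reasoner can then infer:
# <http://dbpedia.org/sparql#service> owl:sameAs <http://dbpedia.org/sparql#this> .
```

With a shared suffix convention, both datasets would use the same IRI and no reasoning step would be needed.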


Cheers,

Claus


On 17.03.20 18:04, Kingsley Idehen wrote:
> On 3/17/20 12:22 PM, Claus Stadler wrote:
>> Hi Masahide,
>>
>> For now I have manually added it by request and it's here:
>>
>> https://github.com/SmartDataAnalytics/lodservatory/blob/master/latest-status.ttl#L423
>>
>>
>>
>> Note that the service is still a moving target, so the vocabulary is
>> still going to change (HTTP-in-RDF was suggested), and I still need to
>> include any missing endpoints from the so far 258 endpoints (possibly
>> excluding obviously dead ones) of OpenLink's
>> LODCloud_SPARQL_Endpoints.ttl dataset.
>>
>> Also as OpenLink has pretty much already set up the community process
>> for adding endpoints, I would just build on it and download OpenLink's
>> endpoint dataset as part of the service monitoring workflow.
>>
>> Also, after some time, e.g. a year or so, the service status updates
>> may cause the git repo to grow to a size where it's better to switch to
>> a new one anyway - so it's better to keep the static and dynamic data in
>> separate repos.
>>
>>
>> Cheers,
>>
>> Claus
>
> Hi Claus,
>
> I would suggest the inclusion of an IFP (owl:InverseFunctionalProperty)
> in your dataset to simplify meshing with other datasets (e.g. ours).
> Ditto a type assertion e.g. sd:Service.
>
> ## Turtle Start ##
>
> @prefix eg:    <http://www.example.org/> .
>
> <http://de.dbpedia.org/sparql#service>
>          eg:serviceStatus      "online" ;
>          eg:serviceStatusTime  "2020-03-17T16:01:43.165+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
>
> ## Turtle End ##
>
>
> Becomes
>
> ## Turtle Start ##
>
> @prefix eg:    <http://www.example.org/> .
> @prefix schema: <http://schema.org/> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
>
> <http://de.dbpedia.org/sparql#service>
>          a sd:Service ;
>          schema:url <http://de.dbpedia.org/sparql> ;   ## can be designated as an IFP by a reasoner etc.
>          eg:serviceStatus      "online" ;
>          eg:serviceStatusTime  "2020-03-17T16:01:43.165+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
>
> ## Turtle End ##
>
>
> Kingsley
>
>>
>> On 17.03.20 16:43, KANZAKI Masahide wrote:
>>> Hello Claus, thank you for the great effort.
>>>
>>> The Japan Search endpoint [1], which is not listed as "online", was
>>> suspended temporarily for system maintenance on March 17. I would be
>>> grateful if you could kindly re-examine the service and add it to the
>>> "online" list.
>>>
>>> Thank you and best regards,
>>>
>>> [1] https://jpsearch.go.jp/rdf/sparql
>>>
>>> On Tue, 17 Mar 2020 at 23:19, Claus Stadler
>>> <cstadler@informatik.uni-leipzig.de> wrote:
>>>> Hi all,
>>>>
>>>>
>>>> Last week I got a list of endpoints from a colleague and it turned
>>>> out that many were dead, so I hacked up a little service check [1]
>>>> with Github-Actions (and our own sparql-integrate tool [2]). And
>>>> then another colleague told me of the rather recent effort that
>>>> happened here.
>>>>
>>>>
>>>> OpenLink's LODCloud_SPARQL_Endpoints.ttl [3] looks very good!
>>>>
>>>>
>>>> What I can contribute is a proof-of-concept of a completely
>>>> self-contained SPARQL-based service monitoring setup [1] where
>>>> Github does all the work of service checking, and where everyone can
>>>> clone the repo, adjust the queries and get out whatever RDF
>>>> their use case(s) demand. Right now I only added online/offline
>>>> checking, but in principle one can also just query the endpoints for
>>>> features and emit service descriptions, or query for dataset metrics
>>>> and emit VoID.
>>>>
>>>>
>>>> Of course there are resource limitations, Github only allows 1000
>>>> requests per hour, the workflow time is limited, and the virtual
>>>> instances only have 2 CPUs.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Claus
>>>>
>>>>
>>>> [1]
>>>> https://github.com/SmartDataAnalytics/lodservatory/blob/master/latest-status.ttl
>>>>
>>>> [2] https://github.com/SmartDataAnalytics/SparqlIntegrate
>>>>
>>>> [3]
>>>> https://github.com/OpenLinkSoftware/general-turtle-doc-collection/blob/master/LODCloud_SPARQL_Endpoints.ttl
>>>>
>>>>
>>>> On 24.12.19 16:12, Kingsley Idehen wrote:
>>>>
>>>> On 12/23/19 11:55 AM, Kingsley Idehen wrote:
>>>>
>>>> On 12/23/19 5:01 AM, Michel Dumontier wrote:
>>>>
>>>> Hi Kingsley,
>>>>    Is it correct that we should continue to make changes to the
>>>> spreadsheet, or should we do a pull request against the turtle file?
>>>>
>>>>
>>>> Whichever works best for you :)
>>>>
>>>>
>>>> If the former, how often will you update the turtle document and the
>>>> endpoint?
>>>>
>>>>
>>>> We will update it frequently in response to edit contributions to
>>>> either data source.
>>>>
>>>> Kingsley
>>>>
>>>>
>>>> m.
>>>>
>>>> On Fri, Dec 20, 2019 at 10:36 PM Kingsley Idehen
>>>> <kidehen@openlinksw.com> wrote:
>>>>> On 12/18/19 2:31 PM, Kingsley Idehen wrote:
>>>>>
>>>>> On 9/17/19 6:10 PM, Kingsley Idehen wrote:
>>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> As part of the LODCloud effort (starting in 2007), a number of
>>>>> SPARQL Endpoints emerged around the initial SPARQL endpoint
>>>>> provided by DBpedia. Today, that cloud has grown into the largest
>>>>> Knowledge Graph on earth (by far!) and continues to drive new
>>>>> frontiers related to Artificial Intelligence and Machine Learning.
>>>>>
>>>>> Having established itself as the preeminent global Knowledge Graph
>>>>> on earth, it is extremely important that we maintain an active list
>>>>> of SPARQL endpoints using practices that scale. Thus, we are
>>>>> providing a shared Google Spreadsheet for crowd-sourcing the
>>>>> maintenance of SPARQL endpoints that make up this important
>>>>> Knowledge Graph.
>>>>>
>>>>> Please contribute your SPARQL endpoint(s) to the spreadsheet.
>>>>>
>>>>> Links
>>>>>
>>>>> SPARQL Endpoint Google Spreadsheet 1
>>>>> What is the LODCloud, and why is it important?
>>>>>
>>>>>
>>>>> Season's Greetings to all,
>>>>>
>>>>> This is a final call regarding contributions to the SPARQL Query
>>>>> Service Endpoint Description effort that we are seeding via a
>>>>> shared Google Spreadsheet [1].
>>>>>
>>>>> The goal is to produce an RDF-Turtle document that describes these
>>>>> endpoints using terms from the SPARQL Service Description [2] and
>>>>> VoID [3] Ontologies. Naturally, the document will also be published
>>>>> using Linked Data principles.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
>>>>>
>>>>> [2] https://www.w3.org/ns/sparql-service-description
>>>>>
>>>>> [3] http://rdfs.org/ns/void#
>>>>>
>>>>>
>>>>> We've published an RDF-Turtle document that describes a collection
>>>>> of SPARQL Query Services Endpoints to our Github repository [1].
>>>>> Naturally, content of said document has been deployed using Linked
>>>>> Data principles [2] and sponged by our URIBurner Service [3].
>>>>>
>>>>> Enjoy!
>>>>>
>>>>> Links:
>>>>>
>>>>> [1]
>>>>> https://github.com/OpenLinkSoftware/general-turtle-doc-collection/blob/master/LODCloud_SPARQL_Endpoints.ttl
>>>>> -- Github
>>>>>
>>>>> [2] http://data.openlinksw.com/oplweb/sparql-endpoint134#this --
>>>>> Example URI
>>>>>
>>>>> [3]
>>>>> http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fwww.openlinksw.com%2Fdata%2Fturtle%2Foplweb%2FLODCloud_SPARQL_Endpoints.ttl&distinct=1
>>>>> -- About the SPARQL Query Service Endpoints
>>>>>
>>>>> -- 
>>>>> Regards,
>>>>>
>>>>> Kingsley Idehen
>>>>> Founder & CEO
>>>>> OpenLink Software
>>>>> Home Page: http://www.openlinksw.com
>>>>> Community Support: https://community.openlinksw.com
>>>>> Weblogs (Blogs):
>>>>> Company Blog: https://medium.com/openlink-software-blog
>>>>> Virtuoso Blog: https://medium.com/virtuoso-blog
>>>>> Data Access Drivers Blog:
>>>>> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>>>>
>>>>> Personal Weblogs (Blogs):
>>>>> Medium Blog: https://medium.com/@kidehen
>>>>> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>>>>>                 http://kidehen.blogspot.com
>>>>>
>>>>> Profile Pages:
>>>>> Pinterest: https://www.pinterest.com/kidehen/
>>>>> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
>>>>> Twitter: https://twitter.com/kidehen
>>>>> Google+: https://plus.google.com/+KingsleyIdehen/about
>>>>> LinkedIn: http://www.linkedin.com/in/kidehen
>>>>>
>>>>> Web Identities (WebID):
>>>>> Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>>>>>           :
>>>>> http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>>>>>
>>>> -- 
>>>> Michel Dumontier
>>>> Distinguished Professor of Data Science
>>>> Maastricht University
>>>> http://dumontierlab.com
>>>>
>>>>
>>>>
>>>> Hi Everyone,
>>>>
>>>> The URI of the Github repo associated with the SPARQL Query Service
>>>> endpoint descriptions has been changed [1]. Thus, use the new
>>>> repository for branch forks and pull requests.
>>>>
>>>> [1] https://github.com/OpenLinkSoftware/lod-cloud
>>>>
>>>> Happy Holidays!
>>>>
>>>> -- 
>>>> Regards,
>>>>
>>>> Kingsley Idehen
>>>>
>>>> -- 
>>>> Dipl. Inf. Claus Stadler
>>>> Department of Computer Science, University of Leipzig
>>>> Research Group: http://aksw.org/
>>>> Workpage & WebID: http://aksw.org/ClausStadler
>>>> Phone: +49 341 97-32260
>>>
-- 
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260
Received on Tuesday, 17 March 2020 20:05:44 UTC