
Re: LODCloud SPARQL Endpoints Spreadsheet

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 17 Mar 2020 17:10:04 -0400
To: public-lod@w3.org
Message-ID: <8578bc50-de65-5e70-26d5-3ff766e8387d@openlinksw.com>
On 3/17/20 4:05 PM, Claus Stadler wrote:
> Hi Kingsley,
>
>
> The RDF output is now:
>
>
> <http://de.dbpedia.org/sparql#service>
>         a            sd:Service ;
>         sd:endpoint  <http://de.dbpedia.org/sparql> ;
>         <https://schema.org/dateModified>
>                 "2020-03-17T20:01:47.779+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
>         <https://schema.org/serverStatus>
>                 <https://schema.org/Online> .
>
>
>
> (1) I noted that your LODCloud_SPARQL_Endpoints.ttl dataset fails to
> parse because of a missing xsd declaration; I made a PR for it :)


Okay, but which repo? Just provide the URI.


>
> (2) Status reports vs Service status:
>
> In principle, each status request could create an observation
> record, such as
>
> [] a StatusReportObservation ; forService x ; atTime y ; withResult z
>
> This is a nice model for temporal analysis so it should be created,
> and I think the HTTP response codes would fit there best.
>
> (There is just a minor technical issue, that we'd need to add a custom
> jena SPARQL function that yields the HTTP headers.)
>
> The "latest status", however, is a practical abstraction over the most
> recent observations, so the labels online/offline are fine.
>
> I found https://schema.org/Online and
> https://schema.org/OfflineTemporarily and
> https://schema.org/dateModified as a best effort. The former two are
> actually used to describe game servers which is fine, as one can solve
> Sudokus with SPARQL *g*. dateModified is ambiguous - in our case it
> only means that the record was updated, but it does not imply that the
> server status itself changed - and worse, it would be ambiguous if
> another dataset used this property to denote latest changes to the data.
>
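
The observation-record idea in (2) could be sketched roughly as follows. This is only an illustration in Python, not part of lodservatory: the eg: namespace, the class and property names (taken from the pseudo-triple above), and the function name observation_turtle are all assumptions, since the thread does not settle on a vocabulary.

```python
from datetime import datetime, timezone

# Hypothetical namespace; the thread does not fix a vocabulary for observations.
EG = "http://www.example.org/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def observation_turtle(service_iri: str, when: datetime, http_status: int) -> str:
    """Render one status-check observation as a Turtle blank node."""
    # Normalise to UTC and keep millisecond precision, as in the RDF output above.
    stamp = when.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return (
        f"[] a <{EG}StatusReportObservation> ;\n"
        f"   <{EG}forService> <{service_iri}> ;\n"
        f'   <{EG}atTime> "{stamp}"^^<{XSD_DATETIME}> ;\n'
        f"   <{EG}withResult> {http_status} .\n"
    )


if __name__ == "__main__":
    checked_at = datetime(2020, 3, 17, 20, 1, 47, 779000, tzinfo=timezone.utc)
    print(observation_turtle("http://de.dbpedia.org/sparql#service", checked_at, 200))
```

Appending one such record per check keeps the history for temporal analysis, while the "latest status" triple can be derived from the newest record.
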
>
> (3) DCAT2: Services, Datasets and Distributions
>
>
> I see that in the LODCloud_SPARQL_Endpoints.ttl dataset, the suffix is
> '#this'. The reason I used the suffix '#service' is due to DCAT2, which


It isn't really a suffix per se. Rather, #this is a default indexical
that simplifies URI creation and use. Thus, you could have left that as
is while still using #dataset and #distribution to craft URIs for
dataset and distribution descriptions.
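
As a rough illustration of that point, here is a minimal Python sketch that derives the three hash URIs from one endpoint URL. The function name description_uris is made up for this example; only the fragment names #this, #dataset and #distribution come from the discussion.

```python
from urllib.parse import urldefrag


def description_uris(endpoint_url: str) -> dict:
    """Derive hash URIs for the different aspects of one SPARQL endpoint."""
    # Drop any fragment already present so the result is built on the bare URL.
    base, _fragment = urldefrag(endpoint_url)
    return {name: f"{base}#{name}" for name in ("this", "dataset", "distribution")}


if __name__ == "__main__":
    for name, uri in description_uris("http://dbpedia.org/sparql").items():
        print(name, uri)
```
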


>
> distinguishes between datasets, distributions and services. So my
> considerations were:
>
> - http://dbpedia.org/sparql#service -> The SPARQL service itself
>
> - http://dbpedia.org/sparql#dataset -> The DCAT dataset identifier
> that describes the abstract RDF graph / SPARQL dataset (depending on
> the engine) that is accessible through the service AND is related to a
> publishing authority. This identifier can later be owl:sameAs'd to a
> better dataset identifier (if one exists)
>
> - http://dbpedia.org/sparql#distribution -> The #dataset as accessible
> through a specific #service
>
>
> So from the perspective that there are different aspects to a data
> service, http://dbpedia.org/sparql#this seems somewhat sub-par.
>
> I know that the naming is arbitrary, but in practice it leads to the
> issue that, e.g., inverse functional properties and reasoners are needed
> to consolidate the data, where maybe a best practice / convention would
> ease things.


By having different URIs for the same endpoints across datasets, we end
up requiring IFP semantics for identity resolution. Thus, we still need
an IFP relation in your dataset, bearing in mind that sd:endpoint has
range owl:DatatypeProperty, which makes it unsuitable for identity
reconciliation based on IFP reasoning.
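
To make the reconciliation step concrete, here is a minimal Python sketch of what a reasoner would conclude once a property is treated as inverse functional: subjects sharing a value for that property denote the same thing. It assumes schema:url is the designated IFP (as suggested earlier in this thread); the function name ifp_same_as and the sample triples are illustrative only, and a real setup would use an OWL reasoner rather than this hand-rolled grouping.

```python
from collections import defaultdict
from itertools import combinations


def ifp_same_as(triples, ifp):
    """Return sorted pairs of subjects that share a value for the given IFP."""
    subjects_by_value = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == ifp:
            subjects_by_value[obj].add(subject)
    pairs = set()
    for subjects in subjects_by_value.values():
        # Every pair of subjects with the same IFP value is inferred owl:sameAs.
        pairs.update(combinations(sorted(subjects), 2))
    return sorted(pairs)


# Illustrative data: two datasets naming the same endpoint differently.
SCHEMA_URL = "http://schema.org/url"
triples = [
    ("http://dbpedia.org/sparql#service", SCHEMA_URL, "http://dbpedia.org/sparql"),
    ("http://dbpedia.org/sparql#this", SCHEMA_URL, "http://dbpedia.org/sparql"),
    ("http://de.dbpedia.org/sparql#service", SCHEMA_URL, "http://de.dbpedia.org/sparql"),
]

if __name__ == "__main__":
    for left, right in ifp_same_as(triples, SCHEMA_URL):
        print(f"<{left}> owl:sameAs <{right}> .")
```
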

Kingsley

>
>
> Cheers,
>
> Claus
>
>
> On 17.03.20 18:04, Kingsley Idehen wrote:
>> On 3/17/20 12:22 PM, Claus Stadler wrote:
>>> Hi Masahide,
>>>
>>> For now I have manually added it by request, and it's here:
>>>
>>> https://github.com/SmartDataAnalytics/lodservatory/blob/master/latest-status.ttl#L423
>>>
>>>
>>>
>>>
>>> Note that the service is still a moving target, so the vocabulary is
>>> still going to change (HTTP-in-RDF was suggested), and I still need to
>>> include any missing endpoints from the 258 endpoints so far (possibly
>>> excluding obviously dead ones) of OpenLink's
>>> LODCloud_SPARQL_Endpoints.ttl dataset.
>>>
>>> Also, as OpenLink has pretty much already set up the community process
>>> for adding endpoints, I would just build on it and download OpenLink's
>>> endpoint dataset as part of the service monitoring workflow.
>>>
>>> Also, after some time, e.g. a year or so, the service status updates
>>> may cause the git repo to grow to a size where it's better to switch to
>>> a new one, so it's better to keep the static and dynamic data in
>>> separate repos anyway.
>>>
>>>
>>> Cheers,
>>>
>>> Claus
>>
>> Hi Claus,
>>
>> I would suggest the inclusion of an IFP (owl:InverseFunctionalProperty)
>> in your dataset to simplify meshing with other datasets (e.g. ours).
>> Ditto a type assertion, e.g. sd:Service.
>>
>> ## Turtle Start ##
>>
>> @prefix eg:    <http://www.example.org/> .
>>
>> <http://de.dbpedia.org/sparql#service>
>>          eg:serviceStatus      "online" ;
>>          eg:serviceStatusTime  "2020-03-17T16:01:43.165+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
>>
>> ## Turtle End ##
>>
>>
>> Becomes
>>
>> ## Turtle Start ##
>>
>> @prefix eg:    <http://www.example.org/> .
>> @prefix schema: <http://schema.org/> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
>>
>> <http://de.dbpedia.org/sparql#service>
>>          a sd:Service ;
>>          ## schema:url can be designated as an IFP by a reasoner etc.
>>          schema:url <http://de.dbpedia.org/sparql> ;
>>          eg:serviceStatus      "online" ;
>>          eg:serviceStatusTime  "2020-03-17T16:01:43.165+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
>>
>> ## Turtle End ##
>>
>>
>> Kingsley
>>
>>>
>>> On 17.03.20 16:43, KANZAKI Masahide wrote:
>>>> Hello Claus, thank you for the great effort.
>>>>
>>>> The Japan Search endpoint [1], which is not listed as "online", was
>>>> suspended temporarily for system maintenance on March 17. I would be
>>>> grateful if you could kindly re-examine the service and add it to the
>>>> "online" list.
>>>>
>>>> Thank you and best regards,
>>>>
>>>> [1] https://jpsearch.go.jp/rdf/sparql
>>>>
>>>> On Tue, 17 Mar 2020 at 23:19, Claus Stadler
>>>> <cstadler@informatik.uni-leipzig.de> wrote:
>>>>> Hi all,
>>>>>
>>>>>
>>>>> Last week I got a list of endpoints from a colleague, and it turned
>>>>> out that many were dead, so I hacked up a little service check [1]
>>>>> with GitHub Actions (and our own sparql-integrate tool [2]). Then
>>>>> another colleague told me about the rather recent effort that
>>>>> happened here.
>>>>>
>>>>>
>>>>> OpenLink's LODCloud_SPARQL_Endpoints.ttl [3] looks very good!
>>>>>
>>>>>
>>>>> What I can contribute is a proof-of-concept of a completely
>>>>> self-contained SPARQL-based service monitoring setup [1] where
>>>>> GitHub does all the work of service checking, and where everyone can
>>>>> clone the repo, adjust the queries and get whatever RDF their
>>>>> use case(s) demand. Right now I have only added online/offline
>>>>> checking, but in principle one can also query the endpoints for
>>>>> features and emit service descriptions, or query for dataset metrics
>>>>> and emit VoID.
>>>>>
>>>>>
>>>>> Of course there are resource limitations: GitHub only allows 1000
>>>>> requests per hour, the workflow time is limited, and the virtual
>>>>> instances only have 2 CPUs.
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Claus
>>>>>
>>>>>
>>>>> [1]
>>>>> https://github.com/SmartDataAnalytics/lodservatory/blob/master/latest-status.ttl
>>>>>
>>>>>
>>>>> [2] https://github.com/SmartDataAnalytics/SparqlIntegrate
>>>>>
>>>>> [3]
>>>>> https://github.com/OpenLinkSoftware/general-turtle-doc-collection/blob/master/LODCloud_SPARQL_Endpoints.ttl
>>>>>
>>>>>
>>>>>
>>>>> On 24.12.19 16:12, Kingsley Idehen wrote:
>>>>>
>>>>> On 12/23/19 11:55 AM, Kingsley Idehen wrote:
>>>>>
>>>>> On 12/23/19 5:01 AM, Michel Dumontier wrote:
>>>>>
>>>>> Hi Kingsley,
>>>>>    Is it correct that we should continue to make changes to the
>>>>> spreadsheet, or should we do a pull request against the Turtle
>>>>> file?
>>>>>
>>>>>
>>>>> Whichever works best for you :)
>>>>>
>>>>>
>>>>> If the former, how often will you update the turtle document and the
>>>>> endpoint?
>>>>>
>>>>>
>>>>> We will update it frequently in response to edit contributions to
>>>>> either data source.
>>>>>
>>>>> Kingsley
>>>>>
>>>>>
>>>>> m.
>>>>>
>>>>> On Fri, Dec 20, 2019 at 10:36 PM Kingsley Idehen
>>>>> <kidehen@openlinksw.com> wrote:
>>>>>> On 12/18/19 2:31 PM, Kingsley Idehen wrote:
>>>>>>
>>>>>> On 9/17/19 6:10 PM, Kingsley Idehen wrote:
>>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> As part of the LODCloud effort (starting in 2007), a number of
>>>>>> SPARQL Endpoints emerged around the initial SPARQL endpoint
>>>>>> provided by DBpedia. Today, that cloud has grown into the largest
>>>>>> Knowledge Graph on earth (by far!) and continues to drive new
>>>>>> frontiers related to Artificial Intelligence and Machine Learning.
>>>>>>
>>>>>> Having established itself as the preeminent global Knowledge Graph
>>>>>> on earth, it is extremely important that we maintain an active list
>>>>>> of SPARQL endpoints using practices that scale. Thus, we are
>>>>>> providing a shared Google Spreadsheet for crowd-sourcing the
>>>>>> maintenance of SPARQL endpoints that make up this important
>>>>>> Knowledge Graph.
>>>>>>
>>>>>> Please contribute your SPARQL endpoint(s) to the spreadsheet.
>>>>>>
>>>>>> Links
>>>>>>
>>>>>> SPARQL Endpoint Google Spreadsheet 1
>>>>>> What is the LODCloud, and why is it important?
>>>>>>
>>>>>>
>>>>>> Season's Greetings to all,
>>>>>>
>>>>>> This is a final call regarding contributions to the SPARQL Query
>>>>>> Service Endpoint Description effort that we are seeding via a
>>>>>> shared Google Spreadsheet [1].
>>>>>>
>>>>>> The goal is to produce an RDF-Turtle document that describes these
>>>>>> endpoints using terms from the SPARQL Service Description [2] and
>>>>>> VoID [3] Ontologies. Naturally, the document will also be published
>>>>>> using Linked Data principles.
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
>>>>>>
>>>>>>
>>>>>> [2] https://www.w3.org/ns/sparql-service-description
>>>>>>
>>>>>> [3] http://rdfs.org/ns/void#
>>>>>>
>>>>>>
>>>>>> We've published an RDF-Turtle document that describes a collection
>>>>>> of SPARQL Query Services Endpoints to our Github repository [1].
>>>>>> Naturally, content of said document has been deployed using Linked
>>>>>> Data principles [2] and sponged by our URIBurner Service [3].
>>>>>>
>>>>>> Enjoy!
>>>>>>
>>>>>> Links:
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/OpenLinkSoftware/general-turtle-doc-collection/blob/master/LODCloud_SPARQL_Endpoints.ttl
>>>>>>
>>>>>> -- Github
>>>>>>
>>>>>> [2] http://data.openlinksw.com/oplweb/sparql-endpoint134#this --
>>>>>> Example URI
>>>>>>
>>>>>> [3]
>>>>>> http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fwww.openlinksw.com%2Fdata%2Fturtle%2Foplweb%2FLODCloud_SPARQL_Endpoints.ttl&distinct=1
>>>>>>
>>>>>> -- About the SPARQL Query Service Endpoints
>>>>>>
>>>>>> -- 
>>>>>> Regards,
>>>>>>
>>>>>> Kingsley Idehen
>>>>>> Founder & CEO
>>>>>> OpenLink Software
>>>>>> Home Page: http://www.openlinksw.com
>>>>>> Community Support: https://community.openlinksw.com
>>>>>> Weblogs (Blogs):
>>>>>> Company Blog: https://medium.com/openlink-software-blog
>>>>>> Virtuoso Blog: https://medium.com/virtuoso-blog
>>>>>> Data Access Drivers Blog:
>>>>>> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>>>>>
>>>>>> Personal Weblogs (Blogs):
>>>>>> Medium Blog: https://medium.com/@kidehen
>>>>>> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>>>>>>                 http://kidehen.blogspot.com
>>>>>>
>>>>>> Profile Pages:
>>>>>> Pinterest: https://www.pinterest.com/kidehen/
>>>>>> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
>>>>>> Twitter: https://twitter.com/kidehen
>>>>>> Google+: https://plus.google.com/+KingsleyIdehen/about
>>>>>> LinkedIn: http://www.linkedin.com/in/kidehen
>>>>>>
>>>>>> Web Identities (WebID):
>>>>>> Personal:
>>>>>> http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>>>>>>           :
>>>>>> http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>>>>>>
>>>>>>
>>>>> -- 
>>>>> Michel Dumontier
>>>>> Distinguished Professor of Data Science
>>>>> Maastricht University
>>>>> http://dumontierlab.com
>>>>>
>>>>>
>>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> The URI of the Github repo associated with the SPARQL Query Service
>>>>> endpoint descriptions has been changed [1]. Thus, use the new
>>>>> repository for branch forks and pull requests.
>>>>>
>>>>> [1] https://github.com/OpenLinkSoftware/lod-cloud
>>>>>
>>>>> Happy Holidays!
>>>>>
>>>>> -- 
>>>>> Regards,
>>>>>
>>>>> Kingsley Idehen
>>>>>
>>>>>
>>>>> -- 
>>>>> Dipl. Inf. Claus Stadler
>>>>> Department of Computer Science, University of Leipzig
>>>>> Research Group: http://aksw.org/
>>>>> Workpage & WebID: http://aksw.org/ClausStadler
>>>>> Phone: +49 341 97-32260
>>>>

-- 
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com



Received on Tuesday, 17 March 2020 21:10:21 UTC
