Crunchbase Challenge

https://github.com/kg-construct/mapping-challenges/issues/14#issuecomment-1075435627

Here's a challenge to the KG Construction CG:

   - Take Crunchbase: 9.5M rows across 17 tables, served as CSV and
     updated daily.
   - The data of some nodes comes from multiple tables (e.g. Organization
     comes from organizations, org_parents, org_descriptions).
   - RDFize and store the full dataset in 1-2 hours at most.
   - Update the data daily, replacing the data of recently updated rows,
     again in 1-2 hours at most.
   - Some values need special processing (see the link above for details).
   - Do it with your favorite RDFization toolkit, preferably
     declaratively (I can provide the models).
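
To make the size of the task concrete, here is a rough Python sketch of three of the points above: the throughput implied by the time budget, merging one node from several tables, and replacing a node's data on update. It is only an illustration under stated assumptions, not the approach described at the link: the namespace, the `uuid` and `parent_uuid` column names, the sample rows and all function names are hypothetical, and the special value processing is left out.

```python
import csv
import io
from collections import defaultdict

TOTAL_ROWS = 9_500_000            # row count quoted in the challenge
BASE = "https://example.org/cb/"  # hypothetical namespace, not Crunchbase's


def rows_per_second(total_rows, hours):
    """Sustained throughput needed to finish within the time budget."""
    return total_rows / (hours * 3600)


def merge_tables(csv_texts, key="uuid"):
    """Collect the columns of every table into one record per key value."""
    merged = defaultdict(dict)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            node = merged[row[key]]
            for column, value in row.items():
                if column != key and value:  # skip the key and empty cells
                    node[column] = value
    return dict(merged)


def escape(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def triples(node_id, props):
    """Emit one triple per column; columns ending in _uuid become IRIs."""
    subject = f"<{BASE}organization/{node_id}>"
    out = []
    for column, value in props.items():
        if column.endswith("_uuid"):  # assumed convention for references
            obj = f"<{BASE}organization/{value}>"
        else:
            obj = f'"{escape(value)}"'
        out.append(f"{subject} <{BASE}prop/{column}> {obj} .")
    return out


def replace_update(node_id, props):
    """SPARQL Update that drops a node's old triples and inserts new ones."""
    subject = f"<{BASE}organization/{node_id}>"
    body = "\n".join(triples(node_id, props))
    return f"DELETE WHERE {{ {subject} ?p ?o }} ;\nINSERT DATA {{\n{body}\n}}"


# Tiny made-up samples standing in for three of the 17 tables.
ORGANIZATIONS = "uuid,name\no1,Acme\n"
ORG_PARENTS = "uuid,parent_uuid\no1,o2\n"
ORG_DESCRIPTIONS = "uuid,description\no1,Makes anvils\n"

merged = merge_tables([ORGANIZATIONS, ORG_PARENTS, ORG_DESCRIPTIONS])
update = replace_update("o1", merged["o1"])
```

With these numbers, `rows_per_second(TOTAL_ROWS, 1)` comes to roughly 2,600 rows per second and `rows_per_second(TOTAL_ROWS, 2)` to roughly 1,300, sustained across parsing, transformation and loading. A real pipeline would stream the CSV files rather than hold them in memory, and would batch many nodes into each update.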

The comment preceding the one linked above outlines Ontotext's approach,
which uses OntoRefine and generates the transformations from models.
-- 
Vladimir Alexiev, PhD, PMP
Chief Data Architect
Sirma AI, trading as Ontotext: https://www.ontotext.com
Email: vladimir.alexiev@ontotext.com
Publications and CV: https://github.com/VladimirAlexiev/my

Received on Tuesday, 22 March 2022 17:53:50 UTC