- From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
- Date: Tue, 22 Mar 2022 19:52:25 +0200
- To: public-kg-construct@w3.org
- Cc: KGS <kgs@ontotext.com>
- Message-ID: <CAMv+wg6PxvSOLFeOorU4-DovVEcnOemArv6_ua-mntN9dqanww@mail.gmail.com>
https://github.com/kg-construct/mapping-challenges/issues/14#issuecomment-1075435627

Here's a challenge to the KG Construction CG:

- Take Crunchbase: 9.5M rows across 17 tables, served as CSV, updated daily.
- The data of some nodes comes from multiple tables (e.g. Organization from organizations, org_parents, org_descriptions).
- RDFize and store the total dataset in under 1-2 hours.
- Update the data daily, replacing the data of recently updated rows, in under 1-2 hours.
- Some values need special processing (see the above link for details).
- Do it with your favorite RDFization toolkit, and preferably do it declaratively (I can provide the models).

The previous comment at the above link outlines Ontotext's approach using OntoRefine and generation of transformations from models.

--
Vladimir Alexiev, PhD, PMP
Chief Data Architect
Sirma AI, trading as Ontotext: https://www.ontotext.com, LinkedIn <https://www.linkedin.com/company-beta/208070>, Twitter <https://twitter.com/ontotext>, Rate GraphDB <http://www.capterra.com/database-management-software/reviews/157533/Graph%20DB/Ontotext/new>
Email: vladimir.alexiev@ontotext.com, skype:valexiev1
Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
Calendar: https://www.google.com/calendar/embed?src=vladimir.alexiev@ontotext.com
Publications and CV: https://github.com/VladimirAlexiev/my
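To make the shape of the challenge concrete, here is a minimal Python sketch of two of its points: building one Organization node from several tables, and replacing the data of recently updated rows. It is not Ontotext's approach and not a declarative mapping. The column names (`uuid`, `name`, `parent_uuid`, `description`), the `http://example.org/cb/` namespace, the property names, and the in-memory dict standing in for a triple store are all assumptions made for illustration; it also assumes the `uuid` values are safe to embed in an IRI.

```python
import csv
import io
from collections import defaultdict

BASE = "http://example.org/cb/"  # hypothetical namespace, not Crunchbase's


def esc(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows(csv_text):
    """Parse CSV text with a header row into dicts."""
    return csv.DictReader(io.StringIO(csv_text))


def org_iri(uuid):
    return "<" + BASE + "organization/" + uuid + ">"


def organization_triples(organizations, org_parents, org_descriptions):
    """Merge three tables into {subject IRI: set of N-Triples lines}."""
    triples = defaultdict(set)
    for r in rows(organizations):
        s = org_iri(r["uuid"])
        triples[s].add(s + " <" + BASE + 'name> "' + esc(r["name"]) + '" .')
    for r in rows(org_parents):
        s = org_iri(r["uuid"])
        triples[s].add(s + " <" + BASE + "parent> " + org_iri(r["parent_uuid"]) + " .")
    for r in rows(org_descriptions):
        s = org_iri(r["uuid"])
        triples[s].add(s + " <" + BASE + 'description> "' + esc(r["description"]) + '" .')
    return triples


def apply_update(store, fresh):
    """Replace all stored triples of every subject present in the fresh batch."""
    for subject, lines in fresh.items():
        store[subject] = set(lines)


if __name__ == "__main__":
    store = organization_triples(
        "uuid,name\no1,Acme\no2,Beta\n",
        "uuid,parent_uuid\no2,o1\n",
        "uuid,description\no1,Makes anvils\n",
    )
    for subject in sorted(store):
        for line in sorted(store[subject]):
            print(line)
```

Because `apply_update` replaces a subject's triples wholesale, the daily batch has to carry every table's rows for an updated Organization, not only the table that changed. A real pipeline would do the equivalent against a triple store rather than a dict.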
Received on Tuesday, 22 March 2022 17:53:50 UTC