
KIT releases monumental dataset of more than 15 *trillion* triples

From: Denny Vrandečić <vrandecic@gmail.com>
Date: Sun, 01 Apr 2018 17:31:00 +0000
Message-ID: <CAJVtBfds40OJL90PCt6PLEHYpBRd7eYe_D1YOeJSy5qMGVmyrw@mail.gmail.com>
To: semantic-web@w3.org, "public-lod@w3.org" <public-lod@w3.org>
KIT is proud today to release an extension to an existing dataset, which
will increase the size of the dataset by a factor of more than 1000
<http://km.aifb.kit.edu/projects/numbers/web/n1000>. The widely cited Linked
Open Numbers <http://km.aifb.kit.edu/projects/numbers/> dataset (more than
30 <http://km.aifb.kit.edu/projects/numbers/web/n30> citations) has been
updated. Every single triple was regenerated, and even though the size has
been dramatically expanded, we remain confident in the quality of every
single triple.

http://km.aifb.kit.edu/projects/numbers/
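All the per-number links in this announcement follow one visible pattern: the project URL, then `web/n`, then the number in decimal. Below is a minimal Python sketch of that pattern. It assumes the pattern holds for every number in the dataset, which is inferred from the links in this post rather than taken from any documented API. `lon_page_url` is a hypothetical helper name.

```python
# Pattern inferred from the links in this post (assumption, not a documented API).
BASE = "http://km.aifb.kit.edu/projects/numbers/web/n"


def lon_page_url(n: int) -> str:
    """Return the per-number page URL, following the pattern seen in this post."""
    if n < 0:
        raise ValueError("lon_page_url expects a non-negative integer")
    return f"{BASE}{n}"


print(lon_page_url(8))
```

For example, `lon_page_url(2922)` reproduces the URL behind the "2922 days" link later in the post.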

It has been, to the day, eight
<http://km.aifb.kit.edu/projects/numbers/web/n8> years since the original
publication of the Linked Open Numbers dataset. Today, we are proud to
announce that we are increasing the size, and thus the utility, of the
dataset by three <http://km.aifb.kit.edu/projects/numbers/web/n3> orders of
magnitude.
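The size figures in this post can be checked with plain integer arithmetic. This sketch assumes the old and new ranges are exactly the first billion and the first trillion natural numbers. It treats the subject line's "more than 15 trillion" triples as a lower bound. The 32-bit and 64-bit comparisons show why the new range no longer fits a 32-bit integer.

```python
OLD_LIMIT = 10**9          # previous limit: the first billion natural numbers
NEW_LIMIT = 10**12         # new limit: the first trillion natural numbers
MIN_TRIPLES = 15 * 10**12  # "more than 15 trillion" triples, taken as a lower bound

growth = NEW_LIMIT // OLD_LIMIT                    # growth factor of the number range
new_entities = NEW_LIMIT - OLD_LIMIT               # numbers added by the extension
min_triples_per_number = MIN_TRIPLES // NEW_LIMIT  # average per number, lower bound

# The old limit fits a signed 32-bit integer; the new one needs a 64-bit type.
old_fits_int32 = OLD_LIMIT <= 2**31 - 1
new_fits_uint32 = NEW_LIMIT <= 2**32 - 1
new_fits_int64 = NEW_LIMIT <= 2**63 - 1

print(growth, new_entities, min_triples_per_number)  # 1000 999000000000 15
print(old_fits_int32, new_fits_uint32, new_fits_int64)  # True False True
```

A growth factor of 1000 is 10**3, which matches the three orders of magnitude claimed above.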

The page has received a thorough remake, not only refreshing its look
and updating it to display better on mobile devices, but also introducing a
number of new features:

* the previous limit to the first billion
<http://km.aifb.kit.edu/projects/numbers/web/n1000000000> natural numbers
has been lifted, since the page has in the meantime moved to a 64
<http://km.aifb.kit.edu/projects/numbers/web/n64> bit architecture. We
expanded the supported numbers to the first trillion natural numbers,
therefore creating 999 billion
<http://km.aifb.kit.edu/projects/numbers/web/n999000000000> new entities.

* all links to Wikipedia and DBpedia have been refreshed. In the eight
years since the original release, Wikipedia and DBpedia have, in an effort
to catch up with Linked Open Numbers, created new entities for numerous
numbers. We have updated the links to all of those.

* links to the Wikidata entities representing these numbers have also been
created and added, extending the linkage between Linked Open Numbers and
the LOD cloud by thousands and thousands of new entities.

* the whole dataset is now published under the terms of the CC-0 license,
ending long years of discussion that resulted in fear, uncertainty, and
doubt. The Linked Open Numbers dataset now stands on solid ground,
joining other major datasets in choosing the perfect license for data.

* we expanded the ontology and the dataset to also provide the digit sum of
the numbers, allowing new applications on top of that.

* we refreshed the links to Linked Data browsers. None of the original six
<http://km.aifb.kit.edu/projects/numbers/web/n6> browsers is available
anymore for browsing the Linked Open Numbers dataset. Therefore these links
were all removed and replaced with two
<http://km.aifb.kit.edu/projects/numbers/web/n2> current browsers.

* we also support the URI4 <http://km.aifb.kit.edu/projects/numbers/web/n4>URI
project and provide data about the Linked Open Numbers URIs in the URI4URI
<http://uri4uri.net/> scheme.

* the page has been updated to support Unicode's UTF-8
<http://km.aifb.kit.edu/projects/numbers/web/n8>, thus showing the number
names in their full glory.
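The post does not say how the new digit sum is computed or how it is modeled in the ontology. Assuming the usual definition, the sum of a number's decimal digits, a minimal Python sketch follows. `digit_sum` is a hypothetical helper name and is not taken from the dataset's code base.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer (assumed definition)."""
    if n < 0:
        raise ValueError("digit_sum expects a non-negative integer")
    total = 0
    while n:
        n, digit = divmod(n, 10)  # peel off the last decimal digit
        total += digit
    return total


print(digit_sum(2922))             # 2 + 9 + 2 + 2 = 15
print(digit_sum(999_000_000_000))  # 9 + 9 + 9 = 27
```

Repeated `divmod` by 10 avoids converting the number to a string. It runs in time proportional to the number of digits, which is at most 13 for numbers up to one trillion.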

Eight <http://km.aifb.kit.edu/projects/numbers/web/n8> years, or 2922
<http://km.aifb.kit.edu/projects/numbers/web/n2922> days, after the
original publication, Linked Open Numbers still gets tens of thousands
<http://km.aifb.kit.edu/projects/numbers/web/n40000> of hits per month. We
are happy to have updated the resource and extended its lifetime
considerably.
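The "eight years, 2922 days" figure can be checked with Python's standard library. This sketch assumes the original release fell on 1 April 2010. That date is inferred from the post's statement that today marks exactly eight years since the original publication and is not stated outright.

```python
import calendar
from datetime import date

ORIGINAL_RELEASE = date(2010, 4, 1)  # assumption: exactly eight years before this post
ANNOUNCEMENT = date(2018, 4, 1)      # date of this announcement

elapsed_days = (ANNOUNCEMENT - ORIGINAL_RELEASE).days

# Each 29 February inside the interval adds one day to 8 * 365.
# An April-to-April interval covers the Februaries of 2011 through 2018.
leap_days = sum(calendar.isleap(year) for year in range(2011, 2019))

print(elapsed_days, leap_days)  # 2922 2
```

The two leap days are 29 February 2012 and 29 February 2016, which gives 8 * 365 + 2 = 2922.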

The community is invited and challenged to provide a SPARQL endpoint for the
dataset. We think the size of the dataset makes for an interesting
challenge.

An open source release of the code base is being planned.

The update was created collaboratively by Denny Vrandecic, Steffen Thoma,
Andreas Thalhammer, Andreas Harth, and York Sure-Vetter.
Received on Sunday, 1 April 2018 17:31:46 UTC
