
Re: KIT releases 14 billion triples to the Linked Open Data cloud

From: Matthias Samwald <samwald@gmx.at>
Date: Thu, 1 Apr 2010 12:30:13 +0200
Message-ID: <1C6199ECEA224D3C9BCD42884F489E94@ms>
To: "Denny Vrandecic" <denny.vrandecic@kit.edu>, <public-lod@w3.org>
Hi Denny,

I am sorry, but I have to voice some criticism of this project. Over the 
past two years, I have become increasingly wary of the excitement over large 
numbers of triples in the LOD community. Large numbers of triples don't 
necessarily mean that a dataset enables us to do anything novel or 
significantly useful. I think there should be a shift from focusing on 
quantity to focusing on quality and usefulness.

Now the project you describe seems to be well-made, but it also exemplifies 
this problem to a degree that I have not seen before. You basically 
published a huge dataset of numbers, for the sake of producing a large 
number of triples. Your announcement mainly emphasizes how huge the dataset 
is, and the corresponding paper does the same. The paper gives a few 
application scenarios; I quote:

"The added value of the paradigm shift initiated by our work cannot be
underestimated. By endowing numbers with an own identity, the linked open
data cloud will become treasure trove for a variety of disciplines. By
using elaborate data mining techniques, groundbreaking insights about deep
mathematical correspondences can be obtained. As an example, using our
sample dataset, we were able to discover that there are significantly more
odd primes than even ones, and even more excitingly a number contains 2 as
a prime factor exactly if its successor does not."

I am sorry, but this sounds a bit overenthusiastic. I see no paradigm 
shift, and I also don't see why your findings about prime numbers required 
you to publish the dataset as linked data. I also have trouble seeing the 
practical value of looking at the resource pages for each number with a 
linked data browser, but I am also not a mathematician.
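
To illustrate the point, here is a minimal sketch showing that both 
observations quoted above can be checked in a few lines of plain Python, 
without any linked data. It is not taken from the paper or the dataset: the 
helper `is_prime` and the bound `LIMIT` are assumptions chosen purely for 
illustration.

```python
# Illustrative sketch only: is_prime and LIMIT are hypothetical names,
# not part of the Linked Open Numbers paper or dataset.

def is_prime(n: int) -> bool:
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


LIMIT = 1000  # arbitrary sample of natural numbers 1..LIMIT

primes = [n for n in range(1, LIMIT + 1) if is_prime(n)]
even_primes = [p for p in primes if p % 2 == 0]
odd_primes = [p for p in primes if p % 2 == 1]

# "A number contains 2 as a prime factor exactly if its successor does not":
# for every n, exactly one of n and n + 1 is divisible by 2.
parity_alternates = all(
    (n % 2 == 0) != ((n + 1) % 2 == 0) for n in range(1, LIMIT + 1)
)

print(len(odd_primes), len(even_primes), parity_alternates)
```

The sample size is arbitrary; the same check works for any bound, since 2 is 
the only even prime and parity alternates between consecutive integers.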

I am sorry for being a bit antagonistic, but we as a community should really 
try not to be seduced too easily by publishing ever-larger numbers of 
triples.

Cheers,
Matthias Samwald




--------------------------------------------------
From: "Denny Vrandecic" <denny.vrandecic@kit.edu>
Sent: Thursday, April 01, 2010 12:01 PM
To: <public-lod@w3.org>
Subject: KIT releases 14 billion triples to the Linked Open Data cloud

> We are happy to announce that the Institute AIFB at the KIT is releasing 
> the biggest dataset until now to the Linked Open Data cloud. The Linked 
> Open Numbers project offers billions of facts about natural numbers, all 
> readily available as Linked Data.
>
> Our accompanying peer-reviewed paper [1] gives further details on the 
> background and implementation. We have integrated with external data 
> sources (linking DBpedia to all their 335 number entities) and also 
> directly link to the best-known linked open data browsers from the page.
>
> You can visit the Linked Open Numbers project at:
> <http://km.aifb.kit.edu/projects/numbers/>
>
> Or point your linked open data browser directly at:
> <http://km.aifb.kit.edu/projects/numbers/n1>
>
> We are happy to have increased the amount of triples on the Web by more 
> than 14 billion triples, roughly 87.5% of the size of linked data web 
> before this release (see paper for details). We hope that the data set 
> will find its serendipitous use.
>
> The data set and the publication mechanism was checked pedantically, and 
> we expect no errors in the triples. If you do find some, please let us 
> know. We intend to be compatible with all major linked open data 
> publication standards.
>
> About the AIFB
>
> The Institute AIFB (Applied Informatics and Formal Description Methods) at 
> KIT is one of the world-leading institutions in Semantic Web technology. 
> Approximately 20 researchers of the knowledge management research group 
> are establishing theoretical results and scalable implementations for the 
> field, closely collaborating with the sister institute KSRI (Karlsruhe 
> Service Research Institute), the start-up company ontoprise GmbH, and the 
> Knowledge Management group at the FZI Research Center for Information 
> Technologies. Particular emphasis is given to areas such as logical 
> foundations, Semantic Web mining, ontology creation engineering and 
> management, RDF data management, semantic web search, and the 
> implementation of interfaces and tools. The institute is involved in many 
> industry-university co-operations, both on a European and a national 
> level, including a number of intelligent Web systems case studies.
>
> Website: <http://www.aifb.kit.edu>
>
> About KIT
>
> The Karlsruhe Institute of Technology (KIT) is the merger of the former 
> Universität Karlsruhe (TH) and the former Forschungszentrum Karlsruhe. 
> With about 8000 employees and an annual budget of 700 million Euros, KIT 
> is the largest technical research institution within Germany. KIT is both, 
> a state university with research and teaching and, at the same time, a 
> large-scale research institution of the Helmholtz Association. KIT has a 
> strong reputation as one of Germany’s university of excellence, aiming to 
> set the highest standards for education, research and innovation.
>
> Website: <http://www.kit.edu>
>
> [1] Denny Vrandecic, Markus Krötzsch, Sebastian Rudolph, Uta Lösch: 
> Leveraging Non-Lexical Knowledge for the Linked Open Data Web, published 
> in Rodolphe Héliot and Antoine Zimmermann (eds.), The Fifth RAFT'2010), 
> the yearly bilingual publication on nonchalant research, available at
> <http://km.aifb.kit.edu/projects/numbers/linked_open_numbers.pdf>
Received on Thursday, 1 April 2010 10:30:46 UTC
