
DBpedia citations & references challenge

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Tue, 7 Jun 2016 12:51:32 +0300
Message-ID: <CA+u4+a1Xpi5JUKWUF7eP27WaUPS=ONK2gmaD7gg_b+8kbyZWBQ@mail.gmail.com>
To: "DBpedia Discussion (ML)" <dbpedia-discussion@lists.sourceforge.net>, "DBpedia Developers (ML)" <dbpedia-developers@lists.sourceforge.net>, Linked Data community <public-lod@w3.org>, "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>
In the latest release (2015-10) DBpedia started exploring the citation and
reference data from Wikipedia, and we were pleasantly surprised by the rich
data <http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2>
we managed to extract.

   - citation_data_en.ttl.bz2
     <http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2>
     (sample <http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2>)
   - citation_links_en.ttl.bz2
     <http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2>
     (sample <http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_links_en.ttl.bz2>)
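A quick way to get a feel for either file is to stream it straight from the compressed dump and count which predicates occur. The sketch below assumes the `.ttl.bz2` files are line-based, with one N-Triples-style statement per line; if that assumption does not hold, a full Turtle parser would be needed instead. The function names `iter_triples` and `predicate_histogram` are hypothetical helpers, not part of any DBpedia tooling.

```python
import bz2
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Matches "<subject> <predicate> object ." on a single line.
# Assumption: the dump has one N-Triples-style statement per line.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) tuples, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def predicate_histogram(path: str) -> Counter:
    """Stream a .ttl.bz2 dump without decompressing it to disk and count predicates."""
    with bz2.open(path, 'rt', encoding='utf-8') as handle:
        return Counter(pred for _, pred, _ in iter_triples(handle))
```

For example, `predicate_histogram('citation_data_en.ttl.bz2').most_common(10)` would list the ten most frequent predicates in a locally downloaded copy of the file. Streaming keeps memory use flat regardless of dump size, since only one line is held at a time.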


This data holds huge potential, especially for the Wikidata challenge of
providing a reference source for every statement. It describes not only
bibliographic data but also many web pages and other sources from around
the web.

The data we extract at the moment is quite raw and can be improved in many
different ways. Some of the potential improvements are:

   - Extend the citation extractor to handle other Wikipedia language editions
     <https://github.com/dbpedia/extraction-framework/issues/451>; currently
     only English Wikipedia is supported.
   - Map the data to a relevant bibliographic ontology
     <https://github.com/dbpedia/mappings-tracker/issues/79> (there are many
     candidates and, although BIBO got the most votes, we are open to other
     ontologies).
   - Map the data to existing bibliographic LOD (e.g. TEL has 100M records,
     WorldCat 300M) or to online books (e.g. Google Books). See the
     citationIri issue <https://github.com/dbpedia/extraction-framework/issues/452>.
   - Find ways to merge / fuse identical citations from multiple articles.
   - Use the citation data in the Wikidata primary sources tool
     <https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool>.
   - Surprise us with your ideas!
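For the merge / fuse item, one simple starting point is to compute a key for each citation and group the records that share it. The sketch below is only an illustration under assumptions: each citation is a plain dict with hypothetical field names (`doi`, `isbn`, `url`, `title`, `year`) loosely modelled on Wikipedia citation template parameters, not the actual predicates in the DBpedia dataset. It prefers strong identifiers and falls back to a normalized title plus year.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List

Citation = Dict[str, str]  # hypothetical record shape, e.g. {'doi': ..., 'title': ...}


def normalize_text(value: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize('NFKD', value)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r'[^\w\s]', ' ', no_accents.lower())
    return ' '.join(no_punct.split())


def citation_key(citation: Citation) -> str:
    """Build a fusion key, preferring strong identifiers over fuzzy fields."""
    if citation.get('doi'):
        return 'doi:' + citation['doi'].strip().lower()
    if citation.get('isbn'):
        # Keep digits and a possible X check character; ISBN-10 and ISBN-13
        # forms of the same book are NOT unified by this simple sketch.
        return 'isbn:' + re.sub(r'[^0-9Xx]', '', citation['isbn']).upper()
    if citation.get('url'):
        # Lower-casing the whole URL is a simplification (paths can be case-sensitive).
        return 'url:' + citation['url'].strip().rstrip('/').lower()
    return 'title:' + normalize_text(citation.get('title', '')) + '|' + citation.get('year', '').strip()


def fuse_citations(citations: List[Citation]) -> Dict[str, List[Citation]]:
    """Group citation records (possibly from different articles) by fusion key."""
    groups: Dict[str, List[Citation]] = defaultdict(list)
    for citation in citations:
        groups[citation_key(citation)].append(citation)
    return dict(groups)
```

Exact-key grouping like this only catches near-identical citations; fuzzier matching (for example, comparing author lists or using similarity thresholds on titles) would be a natural next step.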


We welcome contributions that improve the existing citation dataset in any
way, and we are open to collaboration and happy to help. Results will be
presented at the next DBpedia meeting on 15 September 2016 in Leipzig,
co-located with SEMANTiCS 2016. Each participant should submit a short
description of his/her contribution by Monday 12 September 2016 and present
his/her work at the meeting. Comments and questions can be posted on the
DBpedia discussion & developer lists or on our new DBpedia ideas page
<http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challenge/>.

Submissions will be judged by the Organizing Committee and the best two
will receive a prize.

Organizing Committee

   - Vladimir Alexiev, Ontotext and DBpedia BG
   - Anastasia Dimou, Ghent University, iMinds
   - Dimitris Kontokostas, KILT/AKSW, DBpedia Association



-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
Received on Tuesday, 7 June 2016 09:52:27 UTC