
[ANN] New DBpedia Snapshot 2021-09

From: DBpedia <pr-aksw@informatik.uni-leipzig.de>
Date: Mon, 25 Oct 2021 10:19:37 +0200
To: semantic-web@w3.org
Message-ID: <6d07f854-9b9d-a094-c724-7bec55f01fcf@informatik.uni-leipzig.de>
Apologies for cross-posting. The full release description, including 
further statistics, can be found at 
https://www.dbpedia.org/blog/snapshot-2021-09-release/.


We are pleased to announce the immediate availability of a new edition 
of the free and publicly accessible SPARQL Query Service Endpoint and 
Linked Data Pages for interacting with the new Snapshot Dataset.
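
As a rough illustration of interacting with the SPARQL Query Service, the
sketch below prepares a SPARQL Protocol GET request using only the Python
standard library. The endpoint URL https://dbpedia.org/sparql, the example
resource, and the helper names build_request and run_query are assumptions
made for illustration; none of them is stated in this announcement.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public endpoint URL; this announcement does not state it.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: English abstract of one example resource.
EXAMPLE_QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Leipzig>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as a GET request and ask for JSON results."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the request and parse the JSON result set (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        return json.load(resp)
```

Calling run_query(EXAMPLE_QUERY) would return the parsed result document,
provided the assumed endpoint is reachable.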


    News since DBpedia Snapshot 2021-06
    <https://www.dbpedia.org/blog/snapshot-2021-06-release/>

  * Release notes are now maintained in the Databus Collection
    (https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09).

  * The Image and Abstract Extractor was improved.

  * Work in progress: smoothing community issue reporting and fixing
    on GitHub
    (https://github.com/dbpedia/extraction-framework/issues/new/choose).


    What is the “DBpedia Snapshot” Release?

Historically, this release has been associated with many names: "DBpedia 
Core", "EN DBpedia", and, most confusingly, just "DBpedia". In fact, 
it is a combination of:

  * EN Wikipedia data: A small but very useful subset (~1 billion
    triples, or 14%) of the whole DBpedia extraction
    <https://link.springer.com/chapter/10.1007/978-3-030-59833-4_1>
    using the DBpedia Information Extraction Framework
    <https://github.com/dbpedia/extraction-framework> (DIEF), comprising
    structured information extracted from the English Wikipedia plus
    some enrichments from other Wikipedia language editions, notably
    multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga,
    id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.

  * Links: 62 million community-contributed cross-references and
    owl:sameAs links to other linked data sets on the Linked Open Data
    (LOD) Cloud, which make it possible to find and retrieve further
    information from the largest decentralized, change-sensitive
    knowledge graph on earth, formed around DBpedia since 2007.

  * Community extensions: Community-contributed extensions such as
    additional ontologies and taxonomies.
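
To make the Links component in the list above more concrete, the sketch
below pairs a SPARQL query that lists the owl:sameAs targets of one resource
with a small helper that tallies such targets by host name. The resource
IRI, the helper name links_by_domain, and the sample IRIs are hypothetical
and serve only as illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative query: all owl:sameAs targets of one example resource.
SAME_AS_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?target WHERE {
  <http://dbpedia.org/resource/Leipzig> owl:sameAs ?target .
}
"""


def links_by_domain(targets):
    """Count link targets per host name (the network location of each IRI)."""
    return Counter(urlsplit(target).netloc for target in targets)


# Hypothetical sample targets, not real DBpedia links.
sample_targets = [
    "http://example.org/entity/1",
    "http://example.org/entity/2",
    "http://example.com/resource/a",
]
domain_counts = links_by_domain(sample_targets)
```

In practice the list of targets would come from the results of running
SAME_AS_QUERY against a SPARQL endpoint.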


    Release Frequency & Schedule

Going forward, releases will be scheduled for the 15th of February, May, 
July, and October (with +/- 5 days tolerance), and are named using the 
same date convention as the Wikipedia Dumps that served as the basis for 
the release. An example of the release timeline is shown below:


  Sep 6–8          Wikipedia dumps for September 1 become available on
                   https://dumps.wikimedia.org/
  Sep 8–20         Download and extraction with DIEF
  Sep 20–Oct 10    Post-processing and quality-control period
  Oct 10–20        Linked Data and SPARQL endpoint deployment
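
The scheduling rule stated above (the 15th of February, May, July, and
October, with a tolerance of +/- 5 days) can be sketched as follows. The
function names release_windows and snapshot_name are hypothetical, and the
YYYY-MM naming format is an assumption inferred from the name "Snapshot
2021-09".

```python
from datetime import date, timedelta

RELEASE_MONTHS = (2, 5, 7, 10)      # February, May, July, October
RELEASE_DAY = 15
TOLERANCE = timedelta(days=5)       # +/- 5 days around the target date


def release_windows(year):
    """Return (earliest, target, latest) dates for each release in a year."""
    windows = []
    for month in RELEASE_MONTHS:
        target = date(year, month, RELEASE_DAY)
        windows.append((target - TOLERANCE, target, target + TOLERANCE))
    return windows


def snapshot_name(dump_date):
    """Name a snapshot after its Wikipedia dump date (assumed YYYY-MM format)."""
    return f"{dump_date.year}-{dump_date.month:02d}"
```

For 2021, the last window computed this way spans 10 to 20 October, which
matches the deployment slot in the example timeline.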


    Data Freshness

Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has a 
lag of 1-4 months.


    Further Information

The growth of DBpedia, a breakdown of links by domain, download 
instructions, and some tips on how to work effectively with DBpedia are 
published as part of this blog post: 
https://www.dbpedia.org/blog/snapshot-2021-09-release/


Stay tuned and stay safe!

With kind regards,


The DBpedia Association
Received on Monday, 25 October 2021 08:35:24 UTC
