[ANN] Wikipedia Tools for Google Spreadsheets

From: Thomas Steiner <tomac@google.com>
Date: Mon, 2 May 2016 10:08:34 +0200
Message-ID: <CALgRrLne6L2fdyeFUsjjqK9w-k+iTMBHbiJVLitSoCc2DN+UtA@mail.gmail.com>
To: Thomas Steiner <tomac@google.com>
Cc: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>, "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>, "public-lod@w3.org" <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Esteemed Wikipedia, Wikidata, Linked Data, and Semantic Web communities[*],

tl;dr: Released a Google Spreadsheets add-on called Wikipedia Tools
[1] that makes working with data from Wikipedia and Wikidata a breeze.

I am happy to release a Google Spreadsheets add-on called Wikipedia
Tools [1]. This add-on allows you to work with data from Wikipedia and
Wikidata from within a spreadsheet context using custom formulas. Let
me motivate the tools with a short example:

You may have heard of Volkswagen's #DieselGate scandal. Is this still
a problem for Volkswagen—and if so, where? Google Trends to the
rescue? Maybe [2]. But what about global impact? How do people in
Korea, an important Volkswagen export market [citation needed😉],
refer to the scandal? It turns out they call it 폭스바겐 배기가스 조작 (probably
among other options).

With a custom function from Wikipedia Tools, we can safely "translate"
the title of an English Wikipedia article (English being a language that,
for the sake of this example, we assume we have mastered well enough) into
many other languages (that we have not necessarily mastered):

  bg Афера на Фолксваген
  cs Dieselgate
  de VW-Abgasskandal
  zh 福斯集團汽車舞弊事件
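To illustrate the idea outside a spreadsheet, here is a minimal Python sketch of such a title "translation". It is not the add-on's code (see [7] for that). It assumes the public MediaWiki API with `prop=langlinks`, and it assumes the English article is titled "Volkswagen emissions scandal". The helper names `langlinks_url` and `parse_langlinks` and the page ID in the sample are made up for this example, and the sample is hand-written in the shape the API returns rather than a live response.

```python
from urllib.parse import urlencode


def langlinks_url(title, lang="en"):
    """Build a MediaWiki API URL asking for the language links of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",  # request as many language links as the API allows
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_langlinks(response):
    """Map language code -> article title from a decoded JSON response."""
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            result[link["lang"]] = link["*"]
    return result


# Hand-written sample (hypothetical page ID, titles taken from the list above).
sample = {
    "query": {
        "pages": {
            "12345": {
                "title": "Volkswagen emissions scandal",
                "langlinks": [
                    {"lang": "cs", "*": "Dieselgate"},
                    {"lang": "de", "*": "VW-Abgasskandal"},
                ],
            }
        }
    }
}

print(langlinks_url("Volkswagen emissions scandal"))
print(parse_langlinks(sample))
```

Going through interlanguage links rather than machine translation is what makes the "translation" safe: each result is the title of an article that actually exists in the target language.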

Then, using Wikipedia page views as one reasonable popularity indicator
among others, we can take each of these language results, for example
Korean, call =WIKIPAGEVIEWS("ko:폭스바겐 배기가스 조작") to get the views for the last
n days, and plot the results [3] (in practice, you would probably
still normalize by size and/or total views of the particular
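As a rough illustration of the page view lookup and of normalizing by total views, here is a minimal Python sketch. It is not the add-on's code (see [7] for that). It assumes the public Wikimedia Pageviews REST API and reuses the "lang:Title" argument style of the formula above. The helper names `pageviews_url` and `normalize` are made up for this example, and the numbers in the usage lines are invented purely to show the arithmetic.

```python
from datetime import date, timedelta
from urllib.parse import quote


def pageviews_url(article, days, today):
    """Build a Pageviews API URL for daily views of `article` ("lang:Title")."""
    lang, title = article.split(":", 1)
    end = today - timedelta(days=1)  # skip today, its count is still incomplete
    start = end - timedelta(days=days - 1)
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{lang}.wikipedia/all-access/user/"
        f"{quote(title.replace(' ', '_'), safe='')}/daily/"
        f"{start:%Y%m%d}/{end:%Y%m%d}"
    )


def normalize(views, total_views):
    """Express daily article views as a share of the whole wiki's daily views."""
    return [v / t for v, t in zip(views, total_views)]


print(pageviews_url("de:VW-Abgasskandal", 7, date(2016, 5, 2)))
# Invented numbers: article views per day vs. total views of that Wikipedia.
print(normalize([10, 20], [100, 400]))
```

Dividing by the total views of each language edition makes the curves comparable across Wikipedias of very different sizes.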

There are a lot more custom functions implemented than I could cover
in this short example. I have put together a slide deck [4] and a paper
[5] that go into more detail if you are interested, and a demo with all
functions is available at [6]. The add-on also has a built-in manual
(in Google Sheets, click Add-ons→Wikipedia Tools→Show documentation)
and its underlying code is open-source [7].

Please let me know if you have any open questions, feature requests, or
bug reports. Thanks!


[1] http://bit.ly/wikipedia-tools-add-on
[2] http://www.google.com/trends/explore?hl=en-US&q=volkswagen+emissions+scandal,+dieselgate&date=today+12-m
[3] https://docs.google.com/spreadsheets/d/1PyFq59iEeLWpPQrWDUyU8mlmQrb4GDv2QElmEU9aFec/edit?usp=sharing
[4] http://bit.ly/wikipedia-tools-slides
[5] http://bit.ly/wikipedia-tools-paper (PDF)
[6] https://docs.google.com/spreadsheets/d/1sVduZul787O-bRzuy0UKpRl7bkouxwaIOsxXuJGm6yg/edit?usp=sharing
[7] https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/
[*] Cross-posted on purpose; please choose your reply options accordingly.
[**] This is a simple example for illustrative purposes, I do _not_
claim it is an accurate popularity prediction, nor do I mean to bash

Dr. Thomas Steiner, Employee (http://blog.tomayac.com,

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891

Received on Monday, 2 May 2016 08:09:30 UTC
