
World Wide Web Conference (WWW 2016)—Trip Report

From: Thomas Steiner <tomac@google.com>
Date: Fri, 22 Apr 2016 08:59:00 +0200
Message-ID: <CALgRrL=pcOv=NEFKHi00-fO==JCMEPn5TL-Cu9zivRQFBsUgNA@mail.gmail.com>
To: Thomas Steiner <tomac@google.com>
[bcc: semantic-web[AT]w3.org, www2016[AT]easychair.org, contact[AT]iw3c2.org]

Esteemed reader,

I attended the World Wide Web conference (WWW 2016) in Montréal,
Canada, and wanted to share my learnings in the form of a trip report. You
can read it online on my blog, with fancy embedded tweets, slides, and
such, or simply pasted inline below as plain HTML.


Last week, I attended the 25th International World Wide Web Conference
(WWW2016 <http://www2016.ca/>), which took place from April 11 to 15, 2016 in
Montréal, Canada. The main proceedings
<http://www2016.net/proceedings/forms/proceedings.htm> and the companion
proceedings <http://www2016.net/proceedings/forms/companion.htm> are both
available online. Google was one of the gold sponsors and Google Director
of Research Peter Norvig <http://norvig.com/> delivered one of the main
keynotes <http://www2016.ca/keynote-speakers.html>. This is my trip report
with personal highlights and observations.

*Workshops, Day 1*

I started the conference with the Making Sense of Microposts
<http://microposts2016.seas.upenn.edu/> workshop, which began with an invited
talk by Yahoo Research Scientist Mihajlo Grbovic
<http://astro.temple.edu/~tua95067/> on Leveraging Blogging Activity on
Tumblr to Infer Demographics and Interests of Users for Advertising Purposes
<https://www.dropbox.com/s/sz7zidj7ow1b3v6/paper_15.pdf?dl=1>. As ground
truth for their gender prediction, they used US Social Security
Administration data on popular baby names
<https://www.ssa.gov/oact/babynames/limits.html>, and they reached a
precision of 0.806 (recall 0.838) for *female* and a precision of 0.794
(recall 0.689) for *male*. I spent the rest of the day session-hopping
between the Microposts workshop and the Computational Social Science for
the Web <http://www2016.net/proceedings/companion/p1037.pdf> tutorial.
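
To put each precision/recall pair into a single number, one can compute the
F1 score, the harmonic mean of precision and recall. This is my own
back-of-the-envelope calculation on the figures above, not a number that was
reported in the talk:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs as reported for the Tumblr gender prediction.
reported = {"female": (0.806, 0.838), "male": (0.794, 0.689)}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1(p, r):.3f}")
```

This yields an F1 of roughly 0.82 for *female* and 0.74 for *male*, which
makes the effect of the lower recall for *male* more visible.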

@miha_jlo <https://twitter.com/miha_jlo> on predicting gender of Tumblr
users #microposts2016 <https://twitter.com/hashtag/microposts2016?src=hash>
pic.twitter.com/D6ClmCo4uJ <https://t.co/D6ClmCo4uJ>
— Katrin Weller (@kwelle) April 11, 2016

*Workshops, Day 2*

My second day was fully dedicated to the Wiki Workshop
<http://snap.stanford.edu/wikiworkshop2016/>, which started with a surprise
visit by Wikipedia co-founder Jimmy Wales
<https://en.wikipedia.org/wiki/Jimmy_Wales> and a short discussion of,
among other topics, payment and reward models for authors on Wikipedia
<https://www.wikipedia.org/> and Wikia <http://www.wikia.com/>.

Surprise guest @jimmy_wales <https://twitter.com/jimmy_wales> at
#WikiWorkshop2016 <https://twitter.com/hashtag/WikiWorkshop2016?src=hash>.
Looking forward to a great workshop. #WWW2016
<https://twitter.com/hashtag/WWW2016?src=hash> pic.twitter.com/w6Qlc21lon
— Thomas Steiner (@tomayac) April 12, 2016

The workshop had an interesting format: invited talks
<http://snap.stanford.edu/wikiworkshop2016/#speakers-www> filled the day,
while the actual papers were presented at a poster session during lunch. I
want to highlight the paper With a Little Help from my Neighbors: Person
Name Linking Using the Wikipedia Social Network
<http://www2016.net/proceedings/companion/p985.pdf> by J. Geiß and M. Gertz
on named entity linking and disambiguation based on their co-occurrence in
Wikipedia pages, and the paper Finding Structure in Wikipedia Edit
Activity: An Information Cascade Approach
<http://www2016.net/proceedings/companion/p1007.pdf> by R. Tinati *et al*.
My own paper Wikipedia Tools for Google Spreadsheets
<http://www2016.net/proceedings/companion/p997.pdf> introduces a Google
Spreadsheets add-on that facilitates working with data from Wikipedia and
Wikidata from within a spreadsheet context.

The Wikipedia Tools for Google Spreadsheets are now available as an
official Sheets Add-On: https://t.co/r6VGiSfT30. pic.twitter.com/MKAEozdE4T
— Thomas Steiner (@tomayac) February 15, 2016

My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams
Firehose <http://bit.ly/wiki-firehose>, which you can see visualized and
sonified in my Wikipedia Screensaver
<http://tomayac.github.io/wikipedia-screensaver/> that I developed for
the talk and released as open source.

*Main Conference, Day 1*

The main conference started with a keynote by Sir Tim Berners-Lee
<https://www.w3.org/People/Berners-Lee/>, whose talk touched on mobile Web
apps, which he prefers over native apps because "when [one goes] native,
[one] become[s] part of a value chain", and on the need for Web apps to get
closer to the capabilities of native apps (he did not mention Service Worker,
but it was *somewhat* clear from the context that he was aiming at this).

After the keynote, I saw the presentation of a paper titled Immersive
Recommendation: News and Event Recommendations Using Personal Digital Traces
<http://www2016.net/proceedings/proceedings/p51.pdf> on an approach to
leverage cross-platform user profiles for news and event recommendations.
The authors' demo <http://newsfie.org/> worked very well when I tested it
with my YouTube <https://www.youtube.com/user/tomayac> and Twitter
<https://twitter.com/tomayac> accounts.

Next, the presentation of the paper In a World That Counts: Clustering and
Detecting Fake Social Engagement at Scale showed me how the team at YouTube
deals with spammy comments by analyzing the temporal graph of engagement
behavior patterns between users and videos.

An interesting idea to prevent online trackers from tracking personally
identifiable information was shown in the paper Tracking the Trackers
<http://www2016.net/proceedings/proceedings/p121.pdf> by the makers of the
Web browser CLIQZ <https://cliqz.com/en>. Their approach leverages concepts
from k-anonymity: rather than working with fixed block lists, users
collectively identify, in the background, unsafe tracking elements that
have the potential to uniquely identify individual users, and such
information is then removed from tracking requests.
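
To make the k-anonymity idea concrete, here is a minimal sketch under my own
simplifying assumptions, not the CLIQZ implementation: a query-string value
seen in tracking requests counts as safe only if at least k distinct users
have sent the very same value; rarer values are potentially identifying and
get stripped. The threshold `K`, the function names, the tracker URL, and the
in-memory user counting (which a real system would do in a privacy-preserving
way) are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

K = 3  # hypothetical anonymity threshold

# (parameter name, value) -> set of (simulated) users that sent it
seen_by = defaultdict(set)

def report(user_id: str, url: str) -> None:
    """Record which users sent which query-string values (simulated crowd)."""
    for name, value in parse_qsl(urlsplit(url).query):
        seen_by[(name, value)].add(user_id)

def is_safe(name: str, value: str, k: int = K) -> bool:
    """A value is safe if at least k distinct users share it."""
    return len(seen_by[(name, value)]) >= k

def sanitize(url: str, k: int = K) -> str:
    """Drop query parameters whose values fewer than k users share."""
    parts = urlsplit(url)
    kept = [(n, v) for n, v in parse_qsl(parts.query) if is_safe(n, v, k)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Three users hit the same tracking pixel; only the per-user ID differs.
for uid in ("alice", "bob", "carol"):
    report(uid, f"https://tracker.example/pixel?site=news&uid={uid}")

print(sanitize("https://tracker.example/pixel?site=news&uid=alice"))
```

Here `site=news` is shared by three users and survives, while `uid=alice` is
unique to one user and is removed, so the sketch prints
`https://tracker.example/pixel?site=news`.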

The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It
Really Work? <http://www2016.net/proceedings/proceedings/p133.pdf> tackles
the issue of lengthy and hard-to-read privacy policies and whether
crowdsourcing their annotation can help. The authors come to the conclusion
that, if carefully deployed, crowdsourcing can indeed result in the
generation of non-trivial annotations and can also help identify elements
of ambiguity in policies. A demo <https://explore.usableprivacy.org/> with
annotated privacy policies shows some examples.

From the poster session, I especially liked Visual Positions of Links and
Clicks on Wikipedia <http://www2016.net/proceedings/companion/p27.pdf>,
which looked at the visual positions of clicked links on Wikipedia based on
the Wikipedia clickstream dataset
<https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream>, and Travel
the World: Analyzing and Predicting Booking Behavior using E-Mail Travel
Receipts <http://www2016.net/proceedings/companion/p31.pdf>, which examined
more than 25 million travel receipts from Yahoo Mail users to predict their
booking behavior.

Two of the many #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash> posters
that I found interesting: https://t.co/NrAJ7YNP7o, https://t.co/yaScB1ar8B
 [PDFs]. pic.twitter.com/vzmmJFZsZZ <https://t.co/vzmmJFZsZZ>
— Thomas Steiner (@tomayac) April 21, 2016

*Main Conference, Day 2*

Day 2 started with a keynote by Mary Ellen Zurko
<https://twitter.com/mzurko>, Principal Engineer at Cisco Systems, in which
she took the audience on a tour down memory lane of security, from S-HTTP
<https://de.wikipedia.org/wiki/S-HTTP> to Experimenting At Scale With
Google Chrome's SSL Warning.

From the research track, I first want to highlight a Yahoo Labs Research paper
on Predicting Pre-click Quality for Native Advertisements
<http://www2016.net/proceedings/proceedings/p299.pdf>. Native ads are
defined as a specific form of online advertising where ads replicate the
look-and-feel of their serving platform. The authors introduce the notion
of *bad ads*, which have a high Offensive Feedback Rate (OFR), *i.e.*, the
ratio between the number of times an ad was rated *offensive* and the
number of its impressions. According to the paper, the OFR metric is more
reliable than the commonly used click-through rate (CTR) metric.
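
Both metrics are simple ratios over the same denominator, the number of
impressions. A minimal sketch with made-up counts; the class and function
names, the numbers, and the *bad ad* threshold are my own hypothetical
choices, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class AdStats:
    impressions: int
    clicks: int
    offensive_reports: int

    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions if self.impressions else 0.0

    def ofr(self) -> float:
        """Offensive Feedback Rate: 'offensive' ratings per impression."""
        return self.offensive_reports / self.impressions if self.impressions else 0.0

def is_bad_ad(stats: AdStats, ofr_threshold: float = 0.001) -> bool:
    # Hypothetical cut-off; the paper's actual definition is not reproduced here.
    return stats.ofr() > ofr_threshold

ad = AdStats(impressions=200_000, clicks=3_000, offensive_reports=400)
print(f"CTR={ad.ctr():.4f}, OFR={ad.ofr():.4f}, bad={is_bad_ad(ad)}")
```

This prints `CTR=0.0150, OFR=0.0020, bad=True`: an ad can look fine by its
click-through rate while still being reported as offensive comparatively
often, which illustrates why a click-based metric alone may not capture ad
quality.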

One of my favorite papers of the conference was Disinformation on the Web:
Impact, Characteristics, and Detection of Wikipedia Hoaxes
<http://www2016.net/proceedings/proceedings/p591.pdf> (for lighter reading,
see the project homepage <https://cs.umd.edu/~srijan/hoax/>), which aims at
identifying hoaxes on Wikipedia, *i.e.*, deliberately fabricated falsehoods
made to masquerade as truth. Some famous hoaxes
<https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia> survived
for more than nine years and were widely cited in the media.

I continued with the presentation of our Industry Track paper From Freebase
to Wikidata: The Great Migration
<http://www2016.net/proceedings/proceedings/p1419.pdf>, in which we
describe our ongoing data transfer project for migrating the now shut-down
structured knowledge base Freebase <http://www.freebase.com/> to Wikidata
<https://www.wikidata.org/wiki/Wikidata:Main_Page>. We further report on
the data mapping challenges, provide an analysis of the progress so far,
and describe the Primary Sources Tool
<https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool>, which aims to
facilitate this and future data migrations. The tool has been released as
open source <https://github.com/google/primarysources>.

For me, the day ended with an interesting paper on The QWERTY Effect on the
Web: How Typing Shapes the Meaning of Words in Online Human-Computer
Interaction <http://www2016.net/proceedings/proceedings/p661.pdf>. I had
never heard of the QWERTY effect
<http://link.springer.com/article/10.3758%2Fs13423-012-0229-7> before. It
is based on the hypothesis that, on average, words typed with more letters
from the right side of the keyboard are more positive in meaning than words
typed with more letters from the left side. According to the paper, there is
some evidence that this hypothesis also holds true for the Web.
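
The hypothesis can be operationalized by counting, per word, how many
letters fall on the right versus the left half of a QWERTY keyboard. A
minimal sketch; the left/right split is the conventional touch-typing one,
and both it and the function name `right_side_advantage` are my assumptions,
not necessarily the exact measure used in the paper:

```python
# Conventional touch-typing split of the QWERTY letter keys (assumption).
LEFT = set("qwertasdfgzxcvb")
RIGHT = set("yuiophjklnm")

def right_side_advantage(word: str) -> int:
    """Right-side letters minus left-side letters; non-letters are ignored."""
    w = word.lower()
    return sum(c in RIGHT for c in w) - sum(c in LEFT for c in w)

for word in ("lollipop", "sad", "joy", "great"):
    print(word, right_side_advantage(word))
```

Under the hypothesis, words with a higher score would on average tend to be
rated more positively; a handful of individual words like these of course
proves nothing either way.
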

*Main Conference, Day 3*

In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack
<http://www2016.net/proceedings/proceedings/p999.pdf>, the authors show how
fake CAPTCHAs <https://en.wikipedia.org/wiki/CAPTCHA> (*C*ompletely
*A*utomated *P*ublic *T*uring tests to tell *C*omputers and *H*umans *A*part)
can be used to trick users into unwittingly disclosing private information,
like one's Facebook name displayed in (social widget) iframes. The attack
pages embedding these iframes cannot access this private data themselves due
to the Same Origin Policy
<https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy>,
so instead they have users solve fake CAPTCHAs consisting of many
CSS-disguised elements.
Google runs a service called Safe Browsing
<https://www.google.com/transparencyreport/safebrowsing/> that alerts users
when websites get compromised. In the paper Remedying Web Hijacking:
Notification Effectiveness and Webmaster Comprehension
<http://www2016.net/proceedings/proceedings/p1009.pdf>, the authors provide
a study that captures the life cycle of 760,935 hijacking incidents from
July 2014 to June 2015, as identified by Google Safe Browsing and Search
Quality. They observe that direct communication with webmasters increases
the likelihood of cleanup by over 50% and reduces infection lengths by at
least 62%.

Another paper on Wikipedia looked at Growing Wikipedia Across Languages via
Recommendation <http://www2016.net/proceedings/proceedings/p975.pdf>. The
approach detects missing articles, ranks them by local importance, and
finally contacts potential Wikipedia editors via email, suggesting that they
write the article in question. The authors have deployed the Wikipedia
GapFinder <http://recommend.wmflabs.org/#Recommend>, which shows the approach
in practice.
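
The first two steps, finding articles that exist in a source language but
not in the target language and ranking them, can be sketched as follows.
Articles are keyed by a shared, language-independent identifier (the plain
English names below are stand-ins; in practice something like Wikidata item
IDs would serve that role). The function name, the data, and the single
numeric importance score are hypothetical placeholders; the paper's actual
ranking of local importance is not reproduced here:

```python
from typing import Dict, List, Set, Tuple

def missing_articles(
    source_importance: Dict[str, float],
    target_ids: Set[str],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return source-language articles absent from the target language,
    ranked by a (hypothetical) importance score, highest first."""
    missing = [
        (article, score)
        for article, score in source_importance.items()
        if article not in target_ids
    ]
    missing.sort(key=lambda item: item[1], reverse=True)
    return missing[:top_n]

# Made-up importance scores for articles in the source language.
source = {"Poutine": 900.0, "Montreal Metro": 750.0,
          "Mount Royal": 600.0, "Maple syrup": 1200.0}
# Articles that already exist in the target language.
target = {"Maple syrup", "Mount Royal"}

print(missing_articles(source, target))
```

With this toy data, the sketch suggests `Poutine` first and `Montreal Metro`
second, since `Maple syrup` and `Mount Royal` already exist in the target
language.
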

*Other Observations*

The Social Media Research Foundation <http://www.smrfoundation.org/> provides
a NodeXL <http://nodexl.codeplex.com/>-based visualization of the network
of tweets <https://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=66550> that
used the #WWW2016 <https://twitter.com/hashtag/www2016> hashtag, including
all my #WWW2016 tweets.

The tweet network around the #WWW2016
<https://twitter.com/hashtag/WWW2016?src=hash> hashtag visualized via
NodeXL: https://t.co/cRJgj5zBsz. pic.twitter.com/q5XxLHIH74
— Thomas Steiner (@tomayac) April 21, 2016

One thing I noticed at the conference is that we (and I fully include
myself here) still tend, from time to time, to unconsciously use stereotyped,
gendered language where it is inappropriate in the general case ("so easy my
mom or grandma could use it", "to pass the 'mom test'", *etc.*). I called
this out in a tweet <https://twitter.com/tomayac/status/720613067822903296>.
You may want to follow the interesting conversation it started on
Twitter <https://twitter.com/tomayac/status/720613067822903296> or Facebook
<https://www.facebook.com/Tomayac/posts/10153530286807286> (if you are
friends with me). This tweet led Christopher Gutteridge to create the
imaginary naive Web user Rube.

Can we stop calling the cliché inexperienced Web user “our mom” or “our
grandma”? #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash>
— Thomas Steiner (@tomayac) April 14, 2016

Oh, and in the old days, there used to be more bananas
<http://tomayac.com/tweets/search?q=bananas+banana+bananamoment>… Next

Dr. Thomas Steiner, Employee (http://blog.tomayac.com,

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891
