- From: Thomas Steiner <tomac@google.com>
- Date: Fri, 22 Apr 2016 08:59:00 +0200
- To: Thomas Steiner <tomac@google.com>
- Message-ID: <CALgRrL=pcOv=NEFKHi00-fO==JCMEPn5TL-Cu9zivRQFBsUgNA@mail.gmail.com>
[bcc: semantic-web[AT]w3.org, www2016[AT]easychair.org, contact[AT]iw3c2.org ]

Esteemed reader,

I have attended the World Wide Web conference (WWW 2016) in Montréal, Canada, and wanted to share my learnings in the form of a trip report. You can read it online on my blog at http://blog.tomayac.com/2016/04/22/world-wide-web-conference-www2016-trip-report-004735 with fancy embedded tweets and slides and stuff, or simply pasted inline below as plain HTML.

Thanks,
Tom

Last week, I attended the 25th International World Wide Web Conference (WWW2016 <http://www2016.ca/>) that took place from April 11 to 15, 2016 in Montréal, Canada. The main proceedings <http://www2016.net/proceedings/forms/proceedings.htm> and the companion proceedings <http://www2016.net/proceedings/forms/companion.htm> are both available online. Google was one of the gold sponsors and Google Director of Research Peter Norvig <http://norvig.com/> delivered one of the main keynotes <http://www2016.ca/keynote-speakers.html>. This is my trip report with personal highlights and observations.

*Workshops, Day 1*

I started the conference with the Making Sense of Microposts <http://microposts2016.seas.upenn.edu/> workshop that began with an invited talk by Yahoo Research Scientist Mihajlo Grbovic <http://astro.temple.edu/~tua95067/> on Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes <https://www.dropbox.com/s/sz7zidj7ow1b3v6/paper_15.pdf?dl=1>. As ground truth for their gender prediction, they used US Census data on popular baby names <https://www.ssa.gov/oact/babynames/limits.html>; for *female* they reached a precision of 0.806 (recall 0.838) and for *male* a precision of 0.794 (recall 0.689). I spent the rest of the day session-hopping between the Microposts workshop and the Computational Social Science for the Web <http://www2016.net/proceedings/companion/p1037.pdf> tutorial.
@miha_jlo <https://twitter.com/miha_jlo> on predicting gender of Tumblr users #microposts2016 <https://twitter.com/hashtag/microposts2016?src=hash> pic.twitter.com/D6ClmCo4uJ <https://t.co/D6ClmCo4uJ> - Katrin Weller (@kwelle) April 11, 2016 <https://twitter.com/kwelle/status/719523317330354176>

*Workshops, Day 2*

My second day was fully dedicated to the Wiki Workshop <http://snap.stanford.edu/wikiworkshop2016/> that started with surprise guest and Wikipedia co-founder Jimmy Wales <https://en.wikipedia.org/wiki/Jimmy_Wales>, whose appearance led to a short discussion of, among other topics, payment and reward models for authors on Wikipedia <https://www.wikipedia.org/> and Wikia <http://www.wikia.com/>.

Surprise guest @jimmy_wales <https://twitter.com/jimmy_wales> at #WikiWorkshop2016 <https://twitter.com/hashtag/WikiWorkshop2016?src=hash>. Looking forward to a great workshop. #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash> pic.twitter.com/w6Qlc21lon <https://t.co/w6Qlc21lon> - Thomas Steiner (@tomayac) April 12, 2016 <https://twitter.com/tomayac/status/719887250822197249>

The workshop had an interesting concept: invited talks <http://snap.stanford.edu/wikiworkshop2016/#speakers-www> filled the day, and the actual papers were presented at a poster session during lunch. I want to highlight the paper With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network <http://www2016.net/proceedings/companion/p985.pdf> by J. Geiß and M. Gertz on linking and disambiguating named entities based on their co-occurrence in Wikipedia pages, and the paper Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach <http://www2016.net/proceedings/companion/p1007.pdf> by R. Tinati *et al*.
My own paper Wikipedia Tools for Google Spreadsheets <http://www2016.net/proceedings/companion/p997.pdf> introduces a Google Spreadsheets add-on <https://chrome.google.com/webstore/detail/wikipedia-tools/aiilcelhmpllcgkhhpifagfehbddkdfp?utm_source=permalink> that facilitates working with data from Wikipedia and Wikidata from within a spreadsheet context.

The Wikipedia Tools for Google Spreadsheets are now available as an official Sheets Add-On: https://t.co/r6VGiSfT30. pic.twitter.com/MKAEozdE4T <https://t.co/MKAEozdE4T> - Thomas Steiner (@tomayac) February 15, 2016 <https://twitter.com/tomayac/status/699317264684867586>

My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams Firehose <http://bit.ly/wiki-firehose>, which you can see visualized and sonified in my Wikipedia Screensaver <http://tomayac.github.io/wikipedia-screensaver/> that I developed for the talk and released as open source <https://github.com/tomayac/wikipedia-screensaver>.

*Main Conference, Day 1*

The main conference started with a keynote by Sir Tim Berners-Lee <https://www.w3.org/People/Berners-Lee/>, whose talk touched on mobile Web apps. He prefers them over native apps, because "when [one goes] native, [one] become[s] part of a value chain", and he argued that Web apps need to get closer to the capabilities of native apps (he did not mention Service Worker <https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API> specifically, but it was *somewhat* clear from the context that he was aiming at this API).

After the keynote, I saw the presentation of a paper titled Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces <http://www2016.net/proceedings/proceedings/p51.pdf> on an approach to leverage cross-platform user profiles for news and event recommendations.
The authors' demo <http://newsfie.org/> worked very well when I tested it with my YouTube <https://www.youtube.com/user/tomayac> and Twitter <https://twitter.com/tomayac> accounts.

Next, from the presentation of the paper In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale <http://www2016.net/proceedings/proceedings/p111.pdf>, I learned how the team at YouTube deals with spammy comments by analyzing a temporal graph of the engagement behavior patterns between users and videos.

An interesting idea to prevent online trackers from tracking personally identifiable information was shown in the paper Tracking the Trackers <http://www2016.net/proceedings/proceedings/p121.pdf> by the makers of the Web browser CLIQZ <https://cliqz.com/en>. Their approach leverages concepts from k-anonymity <http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.pdf>: rather than working with fixed block lists, users collectively identify, in the background, unsafe tracking elements that have the potential to uniquely identify individual users, and such information is then removed from tracking requests.

The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? <http://www2016.net/proceedings/proceedings/p133.pdf> tackles the issue of lengthy and hard-to-read privacy policies and asks whether crowdsourcing their annotation can help. The authors come to the conclusion that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. A demo <https://explore.usableprivacy.org/> with annotated privacy policies shows some examples.
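
The k-anonymity idea behind the Tracking the Trackers approach mentioned above can be illustrated with a small Python sketch. This is not the CLIQZ implementation: the function names, the data layout (pairs of user ID and observed value), and the threshold k are assumptions chosen only for illustration. The sketch treats a value as unsafe when fewer than k distinct users have been seen sending it, and strips such values from a request.

```python
from collections import defaultdict


def unsafe_values(observations, k):
    """Return values seen from fewer than k distinct users.

    observations: iterable of (user_id, value) pairs (hypothetical layout).
    """
    users_per_value = defaultdict(set)
    for user_id, value in observations:
        users_per_value[value].add(user_id)
    return {v for v, users in users_per_value.items() if len(users) < k}


def strip_unsafe(params, unsafe):
    """Drop request parameters whose value was flagged as unsafe."""
    return {key: val for key, val in params.items() if val not in unsafe}


# Made-up observations: a language tag shared by many users is harmless,
# while an identifier seen from a single user could single that user out.
observations = [
    ("u1", "en-US"), ("u2", "en-US"), ("u3", "en-US"),
    ("u1", "id-8f3a"),
]
flagged = unsafe_values(observations, k=2)
request = {"lang": "en-US", "uid": "id-8f3a"}
print(strip_unsafe(request, flagged))
```

A real system would have to collect these observations without itself learning who sent what. The sketch ignores that part entirely.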
From the poster session, I especially liked Visual Positions of Links and Clicks on Wikipedia <http://www2016.net/proceedings/companion/p27.pdf> that looked at the visual positions of clicked links on Wikipedia based on the Wikipedia clickstream dataset <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream> and Travel the World: Analyzing and Predicting Booking Behavior using E-Mail Travel Receipts <http://www2016.net/proceedings/companion/p31.pdf> that examined more than 25 million travel receipts from Yahoo Mail users to predict their booking behavior.

Two of the many #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash> posters that I found interesting: https://t.co/NrAJ7YNP7o, https://t.co/yaScB1ar8B [PDFs]. pic.twitter.com/vzmmJFZsZZ <https://t.co/vzmmJFZsZZ> - Thomas Steiner (@tomayac) April 21, 2016 <https://twitter.com/tomayac/status/723080565495201796>

*Main Conference, Day 2*

Day 2 started with a keynote by Mary Ellen Zurko <https://twitter.com/mzurko>, Principal Engineer at Cisco Systems, in which she provided a trip down memory lane through security from S-HTTP <https://de.wikipedia.org/wiki/S-HTTP> to Experimenting At Scale With Google Chrome's SSL Warning <http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41927.pdf>.

From the research track, I first want to highlight a Yahoo Labs Research paper on Predicting Pre-click Quality for Native Advertisements <http://www2016.net/proceedings/proceedings/p299.pdf>. Native ads are defined as a specific form of online advertising, where ads replicate the look-and-feel of their serving platform. The authors introduce the notion of *bad ads* that have a high Offensive Feedback Rate (OFR), *i.e.*, the ratio of the number of times an ad was rated *offensive* to the number of impressions. According to the paper, the OFR metrics are more reliable than the commonly used click-through rate (CTR) metrics.
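
To make the two metrics concrete, here is a minimal Python sketch. It assumes OFR is computed as offensive ratings divided by impressions, as described above, and uses the standard definition of CTR as clicks divided by impressions. The function names and the counts for the two ads are made up for illustration and do not come from the paper.

```python
def offensive_feedback_rate(offensive_ratings: int, impressions: int) -> float:
    """Share of impressions on which the ad was rated offensive."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return offensive_ratings / impressions


def click_through_rate(clicks: int, impressions: int) -> float:
    """Share of impressions that led to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical counts: both ads attract similar clicks,
# but the second one is rated offensive far more often.
ads = {
    "ad_a": {"impressions": 20_000, "clicks": 300, "offensive": 4},
    "ad_b": {"impressions": 20_000, "clicks": 320, "offensive": 90},
}

for name, counts in ads.items():
    ctr = click_through_rate(counts["clicks"], counts["impressions"])
    ofr = offensive_feedback_rate(counts["offensive"], counts["impressions"])
    print(f"{name}: CTR={ctr:.4f} OFR={ofr:.4f}")
```

In this made-up case the two ads are nearly indistinguishable by CTR, while OFR separates them clearly.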
One of my favorite papers of the conference was Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes <http://www2016.net/proceedings/proceedings/p591.pdf> (lighter reading: slides <https://commons.wikimedia.org/wiki/File:Wikipedia_Hoax_Detection_-_WMF_Nov_18_-_slides.pdf>, project homepage <https://cs.umd.edu/~srijan/hoax/>) that aims at identifying hoaxes on Wikipedia, *i.e.*, deliberately fabricated falsehood made to masquerade as truth. Some famous hoaxes <https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia> survived for more than nine years and were widely cited in the media.

I continued with the presentation of our Industry Track paper From Freebase to Wikidata: The Great Migration <http://www2016.net/proceedings/proceedings/p1419.pdf>, in which we describe our ongoing data transfer project for migrating the (now shut-down <https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc>) structured knowledge base Freebase <http://www.freebase.com/> to Wikidata <https://www.wikidata.org/wiki/Wikidata:Main_Page>. We further report on the data mapping challenges, provide an analysis of the progress so far, and also describe the Primary Sources Tool <https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool> that aims to facilitate this and future data migrations. The tool has been released as open source <https://github.com/google/primarysources>.

For me, the day ended with an interesting paper on The QWERTY Effect on the Web: How Typing Shapes the Meaning of Words in Online Human-Computer Interaction <http://www2016.net/proceedings/proceedings/p661.pdf>. I had never heard of the QWERTY effect <http://link.springer.com/article/10.3758%2Fs13423-012-0229-7> before, but it is based on the hypothesis that, on average, words typed with more letters from the right side of the keyboard are more positive in meaning than words typed with more letters from the left.
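
The quantity behind this hypothesis can be sketched in a few lines of Python. The sketch assumes the conventional touch-typing split of the QWERTY letter keys into a left-hand and a right-hand set and scores a word as right-side letters minus left-side letters. The function name and the exact scoring are assumptions for illustration and may differ from what the paper uses.

```python
# Conventional touch-typing split of the QWERTY letter keys (assumption).
LEFT_LETTERS = set("qwertasdfgzxcvb")
RIGHT_LETTERS = set("yuiophjklnm")


def right_side_advantage(word: str) -> int:
    """Count of right-side letters minus count of left-side letters.

    Characters that are not plain a-z letters are ignored.
    """
    right = sum(1 for ch in word.lower() if ch in RIGHT_LETTERS)
    left = sum(1 for ch in word.lower() if ch in LEFT_LETTERS)
    return right - left


for word in ["happy", "sad", "web"]:
    print(word, right_side_advantage(word))
```

The score says nothing about meaning on its own. Testing the hypothesis requires relating such scores to separate positivity ratings for many words.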
According to the paper, there is some evidence that this hypothesis also holds true for the Web.

*Main Conference, Day 3*

In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack <http://www2016.net/proceedings/proceedings/p999.pdf>, the authors show how fake CAPTCHAs <https://en.wikipedia.org/wiki/CAPTCHA> (*C*ompletely *A*utomated *P*ublic *T*uring tests to tell *C*omputers and *H*umans *A*part) can be used to trick users into unwittingly disclosing private information, such as their Facebook name. The attack pages embed (social widget) iframes that display this private data but cannot access it themselves due to the Same Origin Policy <https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy>. By having users solve fake CAPTCHAs consisting of many CSS-disguised iframes, the attackers get the users to disclose it.

Google runs a service called Safe Browsing <https://www.google.com/transparencyreport/safebrowsing/> that alerts users when websites get compromised. In the paper Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension <http://www2016.net/proceedings/proceedings/p1009.pdf>, the authors provide a study that captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. They observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%.

Another paper on Wikipedia looked at Growing Wikipedia Across Languages via Recommendation <http://www2016.net/proceedings/proceedings/p975.pdf> by detecting missing articles, ranking them by local importance, and finally contacting potential Wikipedia editors via email and suggesting that they write the article in question. The authors have deployed the Wikipedia GapFinder <http://recommend.wmflabs.org/#Recommend> that shows the approach in practice.
*Other Observations*

The Social Media Research Foundation <http://www.smrfoundation.org/> provides a NodeXL <http://nodexl.codeplex.com/>-based visualization of the network of tweets <https://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=66550> that used the #WWW2016 <https://twitter.com/hashtag/www2016> hashtag, including all my #WWW2016 tweets <https://twitter.com/search?f=tweets&vertical=default&q=%23www2016%20from%3Atomayac&src=typd>.

The tweet network around the #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash> hashtag visualized via NodeXL: https://t.co/cRJgj5zBsz. pic.twitter.com/q5XxLHIH74 <https://t.co/q5XxLHIH74> - Thomas Steiner (@tomayac) April 21, 2016 <https://twitter.com/tomayac/status/723267576881668096>

One thing I noticed at the conference is that we (and I fully include myself here) from time to time still tend to unconsciously use stereotyped, gendered language where it is inappropriate in the general case ("so easy my mom or grandma could use it", "to pass the 'mom test'", *etc.*). I called this out in a tweet <https://twitter.com/tomayac/status/720613067822903296>. You may want to follow the interesting conversation it has started on Twitter <https://twitter.com/tomayac/status/720613067822903296> or Facebook <https://www.facebook.com/Tomayac/posts/10153530286807286> (if you are friends with me). This tweet led Christopher Gutteridge to create the imaginary naive Web user Rube <http://blog.soton.ac.uk/webteam/2016/04/15/introducing-rube/>.

Can we stop calling the cliché inexperienced Web user “our mom” or “our grandma”? #WWW2016 <https://twitter.com/hashtag/WWW2016?src=hash> - Thomas Steiner (@tomayac) April 14, 2016 <https://twitter.com/tomayac/status/720613067822903296>

Oh, and in the old days, there used to be more bananas <http://tomayac.com/tweets/search?q=bananas+banana+bananamoment>… Next conference!

--
Dr. Thomas Steiner, Employee (http://blog.tomayac.com, https://twitter.com/tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.29 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/ <https://xkcd.com/1181/>
-----END PGP SIGNATURE-----
Received on Friday, 22 April 2016 06:59:48 UTC