
Re: Schema.org sameAs/Article Test on Wikipedia

From: Adam Baso <abaso@wikimedia.org>
Date: Tue, 27 Nov 2018 12:19:51 -0600
Message-ID: <CAB74=NpsvPKte_qB4JwE4OfVi+agE2F-+Fgn5rOS4BQnafDqFQ@mail.gmail.com>
To: Aaron Bradley <aaranged@gmail.com>
Cc: Olga Vasileva <ovasileva@wikimedia.org>, public-schemaorg@w3.org
Aaron, thanks again for the messages. Responses inline.

On Wed, Nov 21, 2018 at 3:01 PM Adam Baso <abaso@wikimedia.org> wrote:

> Hi Aaron - thanks for the messages. Just wanted to note receipt. We'll
> reply probably sometime next week, post the US Thanksgiving holiday weekend.
> Thanks again!
> -Adam
> On Tue, Nov 20, 2018 at 1:50 PM Aaron Bradley <aaranged@gmail.com> wrote:
>> And sorry for the serial thread, but a couple of follow-up questions that
>> arose after I poked around a bit more:
>>    - You say "we're A/B testing the sameAs property on Wikipedia
>>    articles having a corresponding Wikidata entry".  When there is no Wikidata
>>    entry are you omitting schema.org/Article markup altogether, or are
>>    you adding Article markup but omitting only the sameAs declaration?  If the
>>    former, this is a pretty critical difference.  That is, comparing the
>>    search relevance of an article marked up with Article (e.g.
>>    https://en.wikipedia.org/wiki/World_War_II) with one without Article
>>    markup (e.g. https://en.wikipedia.org/wiki/Violet_Friend) is a lot
>>    different than a narrow test of the use of sameAs.  If this is the case
>>    (comparison of an article with schema.org/Article, including sameAs,
>>    with an article without any Article markup at all) then I'm sure this would
>>    generate some interesting data to look at, but I'd be at a loss to see how
>>    you could assess the impact of sameAs in this case - a viable A/B here
>>    would of course require comparing articles with Article including the
>>    sameAs property and articles with Article but excluding the sameAs
>>    property.  But maybe I've just failed to find any instances of the latter
>>    (Article markup without sameAs).
You have it correct - when the relation with a Wikidata entity isn’t
available, then the sameAs declaration and Article type aren’t present, but
when the relation is available, then the sameAs declaration is nested in
the Article type. In earlier prototyping, there was a thought for doing a
more barebones sameAs relation, but Article as the type was arrived at for
a more representative modeling of the page.

It is worth noting that articles without associated Wikidata items are
expected to form a small minority on most of our projects (domains), and
will likely have much lower average traffic to begin with.

Point noted - it may be worth further testing the difference between
Article types with sameAs versus Article types without, although now that
the test is in force, we’ll be awaiting results from this first test.

>>    - Where an article is encoded with Article and the sameAs property,
>>    are you in all cases (as per the examples I've been able to uncover) also
>>    providing this Wikidata URI as the value for the mainEntity property?
Yes. As you might have guessed, this URI is provided for semantic
processors for eventual content negotiation.
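
To make the two answers above concrete, here is a minimal Python sketch of the markup shape as described in this thread: an Article type carrying the Wikidata URI as both sameAs and mainEntity, and no markup at all when no Wikidata relation exists. The function name, the "name" and "url" fields, and the Q0 entity URI are assumptions for illustration only; they are not taken from the deployed implementation.

```python
import json
from typing import Optional


def build_article_jsonld(page_url: str, title: str,
                         wikidata_uri: Optional[str]) -> Optional[dict]:
    """Return schema.org Article markup, or None when no Wikidata entity exists."""
    if wikidata_uri is None:
        # As described above: without a Wikidata relation, neither the
        # Article type nor the sameAs declaration is emitted.
        return None
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": title,      # assumed field, not confirmed in the thread
        "url": page_url,    # assumed field, not confirmed in the thread
        # sameAs is nested in the Article type; the same URI is also
        # supplied as mainEntity.
        "sameAs": wikidata_uri,
        "mainEntity": wikidata_uri,
    }


# Placeholder entity URI; Q0 is a stand-in, not the article's real item.
example = build_article_jsonld(
    "https://en.wikipedia.org/wiki/War_of_the_Polish_Succession",
    "War of the Polish Succession",
    "http://www.wikidata.org/entity/Q0",
)
print(json.dumps(example, indent=2))
```

The printed object is what would sit inside a JSON-LD script element on a test-group page; a control-group page, or a page with no Wikidata item, would carry nothing.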

> Thanks again!
>> On Tue, Nov 20, 2018 at 11:24 AM Aaron Bradley <aaranged@gmail.com>
>> wrote:
>>> Very interesting Adam.  Curious if you can share with us any further
>>> detail regarding *specifically* what's being tested, and how results
>>> are being judged.
>>> That is, you say:
>>> "One motivation for this work is to explore the effect of the changes on
>>> search relevance. More generally, we're also interested in how structured
>>> markup like this on Wikimedia content projects might be beneficial to the
>>> semantic web and machine readability..."
>>> With regard to "the effect of the changes on search relevance", do you
>>> mean how the additional markup impacts the visibility of Wikipedia pages in
>>> enterprise search engines such as Google, Bing and Yandex, or are there
>>> other search environments implicated here?  And what is/are the measurement
>>> protocol/protocols you're using to assess "relevance" (e.g. changes in
>>> search engine ranking, changes in the way a Wikipedia page is presented in
>>> the search results, changes in what information about a
>>> Wikipedia-referenced entity is represented in a Google Knowledge Panel or
>>> Bing Snapshot)?
At a pretty high level, here are two questions the data analyst team will
try to answer:

   - Was there a difference in total pageviews between the control and test
   groups?
   - Was there a difference between the control and test groups in traffic
   from search engines?
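
As a rough illustration of how either question could be checked, the sketch below runs a two-sided permutation test on per-page view counts. This is only one possible approach and is not the analysts' actual method; the function name and the sample numbers are made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(control, test, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in mean views per page."""
    rng = random.Random(seed)
    observed = abs(mean(test) - mean(control))
    pooled = list(control) + list(test)
    n_control = len(control)
    hits = 0
    for _ in range(n_iter):
        # Reassign pages to groups at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    return hits / n_iter


# Made-up per-page search-referred pageviews, for illustration only.
control_views = [120, 95, 130, 101, 88, 115]
test_views = [125, 99, 128, 110, 92, 119]
p = permutation_p_value(control_views, test_views, n_iter=2000)
print(f"difference in means: {mean(test_views) - mean(control_views):.2f}, p = {p:.3f}")
```

A permutation test is used here because pageview counts tend to be heavily skewed, so it avoids assuming a normal distribution; a real analysis would also need to account for traffic differences between pages and over time.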

It’s conceivable that ranking/placement data that’s available may be worth
exploring, as it could be a signal with predictive power for any traffic
fluctuation. Alignment of the data with signals in the discovery platform
metrics interfaces about structured markup might also be helpful.

Observation of the specific treatment of Wikipedia articles in different
discovery platforms isn’t something that’s been instrumented, to my
knowledge (however, there is some research on user engagement with certain
treatments, with some matching anecdotal evidence). The discovery platforms
are of course themselves rather dynamic, so I think the nature of analyzing
specific treatments is more likely to be ad hoc.

>>> And a similar question in regard to your interest in "how markup like
>>> this ... might be beneficial to the semantic web and machine
>>> readability...."  How are you assessing this impact (asking because, as
>>> with "search relevance", any A/B test is predicated on being able to
>>> measure the performance of A compared to B, which in turn requires that a
>>> measurement protocol be in place).

This is likely to be more exploratory in nature.

We are definitely interested to hear of ways that this readier mapping of
articles to Wikidata entities in Wikipedia HTML might help in different
applications - we welcome feedback from the list members!

Of course, different discovery platforms provide insights about how they
treat different types of structured markup, but we’ll be interested to see
to what degree the well known treatments materialize during the test while
entity relationships are made more concrete in the returned markup for the
test group.

>>> Thanks in advance for any further detail you can provide.
Sure thing! Thanks again for asking!

>>> Aaron Bradley
>>> Knowledge Graph Strategist
>>> Electronic Arts
>>> On Tue, Nov 20, 2018 at 5:30 AM Adam Baso <abaso@wikimedia.org> wrote:
>>>> Hello -
>>>> I'm Adam Baso, an engineering director at the Wikimedia Foundation. We
>>>> wanted to let the Schema.org community know that we're A/B testing the
>>>> sameAs property on Wikipedia articles having a corresponding Wikidata
>>>> entry. We're running the test on most [1] Wikipedias, including English
>>>> Wikipedia. You can see this in effect for the markup on the English
>>>> article for War of the Polish Succession
>>>> <https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FWar_of_the_Polish_Succession>
>>>> .
>>>> We have applied the new property to 50% of Wikipedia pages with
>>>> corresponding Wikidata entries by way of the Article type. Our CDN is being
>>>> populated with this change so you should see more of this markup. We'll be
>>>> running this A/B test for about two months and will be evaluating results
>>>> along the way.
>>>> One motivation for this work is to explore the effect of the changes on
>>>> search relevance. More generally, we're also interested in how structured
>>>> markup like this on Wikimedia content projects might be beneficial to the
>>>> semantic web and machine readability, much along the lines of other
>>>> Wikimedia movement initiatives such as Structured Data on Commons
>>>> <https://commons.wikimedia.org/wiki/Commons:Structured_data>.
>>>> Any feedback you have on our implementation in this A/B test would be
>>>> most welcome. Please share your feedback with Olga Vasileva (
>>>> ovasileva@wikimedia.org). Thanks!
>>>> Regards,
>>>> Adam Baso
>>>> Engineering Director
>>>> Wikimedia Foundation
>>>> [1]
>>>> The following wikis are excluded from A/B testing due to overlap with
>>>> other A/B tests:
>>>> Indonesian: idwiki <https://id.wikipedia.org/wiki/Halaman_Utama>
>>>> Portuguese: ptwiki
>>>> <https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal>
>>>> Punjabi: pawiki
>>>> <https://pa.wikipedia.org/wiki/%E0%A8%AE%E0%A9%81%E0%A9%B1%E0%A8%96_%E0%A8%B8%E0%A8%AB%E0%A8%BC%E0%A8%BE>
>>>> Dutch: nlwiki <https://nl.wikipedia.org/wiki/Hoofdpagina>
>>>> Korean: kowiki
>>>> <https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:%EB%8C%80%EB%AC%B8>
>>>> Bhojpuri: bhwiki <https://bh.wikipedia.org>
>>>> Cherokee: chrwiki
>>>> <https://chr.wikipedia.org/wiki/%E1%8E%A4%E1%8E%B5%E1%8E%AE%E1%8E%B5%E1%8F%8D%E1%8F%97>
>>>> Kazakh: kkwiki
>>>> <https://kk.wikipedia.org/wiki/%D0%91%D0%B0%D1%81%D1%82%D1%8B_%D0%B1%D0%B5%D1%82>
>>>> Catalan: cawiki <https://ca.wikipedia.org/wiki/Portada>
>>>> French: frwiki
>>>> <https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal>
>>>> Yoruba: yowiki
>>>> <https://yo.wikipedia.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB%8D%CC%81>
>>>> Kalmyk: xalwiki
>>>> <https://xal.wikipedia.org/wiki/%D0%9D%D2%AF%D1%80_%D1%85%D0%B0%D0%BB%D1%85>
Received on Tuesday, 27 November 2018 18:21:37 UTC
