
Re: Index of Types -> Domains

From: Aaron Bradley <aaranged@gmail.com>
Date: Fri, 3 Nov 2017 12:13:35 -0700
Message-ID: <CAMbipBtz3B8bFJA7RFZYXyZd81S9uwtugSD7knScfi9p7zVp7g@mail.gmail.com>
To: David Pierce <david.dean.pierce@gmail.com>
Cc: "schema.org Mailing List" <public-schemaorg@w3.org>
I can't speak to where the domain usage counts on schema.org come from (Dan
Brickley might be able to address that), but on this point specifically:

> "... examples of domains implementing a particular Type or format in
> their pages"

While the last structured data extraction was generated from a crawl now
more than a year old, the Web Data Commons (http://www.webdatacommons.org/)
does make statistics available on "RDFa, Microdata, Embedded JSON-LD, and
Microformats" found in each crawl, as well as the full corpus.  You can
find the most recent data here:

Web Data Commons Extraction Report - October 2016 Corpus
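
As a rough illustration of how such extraction data could be turned into a
Type -> Domains index, here is a minimal Python sketch. It assumes the data
is in N-Quads form with the page URL as the fourth element of each quad, and
that each input file holds a single syntax (Microdata, JSON-LD, and so on),
so the syntax label is passed in by the caller. The function name and the
sample lines are hypothetical, and "domain" here means the plain hostname,
not the registered domain.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SCHEMA_PREFIXES = ("http://schema.org/", "https://schema.org/")

# A quad whose object and graph label are both IRIs:
#   subject predicate <object> <graph> .
QUAD_RE = re.compile(r"^(\S+)\s+(\S+)\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def index_types_to_domains(lines, syntax, index=None):
    """Map schema.org type name -> hostname -> set of syntax labels.

    `syntax` is a caller-supplied label such as "microdata" or "json-ld".
    Pass an existing `index` to accumulate results across several files.
    """
    if index is None:
        index = defaultdict(lambda: defaultdict(set))
    for line in lines:
        match = QUAD_RE.match(line.strip())
        if not match:
            continue  # literal objects, comments, malformed lines
        _subject, predicate, obj, graph = match.groups()
        if predicate != RDF_TYPE or not obj.startswith(SCHEMA_PREFIXES):
            continue  # only rdf:type statements naming a schema.org type
        type_name = obj.split("schema.org/", 1)[1]
        host = urlsplit(graph).hostname
        if host:
            index[type_name][host].add(syntax)
    return index


# Hypothetical sample quads, for illustration only.
sample = [
    "_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://schema.org/Product> <http://shop.example.com/item/1> .",
    '_:n1 <http://schema.org/name> "Blue widget" '
    "<http://shop.example.com/item/1> .",
    "_:n2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://schema.org/Recipe> <https://food.example.org/pie> .",
]

idx = index_types_to_domains(sample, "microdata")
for type_name, hosts in sorted(idx.items()):
    print(type_name, sorted(hosts))
```

Calling the function once per file, with the matching syntax label and the
same index, would give a per-type view of which hosts use which syntax.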

If you have the desire to do so, you can also access and analyze Common
Crawl data yourself (data sets are generated monthly); learn more here:

On Fri, Nov 3, 2017 at 11:14 AM, David Pierce <david.dean.pierce@gmail.com> wrote:

> I've seen this come up a bit in the SEO community where webmasters and
> researchers are trying to look for examples of domains implementing a
> particular Type or format in their pages.
> Moreover, when I look at the documentation for any given type--Product
> <http://schema.org/Product>, for example--I see a note about its usage: *Usage:
> Over 1,000,000 domains*
> How is this calculated, where is it stored, and how might I make use of it?
> Has an index of Types to Domains already been built?
> If not, has anybody explored building one? At the outset, I imagine
> something complementing the aforementioned Type documentation. For each
> type, I want to see what sites include that in their schema.org markup,
> and what implementation they use (Microdata vs. JSON-LD).
> If this has already been built, I'd love to make use of it for my own
> learning. If it hasn't...well, I suppose I might start working on building
> it.
Received on Friday, 3 November 2017 19:13:59 UTC
