Re: HTTP Archive: Breakdown of font usage by type (WOFF 2.0/1.0, TTF, OTF)

On Tue, May 16, 2017 at 7:31 PM, John Hudson <john@tiro.ca> wrote:

Hello John,

On 11/05/17 10:21, David Kuettel wrote:
>
>> Here is a breakdown of the font usage by type across the HTTP Archive
>> corpus.
>>
>
> This is really interesting, David. Thank you.
>

Thank you John!

>
> I wonder if you are able to tell, for each format, roughly what percentage
> of the fonts being served are coming from a service — Google Fonts,
> TypeKit, etc. — and what percentage are being self-hosted at the sites? I
> think that would be a very interesting insight into the webfont ecosystem,
> especially with regard to the impressive WOFF2 take-up.


While we have not yet done a full breakdown of font formats by service (and
beyond), much of the data is publicly available.

With regard to font services (Typekit, Monotype, Google Fonts, etc.), my
understanding is that all strive to serve fonts in the optimal manner
(smallest file size, lowest latency) for a given browser/platform.

For example, Bram Stein wrote an awesome blog post on Typekit's WOFF 2.0
launch:

https://blog.typekit.com/2015/08/26/woff2-support-added-to-typekit/

Similarly, Google Fonts has fully (100%) enabled WOFF 2.0 serving to
supporting browsers / user-agents (http://caniuse.com/#feat=woff2).
If we missed any, please just let us know.

Rod wrote an excellent blog post on the WOFF 2.0 rollout a while back:

https://developers.googleblog.com/2015/02/smaller-fonts-with-woff-20-and-unicode.html

Where adoption has likely been slower (we have yet to analyze and quantify
this) is with self-hosted integrations.  Intuitively this makes sense: web
font technology has changed rapidly over the past few years (WOFF 2.0,
WOFF 1.0, unicode-range and much more), and keeping up with all of the
latest, ever-changing advancements is quite an undertaking.

One could perform an analysis of the Alexa Top 500K home pages through the
publicly accessible HTTP Archive database, and the results would be
fascinating for all of us.

http://httparchive.org/
https://www.igvita.com/2013/06/20/http-archive-bigquery-web-performance-answers/

FWIW, the "Web Font Media Type (mime type) Analysis" from 2015 was based on
the same data source: http://goo.gl/zbDhUN  Speaking of which, I should do
an update.

To help kick off the analysis, here are a few queries that one could build
on.  For the Alexa Top 500K home pages:

# Number of loaded WOFF 2.0 (.woff2) font files

$ bq query 'select count(*) from [httparchive:runs.latest_requests] where
ext = "woff2"';
+--------+
|  f0_   |
+--------+
| 919165 |
+--------+

# Top hosts from which the WOFF 2.0 (.woff2) font files were served.
# Self-hosted integrations are likely the long tail.

$ bq query 'select count(*), top(regexp_extract(url, "http[s]?://([^/]*)"))
host from [httparchive:runs.latest_requests] where ext = "woff2"';
+--------+------------------------------------+
|  f0_   |                host                |
+--------+------------------------------------+
| 726029 | fonts.gstatic.com                  |
|  21087 | maxcdn.bootstrapcdn.com            |
|   5132 | fast.fonts.net                     |
|   2701 | cdnjs.cloudflare.com               |
|   2190 | use.fontawesome.com                |
|   2038 | netdna.bootstrapcdn.com            |
|    842 | cdn.shopify.com                    |
|    498 | cdn.jsdelivr.net                   |
|    474 | themes.googleusercontent.com       |
|    459 | c.disquscdn.com                    |
|    343 | assets.tumblr.com                  |
|    294 | fast.fonts.com                     |
|    288 | sdk.azureedge.net                  |
|    287 | s3.amazonaws.com                   |
|    246 | cdn.revjet.com                     |
|    229 | snapwidget.com                     |
|    210 | static.parastorage.com             |
|    204 | www.crsc.philips.com               |
|    190 | a0.muscache.com                    |
|    189 | sp-bootstrap.global.ssl.fastly.net |
+--------+------------------------------------+
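
For anyone who would rather post-process an exported list of request URLs
locally, here is a minimal Python sketch of the same host extraction and
tally. It reuses the regular expression from the query above; the function
names (extract_host, top_hosts) and the sample URLs are hypothetical and
only there for illustration, they are not taken from the HTTP Archive.

```python
import re
from collections import Counter

# Same pattern as the regexp_extract() call in the bq query above.
HOST_RE = re.compile(r"http[s]?://([^/]*)")


def extract_host(url):
    """Return the host part of an http(s) URL, or None if nothing matches."""
    match = HOST_RE.search(url)
    return match.group(1) if match else None


def top_hosts(urls, limit=20):
    """Count requests per host and return the most common hosts first."""
    counts = Counter(extract_host(u) for u in urls)
    counts.pop(None, None)  # drop URLs the pattern could not parse
    return counts.most_common(limit)


# Hypothetical sample URLs, for illustration only.
sample_urls = [
    "https://fonts.gstatic.com/s/example/v1/a.woff2",
    "https://fonts.gstatic.com/s/example/v1/b.woff2",
    "http://www.example.com/fonts/c.woff2",
]

for host, count in top_hosts(sample_urls):
    print(count, host)
```

Unlike the query, this counts exactly and keeps the whole long tail if you
pass a large enough limit, which is the part that matters for spotting
self-hosted fonts.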

# Number of loaded WOFF 1.0 (.woff) font files

$ bq query 'select count(*) from [httparchive:runs.latest_requests] where
ext = "woff"';
+--------+
|  f0_   |
+--------+
| 304407 |
+--------+

# Top hosts from which the WOFF 1.0 (.woff) font files were served.
# Self-hosted integrations are likely the long tail.

$ bq query 'select count(*), top(regexp_extract(url, "http[s]?://([^/]*)"))
host from [httparchive:runs.latest_requests] where ext = "woff"';
+------+-------------------------------+
| f0_  |             host              |
+------+-------------------------------+
| 7756 | themes.googleusercontent.com  |
| 4596 | netdna.bootstrapcdn.com       |
| 3315 | maxcdn.bootstrapcdn.com       |
| 2910 | js.intercomcdn.com            |
| 2456 | cdn.shopify.com               |
| 2360 | fonts.gstatic.com             |
| 1358 | widgets.livetex.ru            |
|  851 | widgets.trustedshops.com      |
|  848 | ssl.p.jwpcdn.com              |
|  765 | static.parastorage.com        |
|  700 | fast.fonts.net                |
|  672 | cdnjs.cloudflare.com          |
|  655 | dsms0mj1bbhn4.cloudfront.net  |
|  626 | fast.wistia.com               |
|  611 | ssl-ccstatic.highwebmedia.com |
|  593 | assets.tumblr.com             |
|  588 | img1.wsimg.com                |
|  546 | fast.fonts.com                |
|  529 | w.uptolike.com                |
|  507 | www.gannett-cdn.com           |
+------+-------------------------------+
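
As a rough way to read the two host tables together, the following Python
sketch computes how much of each format's traffic the listed top 20 hosts
account for, and how much is left in the long tail. The counts are copied
from the bq output above; the summarize helper is a hypothetical name. One
assumption to keep in mind: the per-host counts come from top(), which may
return approximate values, so these shares are estimates.

```python
# Totals and top-20 per-host counts copied from the bq output above.
WOFF2_TOTAL = 919165
WOFF2_TOP20 = [
    726029, 21087, 5132, 2701, 2190, 2038, 842, 498, 474, 459,
    343, 294, 288, 287, 246, 229, 210, 204, 190, 189,
]

WOFF_TOTAL = 304407
WOFF_TOP20 = [
    7756, 4596, 3315, 2910, 2456, 2360, 1358, 851, 848, 765,
    700, 672, 655, 626, 611, 593, 588, 546, 529, 507,
]


def summarize(total, top_counts):
    """Return the share of requests for the largest host, the listed hosts, and the rest."""
    top_sum = sum(top_counts)
    return {
        "largest_host_share": top_counts[0] / total,
        "top20_share": top_sum / total,
        "long_tail_share": (total - top_sum) / total,
    }


for label, total, counts in [
    ("woff2", WOFF2_TOTAL, WOFF2_TOP20),
    ("woff", WOFF_TOTAL, WOFF_TOP20),
]:
    stats = summarize(total, counts)
    print(label, {name: f"{share:.1%}" for name, share in stats.items()})
```

With these figures, fonts.gstatic.com alone is roughly 79% of the .woff2
requests, while the top 20 hosts together cover only about 11% of the .woff
requests. The remainder in each case is the long tail, where the
self-hosted integrations most likely sit.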


With regard to insights into the overall web font ecosystem, there are
additional data sources that are worth calling out.

ATypI 2016: Industry Update - Adoption, Opportunities and Adventures Ahead
http://goo.gl/Eda0Mw (esp. slides 3-12)

And the backing web font analysis data which is publicly available:

httparchive.org (Alexa Top 100, 1K)
http://httparchive.org/trends.php?s=Top1000

Large-scale web font usage analysis
Getting started guide: goo.gl/5HeqYf

I hope this is all helpful and of interest.

Thank you,
David

>
>
> J.
>
>
> --
>
> John Hudson
> Tiro Typeworks Ltd    www.tiro.com
> Salish Sea, BC        tiro@tiro.com
>
> NOTE: In the interests of productivity, I am currently
> dealing with email on only two days per week, usually
> Monday and Thursday unless this schedule is disrupted
> by travel. If you need to contact me urgently, please
> use some other method of communication. Thank you.
>
>
>

Received on Thursday, 18 May 2017 18:41:01 UTC