Re: Common Crawl for testing effect of COEP/etc. rollouts?

Hi Sam,

Thanks for sharing the link to the slides and the dataset.

For similar web-scale analyses, several folks have used HTTP Archive
(https://httparchive.org/; see also https://httparchive.org/reports) in
combination with browser telemetry (e.g. Chrome's UseCounters:
https://chromium.googlesource.com/chromium/src/+/HEAD/docs/use_counter_wiki.md).
Having another data source can definitely be useful, though, and I for one
wasn't aware of Common Crawl before.
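
To make the idea in the quoted message a bit more concrete, here is a
rough Python sketch of pulling the relevant bits out of a single WAT
record: the cross-origin policy response headers, plus the cross-origin
URLs the page embeds. It is only an illustration. The JSON key layout
(Envelope / Payload-Metadata / HTTP-Response-Metadata / Headers and
HTML-Metadata / Links), the "@/src" suffix used to pick out embedded
subresources, and the function name are all assumptions on my part and
should be checked against the Common Crawl format documentation before
relying on them. The origin comparison is also simplified (no default
port normalization).

```python
import json
from urllib.parse import urljoin, urlsplit

# Response headers relevant to cross-origin isolation rollouts.
POLICY_HEADERS = (
    "cross-origin-embedder-policy",
    "cross-origin-opener-policy",
    "cross-origin-resource-policy",
)


def summarize_wat_record(record_json: str) -> dict:
    """Summarize one WAT JSON record (hypothetical helper).

    Assumes the key layout described above; missing keys yield
    empty results rather than errors.
    """
    envelope = json.loads(record_json).get("Envelope", {})
    page_url = envelope.get("WARC-Header-Metadata", {}).get(
        "WARC-Target-URI", "")
    response = envelope.get("Payload-Metadata", {}).get(
        "HTTP-Response-Metadata", {})

    # Header names are case-insensitive, so normalize before lookup.
    headers = {k.lower(): v for k, v in response.get("Headers", {}).items()}
    policies = {h: headers[h] for h in POLICY_HEADERS if h in headers}

    page = urlsplit(page_url)
    page_origin = (page.scheme, page.netloc)

    cross_origin = []
    for link in response.get("HTML-Metadata", {}).get("Links", []):
        # Assumption: embedded subresources have paths like "IMG@/src".
        if not link.get("path", "").endswith("@/src"):
            continue
        target = urlsplit(urljoin(page_url, link.get("url", "")))
        if target.scheme not in ("http", "https"):
            continue
        if (target.scheme, target.netloc) != page_origin:
            cross_origin.append(target.geturl())

    return {
        "url": page_url,
        "policies": policies,
        "cross_origin_subresources": cross_origin,
    }
```

Running this over the records of a WAT file and counting pages that
embed cross-origin subresources without the corresponding headers
would give a first, static estimate. It would miss anything loaded
from script, so it complements rather than replaces telemetry.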

Cheers,
-Artur

On Thu, Dec 2, 2021 at 9:35 PM Samuel Weiler <weiler@w3.org> wrote:

> James Richards at Nominet just did a DNS-related study using computed
> metadata (WAT) from the Common Crawl datasets.
>
> The WAT format contains both HTTP headers and a catalog of all
> links on the page.  I wonder if this dataset might be useful for
> estimating the effects of COEP and similar changes, perhaps in lieu of
> or in advance of origin trials and similar live mechanisms.
>
> Info on the data format:
> https://commoncrawl.org/the-data/get-started/#WAT-Format
>
> The presentation that led me here:
>
> https://indico.dns-oarc.net/event/40/contributions/886/attachments/842/1558/oarc-cc-presentation-james-richards.pdf
>
> -- Sam
>
>

Received on Friday, 3 December 2021 17:03:36 UTC