- From: Simon Pieters via GitHub <sysbot+gh@w3.org>
- Date: Wed, 25 Oct 2023 23:47:24 +0000
- To: public-css-archive@w3.org
I've now queried httparchive. Getting the actual encoding is quite tricky, and so I gave up on that. I assume a big chunk is utf-8, but certainly not everything.

The dataset is from 2022-07-01 (same as [2022 Web Almanac](https://almanac.httparchive.org/en/2022/)).

Total number of pages is 7,303,959. Number of pages with non-ASCII in the query string in `url()` in CSS: 2231. So at most **0.03% of pages** in the dataset (likely less since this includes utf-8 pages).

An example match is `url('https://fonts.googleapis.com/css?family=Noto+Sans+JP:900&text=西部警察2020年12月29日 発売決定!「大都会」シリーズ')` (but the page for this uses utf-8).

(I excluded the cases where the first non-ASCII character in the query string is "â" because there were some that used non-ASCII quote marks and with encoding mismatch it became e.g. `url(‘./fonts/Avenir-Next.eot?#iefix’)` – which is unlikely to work to begin with, and therefore not interesting to include.)

Full results: https://docs.google.com/spreadsheets/d/1i9Gvs1JIDo5mOw-rwPc5ppI6KrbXC8ol2XJ7h9PGAVs/edit?usp=sharing

--
GitHub Notification of comment by zcorpan

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/9301#issuecomment-1780207124 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
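
For readers who want to reproduce the kind of filter described in the comment above, here is a minimal Python sketch. It is not the query that was actually run against httparchive (that query is not shown in the message). The regular expression and the helper names `first_non_ascii_after_question_mark` and `has_interesting_non_ascii_query` are hypothetical, and the sketch assumes that "query string" means everything after the first `?` in the URL, fragment included, since the excluded `#iefix` example has its non-ASCII character after the `#`.

```python
import re

# Hypothetical pattern: url( ... ) with optional matching quotes around the URL.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE | re.DOTALL)


def first_non_ascii_after_question_mark(url):
    """Return the first non-ASCII character after the first '?', or None."""
    _, sep, rest = url.partition("?")
    if not sep:
        return None  # no '?' in the URL at all
    for ch in rest:
        if ord(ch) > 0x7F:
            return ch
    return None


def has_interesting_non_ascii_query(css):
    """True if any url() in the CSS text has non-ASCII after '?',
    skipping matches whose first non-ASCII character is 'â'
    (assumed to be mis-decoded curly quotes, as in the comment)."""
    for match in URL_RE.finditer(css):
        ch = first_non_ascii_after_question_mark(match.group(2))
        if ch is not None and ch != "â":
            return True
    return False


# Share of pages, using the two counts quoted in the comment.
matching_pages = 2231
total_pages = 7_303_959
share_percent = matching_pages / total_pages * 100
print(f"{share_percent:.2f}% of pages")
```

Applied per page, `has_interesting_non_ascii_query` flags a stylesheet like the Google Fonts example and ignores URLs whose only non-ASCII characters sit in the path. The final lines redo the percentage arithmetic from the two page counts given in the comment.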
Received on Wednesday, 25 October 2023 23:47:26 UTC