Re: [csswg-drafts] URL encoding of CSS values (#9301) from Simon Pieters via GitHub on 2023-10-25 (public-css-archive@w3.org from October 2023)

From: Simon Pieters via GitHub <sysbot+gh@w3.org>
Date: Wed, 25 Oct 2023 23:47:24 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-1780207124-1698277642-sysbot+gh@w3.org>

I've now queried HTTP Archive. Determining the actual character encoding of each page is quite tricky, so I gave up on that. I assume a large share of the pages are utf-8, but certainly not all of them.

The dataset is from 2022-07-01 (same as [2022 Web Almanac](https://almanac.httparchive.org/en/2022/)). Total number of pages is 7,303,959.

Number of pages with non-ASCII characters in the query string of a `url()` in CSS: 2,231. So at most **0.03% of pages** in the dataset (likely fewer, since this count also includes utf-8 pages).
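The counting criterion and the percentage above can be sketched roughly as follows. This is not the query that was actually run against HTTP Archive (that query isn't shown here). `query_part` and `has_non_ascii_query` are hypothetical helpers, and treating everything after the first "?" as the query string is an assumption.

```python
def query_part(url: str) -> str:
    # Assumption: everything after the first "?" counts as the query string.
    _, sep, rest = url.partition("?")
    return rest if sep else ""


def has_non_ascii_query(url: str) -> bool:
    # True if any character in the query string is outside ASCII (> U+007F).
    return any(ord(ch) > 0x7F for ch in query_part(url))


TOTAL_PAGES = 7_303_959    # pages in the 2022-07-01 dataset
MATCHING_PAGES = 2_231     # pages with a non-ASCII query string in a CSS url()

share = MATCHING_PAGES / TOTAL_PAGES * 100
print(f"{share:.2f}%")  # 0.03%
```

Because the count includes utf-8 pages, this share is an upper bound on the pages where the choice of encoding could make a difference.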

An example match is `url('https://fonts.googleapis.com/css?family=Noto+Sans+JP:900&text=西部警察2020年12月29日 発売決定！「大都会」シリーズ')` (although the page it comes from uses utf-8).
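To illustrate why the encoding matters for a query string like this, here is a sketch using Python's `urllib.parse.quote`, which converts a string to bytes in the given encoding and then percent-encodes those bytes. Shift_JIS is picked only as an example of a non-utf-8 encoding a Japanese page might use. This shows the byte-level difference, not how any particular browser behaves.

```python
from urllib.parse import quote

text = "西部"  # first two characters of the example's text= parameter

# Same characters, different bytes, so a different URL is requested.
utf8 = quote(text, encoding="utf-8")
sjis = quote(text, encoding="shift_jis")

print(utf8)  # %E8%A5%BF%E9%83%A8
print(sjis)  # two bytes per character here, so four %XX escapes
```

For a utf-8 page, such as the one this example comes from, encoding with the page's encoding and encoding with utf-8 produce the same URL.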

(I excluded cases where the first non-ASCII character in the query string is "â", because some of them used non-ASCII quote marks that, due to an encoding mismatch, ended up as e.g. `url(â€˜./fonts/Avenir-Next.eot?#iefixâ€™)`. Such URLs are unlikely to work in the first place, so they aren't interesting to include.)
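The "â" heuristic works because the UTF-8 bytes of a curly quote, read as windows-1252, start with "â": U+2018 is `E2 80 98`, which windows-1252 decodes as `â€˜`. A minimal sketch, assuming the stylesheet was UTF-8 but decoded as windows-1252 (one plausible mismatch; the actual encodings involved weren't determined). `mojibake`, `query_part`, and `is_excluded` are hypothetical helpers.

```python
def mojibake(text: str) -> str:
    # Encode as UTF-8, then wrongly decode those bytes as windows-1252 (cp1252).
    return text.encode("utf-8").decode("cp1252")


def query_part(url: str) -> str:
    # Assumption: everything after the first "?" counts as the query string.
    _, sep, rest = url.partition("?")
    return rest if sep else ""


def is_excluded(url: str) -> bool:
    # Skip URLs whose first non-ASCII character in the query string is "â".
    first = next((ch for ch in query_part(url) if ord(ch) > 0x7F), None)
    return first == "â"


garbled = mojibake("‘./fonts/Avenir-Next.eot?#iefix’")
print(garbled == "â€˜./fonts/Avenir-Next.eot?#iefixâ€™")  # True
print(is_excluded(garbled))                               # True
```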

Full results: https://docs.google.com/spreadsheets/d/1i9Gvs1JIDo5mOw-rwPc5ppI6KrbXC8ol2XJ7h9PGAVs/edit?usp=sharing

-- 
GitHub Notification of comment by zcorpan
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/9301#issuecomment-1780207124 using your GitHub account


Received on Wednesday, 25 October 2023 23:47:26 UTC