
Re: [csswg-drafts] [css-syntax-3] Input stream processing can calculate wrong encoding (#4126)

From: Mark Rogers via GitHub <sysbot+gh@w3.org>
Date: Fri, 13 Dec 2019 17:24:28 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-565529695-1576257867-sysbot+gh@w3.org>
> It's served as `Content-Type: text/css; charset=shift_jis`. It also starts with a Shift_JIS byte sequence that happens to match the UTF-8 BOM (great test case).
`ef bb bf 2e e5 b9 b3 e5 92 8c 0d 0a 7b 0d 0a 20 |............{.. |`
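To make the precedence concrete, here is a minimal Python sketch. It assumes the WHATWG Encoding Standard's "decode" behaviour that CSS Syntax relies on: a UTF-8 or UTF-16 BOM, if present, wins over the fallback encoding taken from the `Content-Type` charset. `bom_sniff` and `decode_stylesheet` are hypothetical names, and Python's codecs only approximate the WHATWG encodings.

```python
from __future__ import annotations

# BOMs checked by the Encoding Standard's BOM sniff, in order.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

def bom_sniff(data: bytes) -> tuple[str | None, int]:
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_stylesheet(data: bytes, fallback: str) -> tuple[str, str]:
    """Decode bytes, letting a BOM override the fallback (e.g. HTTP charset)."""
    encoding, skip = bom_sniff(data)
    if encoding is None:
        encoding = fallback
    # Skip the BOM bytes; replace undecodable bytes instead of raising.
    return encoding, data[skip:].decode(encoding, errors="replace")

# The first 16 bytes of the dump above, served with charset=shift_jis.
dump = bytes.fromhex("ef bb bf 2e e5 b9 b3 e5 92 8c 0d 0a 7b 0d 0a 20")
encoding, text = decode_stylesheet(dump, fallback="shift_jis")
print(encoding)  # utf-8
```

Because the BOM check runs before the declared charset is consulted, the `shift_jis` label never gets a say once those three bytes are present.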

>> Circling back around: do you have any evidence of pages breaking due to this behavior?

Not other than the test case, but I don't see enough CSS files written in non-Latin scripts (where this might be a problem) to provide data either way.

There are likely other encodings besides Shift_JIS in which `ef bb bf 2e` can appear as valid characters at the start of a file.
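As one illustration, a single-byte encoding such as windows-1252 maps `ef bb bf 2e` to `ï»¿.`, so the bytes alone cannot say which reading was intended. The sketch below is a hypothetical diagnostic (`bom_is_ambiguous` is an illustrative name, not part of any spec or browser) that flags a UTF-8 BOM whose leading bytes also decode without error under the declared charset, using Python's codecs as an approximation of the WHATWG encodings.

```python
from __future__ import annotations
import codecs

def bom_is_ambiguous(data: bytes, declared: str) -> bool:
    """Hypothetical check: True if data starts with a UTF-8 BOM but the
    declared non-UTF-8 charset also decodes the first four bytes cleanly."""
    if not data.startswith(codecs.BOM_UTF8):
        return False
    if codecs.lookup(declared).name == "utf-8":
        return False  # label and BOM agree, nothing to flag
    # Incremental decoder with final=False tolerates a multibyte character
    # split at the 4-byte boundary instead of reporting it as an error.
    decoder = codecs.getincrementaldecoder(declared)()
    try:
        decoder.decode(data[:4], final=False)
    except UnicodeDecodeError:
        return False
    return True

print(bom_is_ambiguous(b"\xef\xbb\xbf.a{}", "windows-1252"))  # True
```

It only looks at the first four bytes, matching the `ef bb bf 2e` prefix discussed here; it does not decide which encoding is correct.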

My concerns are twofold:

1) There's a coupling between BOM sniffing and the syntax of the document being sniffed: sniffing is more reliable for some document types because their syntax makes non-ASCII characters at offset zero unlikely or impossible.

2) This coupling means sniffing can become less reliable because of syntax changes made outside CSS. For example, HTML's introduction of custom element names makes it more likely that `ef bb bf 2e` appears at offset zero of a CSS file as the name of an element in a style rule.

GitHub Notification of comment by dd8
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/4126#issuecomment-565529695 using your GitHub account
Received on Friday, 13 December 2019 17:24:29 UTC
