Re: [whatwg/dom] Valid/Invalid characters in document.createElement() (#849)

I'd prefer that the API, HTML parser, and CSS naming rules become more aligned rather than less, so I think any newly allowed character set should be the same for the HTML parser and `document.createElement`, and ideally for CSS too. I'm less concerned about which characters we pick, as long as we're consistent across the browser surfaces that deal with element and attribute names.

I'd quite strongly prefer that no existing HTML/XML meta characters become newly allowed. For example, several proposals above allow "<" or single quotes as part of element names.
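
To make that concrete, here is a rough sketch of a check that flags markup meta characters in a candidate name. The helper name `findMarkupMetaCharacters` and the exact deny-list are my own assumptions for illustration; neither comes from a spec or from any of the proposals.

```javascript
// Hypothetical deny-list: characters with syntactic meaning in HTML/XML markup.
// The exact set is an assumption for illustration, not taken from any spec.
const MARKUP_META_CHARACTERS = new Set(["<", ">", "&", '"', "'", "/", "="]);

// Returns the distinct meta characters found in `name`, in order of first appearance.
function findMarkupMetaCharacters(name) {
  const found = [];
  for (const ch of name) {
    if (MARKUP_META_CHARACTERS.has(ch) && !found.includes(ch)) {
      found.push(ch);
    }
  }
  return found;
}

console.log(findMarkupMetaCharacters("my-element")); // empty: nothing flagged
console.log(findMarkupMetaCharacters("a<b'c"));      // flags "<" and the single quote
```

If the parser and `document.createElement` shared one such list, a name rejected by one surface would be rejected by the other as well.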

The Unicode replacement character (U+FFFD) should [probably be disallowed](https://hsivonen.fi/broken-utf-8/). It has caused browser bugs before (examples are in the linked reference).
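
As a small illustration of how U+FFFD can end up in a name, the sketch below decodes a malformed UTF-8 byte sequence with `TextDecoder` and then checks the result. The helper `containsReplacementCharacter` is a hypothetical name of mine, not an existing API.

```javascript
const REPLACEMENT_CHARACTER = "\uFFFD";

// Hypothetical helper: true if `name` contains U+FFFD anywhere.
function containsReplacementCharacter(name) {
  return name.includes(REPLACEMENT_CHARACTER);
}

// The byte 0xFF never occurs in valid UTF-8, so a non-fatal decoder
// substitutes U+FFFD for it.
const decoded = new TextDecoder("utf-8").decode(new Uint8Array([0x78, 0xff, 0x79]));

console.log(containsReplacementCharacter("x-y"));   // false
console.log(containsReplacementCharacter(decoded)); // true
```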

Not sure if this already exists, but there should probably be some language about which Unicode canonicalization to perform (or not), and how equality of names is determined. Ideally this would also be aligned between HTML, JS, and CSS; I care less about the actual rules than about whether they're the same. (CVE-2000-0884 is a bug at the OS level, where one part of the system canonicalizes one way and another part a different way.) I believe ECMAScript already has fairly specific rules about this.
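
A minimal sketch of why the equality rule matters: the two strings below render identically but differ in their code points, so whether they name the same element depends on whether anything normalizes them first. The function names are hypothetical, and NFC is picked only as an example of a normalization form.

```javascript
const precomposed = "caf\u00E9"; // "é" as the single code point U+00E9
const decomposed = "cafe\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Plain string comparison: compares UTF-16 code units, no normalization.
function sameWithoutNormalization(a, b) {
  return a === b;
}

// Comparison after normalizing both sides to NFC.
function sameAfterNfc(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameWithoutNormalization(precomposed, decomposed)); // false
console.log(sameAfterNfc(precomposed, decomposed));             // true
```

If one surface compares names the first way and another compares them the second way, the same pair of names is distinct in one place and identical in the other, which is the kind of mismatch I'd like to avoid.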


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/dom/issues/849#issuecomment-1099117042

Message ID: <whatwg/dom/issues/849/1099117042@github.com>

Received on Thursday, 14 April 2022 12:07:14 UTC