Re: [whatwg/dom] Valid/Invalid characters in document.createElement() (#849) from Domenic Denicola on 2022-01-07 (public-webapps-github@w3.org from January 2022)

From: Domenic Denicola <notifications@github.com>
Date: Fri, 07 Jan 2022 08:22:14 -0800
To: whatwg/dom <dom@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/dom/issues/849/1007541209@github.com>

Here is an initial stab at maximally lenient rules that I think work. Someone double-checking would be great; if I can get confirmation that they seem right, I can probably spend some time on a spec PR.

- For element local names:
  - LenientElementNameStartChar := same as the existing [NameStartChar](https://www.w3.org/TR/xml/#NT-NameStartChar). (The parser [only switches to the tag name state if the first character is an ASCII alpha](https://html.spec.whatwg.org/#tag-open-state), so NameStartChar is already more lenient than the parser.)
  - LenientElementNameChar := anything except tab, LF, FF, space, /, >, NULL. (This appears to be [what the parser accepts in the tag name state](https://html.spec.whatwg.org/#tag-name-state). NameChar also disallows all of these. The parser will lowercase ASCII upper alphas, but we cannot do this in DOM APIs.)
- For element qualified names:
  - Get rid of existing validate step that uses QName.
  - Strictly split on `:`
  - Validate resulting localName per above rules
  - Validate resulting prefix via [Prefix](https://www.w3.org/TR/xml-names/#NT-Prefix), i.e. existing rules. (The parser does not ever create elements with prefixes so no need to make this more lenient.)
- For attribute local names:
  - LenientAttributeNameStartChar := anything except tab, LF, FF, space, /, >, NULL. ([Relevant parser spec](https://html.spec.whatwg.org/#before-attribute-name-state). NameStartChar also disallows all of these. The parser will lowercase ASCII upper alphas but we cannot do this in DOM APIs.)
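  - LenientAttributeNameChar := LenientAttributeNameStartChar, but also excluding `=`. (So `=` is allowed only as the first character, which matches the parser: in the before attribute name state, `=` starts a new attribute whose name begins with `=`.)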
- For attribute qualified names:
  - Same approach as for element qualified names: strictly split on `:`, validate the resulting localName per the above rules, and validate the resulting prefix per the existing `Prefix` production.
  - The parser only creates attributes with a [small set of lowercase-ASCII prefixes](https://html.spec.whatwg.org/#adjust-foreign-attributes) so no need to make Prefix more lenient here either.
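The rules above can be sketched as code-point loops. This is a hedged TypeScript sketch, not spec text or any engine's code: the function names (`isValidElementLocalName`, `isValidAttributeLocalName`, `isValidPrefix`, `validateQualifiedName`) are hypothetical, and one behavior is an assumption the rules above do not settle, namely that a qualified name containing more than one `:` is rejected.

```typescript
// Hypothetical sketch of the lenient name rules; names are illustrative only.
type CodePointRange = readonly [number, number];

// XML 1.0 NameStartChar, as inclusive code point ranges.
const NAME_START_CHAR: CodePointRange[] = [
  [0x3a, 0x3a], [0x41, 0x5a], [0x5f, 0x5f], [0x61, 0x7a],
  [0xc0, 0xd6], [0xd8, 0xf6], [0xf8, 0x2ff], [0x370, 0x37d],
  [0x37f, 0x1fff], [0x200c, 0x200d], [0x2070, 0x218f], [0x2c00, 0x2fef],
  [0x3001, 0xd7ff], [0xf900, 0xfdcf], [0xfdf0, 0xfffd], [0x10000, 0xeffff],
];

// Extra ranges that XML 1.0 NameChar allows on top of NameStartChar.
const NAME_CHAR_EXTRA: CodePointRange[] = [
  [0x2d, 0x2e], [0x30, 0x39], [0xb7, 0xb7], [0x300, 0x36f], [0x203f, 0x2040],
];

// Tab, LF, FF, space, "/", ">", NULL: excluded from every lenient set.
const EXCLUDED = new Set([0x09, 0x0a, 0x0c, 0x20, 0x2f, 0x3e, 0x00]);
const EQUALS = 0x3d;
const COLON = 0x3a;

const inRanges = (cp: number, ranges: CodePointRange[]): boolean =>
  ranges.some(([lo, hi]) => cp >= lo && cp <= hi);

// Array.from iterates a string by code point, so astral characters stay whole.
const codePoints = (s: string): number[] =>
  Array.from(s, (c) => c.codePointAt(0)!);

// LenientElementNameStartChar (= NameStartChar), then LenientElementNameChar*.
function isValidElementLocalName(name: string): boolean {
  const [first, ...rest] = codePoints(name);
  return first !== undefined && inRanges(first, NAME_START_CHAR) &&
    rest.every((cp) => !EXCLUDED.has(cp));
}

// LenientAttributeNameStartChar, then LenientAttributeNameChar* (no "=").
function isValidAttributeLocalName(name: string): boolean {
  const [first, ...rest] = codePoints(name);
  return first !== undefined && !EXCLUDED.has(first) &&
    rest.every((cp) => !EXCLUDED.has(cp) && cp !== EQUALS);
}

// Existing `Prefix` production (an NCName): NameStartChar NameChar*, no ":".
function isValidPrefix(prefix: string): boolean {
  const [first, ...rest] = codePoints(prefix);
  return first !== undefined && first !== COLON &&
    inRanges(first, NAME_START_CHAR) &&
    rest.every((cp) => cp !== COLON &&
      (inRanges(cp, NAME_START_CHAR) || inRanges(cp, NAME_CHAR_EXTRA)));
}

interface QualifiedName { prefix: string | null; localName: string; }

// Split on ":" and validate each piece. Assumption: more than one ":" is
// rejected, since the rules only describe a single prefix.
function validateQualifiedName(
  qualifiedName: string,
  isValidLocalName: (s: string) => boolean,
): QualifiedName | null {
  const parts = qualifiedName.split(":");
  if (parts.length > 2) return null;
  const prefix = parts.length === 2 ? parts[0] : null;
  const localName = parts[parts.length - 1];
  if (prefix !== null && !isValidPrefix(prefix)) return null;
  return isValidLocalName(localName) ? { prefix, localName } : null;
}

const validateElementQualifiedName = (q: string) =>
  validateQualifiedName(q, isValidElementLocalName);
const validateAttributeQualifiedName = (q: string) =>
  validateQualifiedName(q, isValidAttributeLocalName);
```

For example, `isValidElementLocalName("a@b")` is true under these rules even though `@` is not a NameChar, while `isValidAttributeLocalName("x=y")` is false and `isValidAttributeLocalName("=x")` is true.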

Probably we should not touch custom element name rules. We could in theory make [PCENChar](https://html.spec.whatwg.org/multipage/custom-elements.html#prod-pcenchar) similarly lenient to LenientElementNameChar, but I'm not sure that leniency is actually a good idea for them, since `customElements.define()` basically gives us a single place to enforce good naming practices, and only names that pass get custom element powers. This is unlike the parser-created vs. API-created situation.

Although I've phrased the above in terms of hypothetical grammar productions (e.g. LenientElementNameStartChar) the actual spec would probably be better as algorithms that loop over code units/code points, since that is how they're implemented. And per the OP of this thread the current implementations have bugs, which I suspect might be due to the attempt at translating from grammar specifications into algorithms.
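To illustrate why the code-unit vs. code-point distinction matters, here is a small, purely hypothetical TypeScript example (not a claim about what the OP's bugs actually are). It uses a simplified stand-in for NameChar, `isSimplifiedNameChar`, that keeps only ASCII letters and the supplementary range `[#x10000-#xEFFFF]`; a loop over UTF-16 code units sees a supplementary character as two surrogates and wrongly rejects it.

```typescript
// Simplified stand-in for NameChar: ASCII letters plus [#x10000-#xEFFFF] only.
const isSimplifiedNameChar = (cp: number): boolean =>
  (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a) ||
  (cp >= 0x10000 && cp <= 0xeffff);

// Looping over UTF-16 code units: a supplementary character arrives as two
// surrogates (0xD800-0xDFFF), which fall outside every NameChar range.
function everyCodeUnit(name: string, pred: (cp: number) => boolean): boolean {
  for (let i = 0; i < name.length; i++) {
    if (!pred(name.charCodeAt(i))) return false;
  }
  return true;
}

// Looping over code points: codePointAt joins a surrogate pair into one code
// point, and the index then advances past both code units.
function everyCodePoint(name: string, pred: (cp: number) => boolean): boolean {
  for (let i = 0; i < name.length; ) {
    const cp = name.codePointAt(i)!;
    if (!pred(cp)) return false;
    i += cp > 0xffff ? 2 : 1;
  }
  return true;
}
```

With `s = "a" + String.fromCodePoint(0x10000)`, `everyCodeUnit(s, isSimplifiedNameChar)` returns false while `everyCodePoint(s, isSimplifiedNameChar)` returns true. For the lenient "anything except" sets, every excluded character is ASCII, so the two loops agree there; the difference only shows up for productions with supplementary ranges, such as NameStartChar and Prefix.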

-- 
https://github.com/whatwg/dom/issues/849#issuecomment-1007541209

Received on Friday, 7 January 2022 16:22:27 UTC