Re: [whatwg/dom] Define processing instruction attributes (PR #1454)

noamr left a comment (whatwg/dom#1454)

> So far, when we've made DOM APIs change behavior based on the HTMLness flag on the `Document`, it has caused issues.
> 
> I'm well aware that concerns about XML are unfashionable, but I think it's bad to risk creating a browser-XML dialect that would be incompatible with other XML just to be consistent with the rest of HTML parsing on the HTML side.
> 
> That is, I'm uneasy about broadening the pseudo-attribute syntax from https://www.w3.org/TR/xml-stylesheet/ on the XML side and I'm uneasy about making the HTML and XML sides differ, which logically results in considering the use of https://www.w3.org/TR/xml-stylesheet/ syntax on the HTML side as well.
> 
> How bad would that be? WebKit and Blink seem to delegate to an XML tokenizer. Gecko has a special-purpose tokenizer.
> 
> Looking at [Google HTML/CSS Style Guide](https://google.github.io/styleguide/htmlcssguide.html) and use cases, it seems to me that using HTML attribute tokenization instead of https://www.w3.org/TR/xml-stylesheet/ isn't really something that's needed from use cases at hand and would add support for things that are frowned upon anyway these days when writing HTML, so this seems to be a matter of "for consistency" with the rest of HTML.

Do you mean things like `<?start name=foo>`, which would be HTML-compliant but not XML-compliant?
I think it's about more than consistency: it's about keeping HTML's fault tolerance.

> 
> If we believe that we need to re-use HTML attribute syntax "for consistency" with the full entity set and everything in HTML, I think it would be then less bad to make the behavior depend on the HTMLness flag of the `Document` in order not to introduce a browser dialect of `xml-stylesheet` on the XML side than to introduce the use of the HTML tokenizer for `xml-stylesheet` in XML, which would introduce a browser dialect of XML (for CSS even if XSLT goes away).

I'm fine with that and that was the original PR. @annevk opposed this though?

> As for @annevk 's remark about WebVTT: In Gecko, the WebVTT tokenizer is implemented in JS and calls into HTML parsing in an expensive way to support the full HTML entity set. Reusing the WebVTT tokenizer on the spec level would not result in any implementation ease in Gecko.
> 
> Supporting the full HTML entity set and the way semicolon omission works is both in spec and implementation the hardest part. WebVTT delegates to HTML. If we want to support the full HTML entity set, we should delegate to the HTML parser outright (like the PR currently does) instead of doing the WebVTT thing of supposedly defining a simplified thing but then jumping into the hard part of the more complex thing both spec and implementation-wise anyway.
> 
> I think it's sad to have that complexity "for consistency". Perhaps "for consistency" is the right thing, but I still find it sad to have the complexity, especially for something that's ["Do not use" in the Google style guide](https://google.github.io/styleguide/htmlcssguide.html#Entity_References) and spec-wise a backward-compat feature in the sense that we're not extending the entity set.

I think people would find it surprising if parsing pseudo-attributes worked differently from parsing attributes.
I personally prefer to use the document-flag thing.
But all that being said, this whole thing is not something I feel strongly about. Developers would adapt to whatever we decide here.
If you and @annevk can reach a consensus on any of the three options, I'm OK to align.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/dom/pull/1454#issuecomment-4451153802

Message ID: <whatwg/dom/pull/1454/c4451153802@github.com>

Received on Thursday, 14 May 2026 13:37:19 UTC