Re: [csswg-drafts] [css-text] What are the language-defined segment breaks for HTML? (#5147)

https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream says that the input of the HTML parser must go through https://infra.spec.whatwg.org/#normalize-newlines, which replaces CRLF and lone CR with LF. But that text is written entirely in terms of moving code points around; it doesn’t attribute much meaning to them.
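
For reference, here is a minimal Python sketch of that "normalize newlines" step as I read it in Infra: first replace every CR LF pair with a single LF, then replace every remaining CR with LF. The function name `normalize_newlines` is my own, and this is an illustration rather than what any particular HTML parser actually does.

```python
def normalize_newlines(text: str) -> str:
    """Sketch of Infra's "normalize newlines" (hypothetical helper name)."""
    # Step 1: collapse each CR LF pair into a single LF.
    text = text.replace("\r\n", "\n")
    # Step 2: turn every remaining (lone) CR into an LF.
    return text.replace("\r", "\n")
```

Note that this is purely a code point substitution: nothing in it says that the resulting LF *is* a segment break, which is the gap I’m pointing at.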

https://html.spec.whatwg.org/#newlines defines the term "newline":
> Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.

Presumably "in HTML" there means before preprocessing.

> I think it ends up defaulting

Having normative spec behavior based on *defaulting*, that is, on the absence of a definition that says to do otherwise, is what I think is not great. As an implementer, I don’t feel confident that there really is no such definition for HTML, as opposed to one that I simply failed to find.

-- 
GitHub Notification of comment by SimonSapin
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/5147#issuecomment-637824207 using your GitHub account

Received on Tuesday, 2 June 2020 21:47:33 UTC