Re: [csswg-drafts] [css-text] What are the language-defined segment breaks for HTML? (#5147)

From: Simon Sapin via GitHub <sysbot+gh@w3.org>
Date: Tue, 02 Jun 2020 21:47:32 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-637824207-1591134451-sysbot+gh@w3.org>
https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream says that the input of the HTML parser must go through https://infra.spec.whatwg.org/#normalize-newlines, which replaces CRLF and lone CR with LF. But the way this is written is all about moving code points around; it doesn’t attribute much meaning to them.
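
As an illustration of the "moving code points around" point, here is a minimal sketch of that normalization step in Python. The function name `normalize_newlines` is made up for this example, and the code is only an approximation of the Infra algorithm as summarized above, not text taken from either spec.

```python
def normalize_newlines(text: str) -> str:
    """Sketch of newline normalization as summarized above.

    First replace every CR LF pair with a single LF, then replace
    every remaining lone CR with LF. Nothing here says what an LF
    *means*; it is purely a code point substitution.
    """
    # Order matters: handling pairs first keeps CRLF from becoming two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

After this step the parser's input contains no U+000D at all, but the step itself does not say that the remaining U+000A is a segment break.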

https://html.spec.whatwg.org/#newlines defines a "newline" term:
> Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.

Presumably "in HTML" there means before preprocessing.
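
To make that reading concrete, here is a small Python sketch that counts "newlines" in raw, not yet preprocessed text according to the quoted definition. The names `HTML_NEWLINE` and `count_html_newlines` are hypothetical, and treating the definition as applying to pre-normalization input is the assumption stated above, not something the spec says explicitly.

```python
import re

# One "newline" per the quoted definition: a CR LF pair, a lone CR, or a lone LF.
# The pair is listed first so that CR LF is matched as a single newline.
HTML_NEWLINE = re.compile(r"\r\n|\r|\n")


def count_html_newlines(raw: str) -> int:
    """Count newlines in raw input, assuming the definition applies before preprocessing."""
    return len(HTML_NEWLINE.findall(raw))
```

Under this reading, each such newline in the raw input corresponds to exactly one LF once CR LF pairs and lone CRs have been replaced by LF.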

> I think it ends up defaulting

Having a normative spec based on *defaulting*, on the absence of a definition that says to do otherwise, is what I think is not great. As an implementer I don’t feel confident that there is indeed no such definition for HTML, rather than that I simply failed to find it.

GitHub Notification of comment by SimonSapin
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/5147#issuecomment-637824207 using your GitHub account
Received on Tuesday, 2 June 2020 21:47:33 UTC