Re: [csswg-drafts] [css-text-3] Remove collapsible line breaks adjacent to word separators (#3481) from CSS Meeting Bot via GitHub on 2019-09-17 (public-css-archive@w3.org from September 2019)

From: CSS Meeting Bot via GitHub <sysbot+gh@w3.org>
Date: Tue, 17 Sep 2019 05:04:55 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-532060014-1568696694-sysbot+gh@w3.org>
The CSS Working Group just discussed `Collapsible breaks adjacent to word separators`.

<details><summary>The full IRC log of that discussion</summary>
&lt;fantasai> Topic: Collapsible breaks adjacent to word separators<br>
&lt;heycam> github: https://github.com/w3c/csswg-drafts/issues/3481<br>
&lt;heycam> fantasai: we generally have this concept in CSS and HTML that you can use white space to format your source, and we collapse white space down to a single space<br>
&lt;heycam> ... including line breaks<br>
&lt;heycam> ... for Chinese and Japanese which don't use spaces, we have some rules to remove the space otherwise you will be forced to put all paras on one line<br>
&lt;heycam> ... there are some rules for doing that based on character classes<br>
&lt;heycam> ... what we didn't consider thoroughly is languages that use a word separator that's not a space<br>
&lt;heycam> ... we do special case ZWSP, for Thai and other languages<br>
&lt;heycam> ... but we don't have something similar for Ethiopic word space<br>
&lt;heycam> ... probably don't also want a regular space there<br>
&lt;heycam> ... proposal is when there's a word separator character adjacent to a line break, the line break just goes away<br>
&lt;heycam> ... I think the characters that are affected here are Ogham space mark and Ethiopic word space and the Tibetan tsek<br>
&lt;heycam> AmeliaBR: does this map to something in Unicode? or do we need to maintain this list?<br>
&lt;koji> https://drafts.csswg.org/css-text-3/#word-separator<br>
&lt;heycam> r12a: I think there is something, not sure if it's fit for this purpose<br>
&lt;heycam> r12a: archaic scripts have other examples<br>
&lt;heycam> fantasai: [reads definition in the spec right now for word-spacing]<br>
&lt;heycam> florian: we need to maintain a list<br>
&lt;heycam> myles: let's ask Unicode to do it<br>
&lt;heycam> ... if there is such a facility for these character lists, hard to believe it's specific for the web platform<br>
&lt;heycam> ... and not needed in text editors for example<br>
&lt;heycam> ... I don't think the web specs should maintain this list<br>
&lt;heycam> florian: I agree with part of your statement, should try to work this out with Unicode<br>
&lt;heycam> ... this one specifically maybe, but some are specifically web platform related<br>
&lt;heycam> ... since this is relevant to turning HTML markup into text<br>
&lt;heycam> myles: there are many different markup languages...<br>
&lt;heycam> fantasai: there are 2 questions<br>
&lt;heycam> ... if we want to do this, and then whether we maintain the list or if Unicode should<br>
&lt;heycam> addison: i think we want to do some research<br>
&lt;heycam> ... space or no space is a classic problem<br>
&lt;heycam> ... I would be surprised if there weren't something, but don't know off the top of my head<br>
&lt;heycam> ... would be happy to engage<br>
&lt;heycam> myles: if this is a classical problem, it's been solved, and we should figure out how it's been solved in the past and re-use that solution<br>
&lt;heycam> fantasai: looking at some of the stuff in css-text, we have a concept of word separators<br>
&lt;heycam> ... and it includes a set of code points<br>
&lt;heycam> ... it excludes Ogham space mark<br>
&lt;heycam> ... since it would cause text to not join any more<br>
&lt;heycam> ... so general usage in Unicode, which is text processing and segmentation, is not going to account for that concern, since they don't deal with typesetting<br>
&lt;heycam> ... so there's gonna be some aspects of how we're using Unicode codepoints with specific requirements that haven't come up in Unicode's context so far<br>
&lt;heycam> ... unbreaking lines is something that's been hard to explain to them<br>
&lt;heycam> myles: maybe we shouldn't be unbreaking them?<br>
&lt;heycam> fantasai: too late for that!<br>
&lt;heycam> addison: fwiw I've had to write this code in the past, and it's not any fun<br>
&lt;heycam> ... it may have been individually solved but not written down<br>
&lt;fantasai> fantasai: HTML has been unbreaking lines for as long as it has existed, we want to make that ability available to more languages<br>
&lt;heycam> r12a: like with the other issues, we need to look in more detail<br>
&lt;heycam> ... the Tsek is a syllable separator, not the same as a word joiner<br>
&lt;heycam> ... you could end a line with a Tsek, then start with more Tibetan on the next line, with indentation, and no real reason to join those together necessarily<br>
&lt;heycam> fantasai: you wouldn't make the Tsek go away, just avoid the extra space going in there<br>
&lt;heycam> ACTION: i18n to look into this issue of word separators next to newlines<br>
&lt;trackbot> Error finding 'i18n'. You can review and register nicknames at &lt;https://www.w3.org/Style/CSS/Tracker/users>.<br>
&lt;addison> action: addison: ensure we respond to css 3481<br>
&lt;trackbot> Error finding 'addison'. You can review and register nicknames at &lt;https://www.w3.org/Style/CSS/Tracker/users>.<br>
</details>
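
For readers skimming the log, the proposal can be sketched roughly as follows. This is only an illustration of the idea discussed, not spec text or any browser's implementation: the function name `collapse_segment_breaks` and the `WORD_SEPARATORS` set are hypothetical, the set contains just the three characters named in the discussion (and whether the Tibetan tsek belongs in it was questioned), and the existing rules for ZWSP and for Chinese and Japanese text are left out.

```python
import re

# Assumption: illustrative list taken from the characters named in the
# discussion; the actual list (and who maintains it) was left unresolved.
WORD_SEPARATORS = {
    "\u1680",  # OGHAM SPACE MARK
    "\u1361",  # ETHIOPIC WORDSPACE
    "\u0F0B",  # TIBETAN MARK INTERSYLLABIC TSHEG (inclusion was questioned)
}

# A line break together with any spaces/tabs/newlines collapsing around it.
_BREAK = re.compile(r"[ \t]*\n[ \t\n]*")


def collapse_segment_breaks(text: str) -> str:
    """Collapse each source line break to a single space, unless it is
    adjacent to a word separator, in which case it is removed entirely."""
    out = []
    pos = 0
    for m in _BREAK.finditer(text):
        out.append(text[pos:m.start()])
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # The separator itself is kept; only the extra space is avoided.
        if before not in WORD_SEPARATORS and after not in WORD_SEPARATORS:
            out.append(" ")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)
```

Under these assumptions, `"hello\nworld"` still becomes `"hello world"`, while a break that follows an Ethiopic wordspace is dropped without inserting a regular space.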


-- 
GitHub Notification of comment by css-meeting-bot
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3481#issuecomment-532060014 using your GitHub account
Received on Tuesday, 17 September 2019 05:04:56 UTC