[csswg-drafts] [css-text-4] Add support for content-detection, phrase-based line breaking (#6730)

chrishtr has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text-4] Add support for content-detection, phrase-based line breaking ==
**Proposal**

Add a CSS property that lets developers specify that they would like line breaking to use a phrase-based, content-detection-based algorithm. Implementations would use this property to trigger the use of a library that tries to determine phrase boundaries in the text, and then break lines at those boundaries.

Example (hopefully I got this right; I don't speak Japanese):

A phrase often consists of multiple words. The following Japanese example consists of 6 words, but has 3 phrases.

私 | の | 名前 | は | 中野 | です。
-- | -- | -- | -- | -- | --
My | | name | is | Nakano | .
Noun | Particle | Noun | Particle | Noun | Auxiliary verb
Phrase 1 | | Phrase 2 | | Phrase 3
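To make the table concrete, here is a minimal Python sketch that groups the tagged words into the three phrases and then wraps them. It assumes a toy rule (particles and auxiliary verbs attach to the preceding content word) that only covers this example; real phrase-detection libraries use far more sophisticated analysis. `group_into_phrases`, `break_lines`, and the character-count width model are hypothetical illustrations, not any implementation's API.

```python
from typing import List, Tuple

# (surface text, part of speech), as in the example table.
Token = Tuple[str, str]

# Toy assumption: these parts of speech attach to the preceding content word.
ATTACHING_POS = {"Particle", "Auxiliary verb"}


def group_into_phrases(tokens: List[Token]) -> List[str]:
    """Group POS-tagged words into phrases using the toy attachment rule."""
    phrases: List[str] = []
    for surface, pos in tokens:
        if pos in ATTACHING_POS and phrases:
            phrases[-1] += surface  # attach to the current phrase
        else:
            phrases.append(surface)  # a content word starts a new phrase
    return phrases


def break_lines(segments: List[str], width: int) -> List[str]:
    """Greedily fill lines, breaking only between segments.

    Width is counted in characters (assumes every character is full-width).
    A segment wider than a whole line falls back to character breaking.
    """
    lines: List[str] = []
    current = ""
    for segment in segments:
        pieces = [segment] if len(segment) <= width else list(segment)
        for piece in pieces:
            if current and len(current) + len(piece) > width:
                lines.append(current)
                current = ""
            current += piece
    if current:
        lines.append(current)
    return lines


tokens = [
    ("私", "Noun"), ("の", "Particle"), ("名前", "Noun"),
    ("は", "Particle"), ("中野", "Noun"), ("です。", "Auxiliary verb"),
]
phrases = group_into_phrases(tokens)
print(phrases)                                  # ['私の', '名前は', '中野です。']
print(break_lines(phrases, 6))                  # ['私の名前は', '中野です。']
print(break_lines(list("".join(phrases)), 6))   # ['私の名前は中', '野です。']
```

At a width of six characters, character-based breaking splits the name 中野 across two lines, while phrase-based breaking keeps each phrase intact.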


Phrase-based line breaking is often desired for headline-type text, meaning text in a graphic display context, usually at large sizes, such as titles, headings, billboards, or advertisement graphics, especially in CJK languages or Thai.

In some use cases, such as accessibility content for children, phrase-based line breaking is also useful in a reading context at regular body text sizes.

**Design constraints I know of**:

* Fallback line breaking behavior for a UA should default to: word → phrase → none, where “none” means character-level breaking. If the developer specifies “word” as the preferred line breaking method and the UA is unable to break on words, the UA may break on phrases if possible, and otherwise fall back to “character”. If the developer specifies “phrase”, the fallback should be directly to character, not to “word”.
  * Why: to most adult users, using word-based line breaking for headline-type text looks like a poor job, something failing, or unfinished work.

(Note: There may be a use case for a developer overriding this fallback path, e.g. by specifying “word phrase none” as the mode, meaning word line breaking is preferred, falling back to phrase, and then to character.)
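The fallback rules above can be modelled as resolving a chain of modes against what the UA can actually detect. This is a minimal Python sketch under stated assumptions: the proposal does not name the property, so only its values are modelled; `resolve_break_mode` and the `available` set are hypothetical; and the explicit override is written as a space-separated string mirroring the “word phrase none” form in the note.

```python
from typing import Dict, List, Set

# Default fallback chains from the proposal. "none" means no word or phrase
# detection, i.e. plain character-level line breaking.
DEFAULT_CHAINS: Dict[str, List[str]] = {
    "word": ["word", "phrase", "none"],
    "phrase": ["phrase", "none"],  # phrase never falls back to word
}


def resolve_break_mode(value: str, available: Set[str]) -> str:
    """Return the first mode in the chain that the UA can handle.

    `value` is a single keyword ("word" or "phrase"), which expands to its
    default chain, or an explicit chain such as "word phrase none".
    `available` holds the modes the UA supports for this text (hypothetical);
    "none" (character breaking) is always possible.
    """
    keywords = value.split()
    if len(keywords) == 1:
        chain = DEFAULT_CHAINS.get(keywords[0], keywords)
    else:
        chain = keywords
    for mode in chain:
        if mode == "none" or mode in available:
            return mode
    return "none"  # every chain ultimately ends at character breaking


print(resolve_break_mode("word", {"phrase"}))               # phrase
print(resolve_break_mode("phrase", {"word"}))               # none
print(resolve_break_mode("phrase word none", {"word"}))     # word
```

The second call shows the key design point: a “phrase” preference skips word breaking entirely, even when the UA could break on words. The third call is a hypothetical developer override of the default path.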

* When the CSS property puts the UA into a mode that allows phrase-based line breaking, the UA may ignore [keep-all](https://drafts.csswg.org/css-text-3/#valdef-word-break-keep-all) and [break-all](https://drafts.csswg.org/css-text-3/#valdef-word-break-break-all).
  * Why: to avoid complexity in the phrase-based line breaking libraries.
  * Why: to support phrase-based line breaking for non-CJK languages such as Thai, where the current definition of keep-all doesn’t apply.
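One way a UA might apply this permission is sketched below in Python. It assumes that “ignoring” `keep-all` or `break-all` is modelled as falling back to `normal` (the initial value of `word-break`), so that phrase boundaries decide where lines break; `effective_word_break` and its `ua_ignores` flag are hypothetical, and the flag exists only because the proposal says “may”, not “must”.

```python
# word-break values the proposal says a UA may ignore in phrase mode.
IGNORABLE_IN_PHRASE_MODE = {"keep-all", "break-all"}


def effective_word_break(word_break: str, phrase_mode: bool,
                         ua_ignores: bool = True) -> str:
    """Return the word-break value that still governs line breaking.

    Assumption: an ignored keep-all/break-all behaves like `normal`, leaving
    phrase detection in charge of break opportunities.
    """
    if phrase_mode and ua_ignores and word_break in IGNORABLE_IN_PHRASE_MODE:
        return "normal"
    return word_break


print(effective_word_break("keep-all", phrase_mode=True))   # normal
print(effective_word_break("keep-all", phrase_mode=False))  # keep-all
```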

* The CSS property that enables phrase-based line breaking must not require a specific language. (+)
  * Why: requiring a language makes it hard for implementations to use a phrase-based line breaking library via ICU.


(+) Compare with the `word-boundary-detection` property, which currently requires a language when using the `auto` keyword. `word-boundary-detection` has this restriction because it is tightly paired with `keep-all`, whereas the phrase-based feature is not.

[Existing support](https://drafts.csswg.org/css-text-4/#word-boundary-detection) for word-based line breaking does not quite meet these requirements.


Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6730 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Thursday, 14 October 2021 18:03:05 UTC