Putting Word-breaking in CLReq?

In case I forget about this at the next meeting, here’s a request about word-breaking and the relevant discussion. Word breaking is important for the Selection and Editing APIs. Shall we provide some brief answers on this topic in the CLReq?

Q: Does anyone know of character-level mechanisms used to advise algorithms of the word boundaries (or lack of boundaries) in Chinese text?
https://lists.w3.org/Archives/Public/public-html-ig-zh/2015Jul/0004.html

From: Li Songfeng
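For word breaking in Chinese body text, I cannot think of any rules other than these: punctuation must not appear at the start of a line, a single character must not form a line by itself, and widow/orphan control (in paginated layout, the first line of a paragraph falling at the bottom of a page, or the last line falling at the top of a page). Text that mixes Chinese, Western scripts, and numerals is more complicated. When a Chinese heading is too long and has to wrap, there is indeed a word-formation issue: for example, the “的” in “……的……” must not appear at the start of the next line.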
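
The rule that certain characters must not start a line can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every character is treated as one cell wide, the break is moved one character earlier when the next line would otherwise begin with a forbidden character, and the function name wrap and the set NO_LINE_START are hypothetical names with an illustrative, incomplete character list.

```python
# Illustrative (incomplete) set of punctuation that must not start a line.
NO_LINE_START = set("，。、；：？！）》」』”’")


def wrap(text, width, no_line_start=NO_LINE_START):
    """Break `text` into lines of at most `width` characters.

    If the next line would begin with a character in `no_line_start`,
    the break is moved left so that the preceding character is carried
    down to the next line together with the forbidden character.
    """
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        # Shorten the current line while the next line would start badly.
        # A line is never shortened below one character, so a run of
        # forbidden characters can still end up at a line start.
        while end < len(text) and text[end] in no_line_start and end - start > 1:
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


body = wrap("今天天气很好，我们去公园。", 6)
# For headings, "的" can be added to the forbidden set.
heading = wrap("我们共同的美好家园", 4, NO_LINE_START | {"的"})
```

Here the comma in the first string and the “的” in the second would each have landed at the start of a line, so the preceding character is carried down with them. The single-character-line and widow/orphan rules are not handled in this sketch.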

From: Zhang Kun
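This probably has little to do with typesetting; it is a problem specific to Chinese input methods. Western text is split into words by spaces, whereas Chinese has no spaces, and this can cause problems. For example, 武汉市长江大桥 can be segmented in two ways: 武汉市-长江大桥 (Wuhan City, Yangtze River Bridge) or 武汉市长-江大桥 (Mayor of Wuhan, Jiang Daqiao). For the question this person is asking, there is no particularly good mechanism with current technology.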
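
The ambiguity in the 武汉市长江大桥 example can be reproduced with a small dictionary-based sketch in Python. The function name segmentations and the four-entry LEXICON are hypothetical and chosen only for this illustration; a real segmenter would use a much larger lexicon and some way of ranking the candidates.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Segment the remainder and prepend the matched word.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Toy lexicon, assumed for this example only.
LEXICON = {"武汉市", "武汉市长", "长江大桥", "江大桥"}

candidates = segmentations("武汉市长江大桥", LEXICON)
for words in candidates:
    print("-".join(words))
```

Both readings come out as valid segmentations, and nothing in the characters themselves tells an algorithm which one is intended.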

--
xiaoqian

Received on Friday, 24 July 2015 10:01:50 UTC