
Putting Word-breaking in CLReq?

From: Xiaoqian Wu <xiaoqian@w3.org>
Date: Fri, 24 Jul 2015 18:01:42 +0800
Message-Id: <15DCE8D4-82FB-4144-B3C8-580FE7CB4A93@w3.org>
To: public-clreq-admin@w3.org
In case I forget about this in the next meeting, here’s a request about word-breaking and the relevant discussion. Word breaking is important for the Selection and Editing APIs. Shall we provide some brief answers to this topic in the CLReq?

Q: Does anyone know of character-level mechanisms used to advise algorithms of the word boundaries (or lack of boundaries) in Chinese text?
https://lists.w3.org/Archives/Public/public-html-ig-zh/2015Jul/0004.html

From: Li Songfeng
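For word breaking in Chinese body text, I can't think of any rules beyond these: punctuation must not appear at the start of a line, a single character must not form a line by itself, and widow/orphan control (in paginated layout, the first line of a paragraph falling at the bottom of a page, or its last line falling at the top of a page). Mixing Chinese with Western text and numerals is more complicated. If a Chinese heading is too long and has to wrap, word formation really does matter; for example, the “的” in “……的……” must not appear at the start of the next line.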
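
A minimal sketch of the first rule above (no punctuation at the start of a line), written as a greedy fixed-width wrapper. The names `NO_LINE_START` and `wrap`, the punctuation set, and the choice to carry the preceding character down to the next line are all assumptions made for illustration; the single-character-line and widow/orphan rules are not handled.

```python
# Hypothetical, incomplete set of punctuation that must not begin a line.
NO_LINE_START = set(",。、;:!?)》」』”’")


def wrap(text, width):
    """Greedily split text into lines of at most `width` characters,
    never letting a line begin with a character from NO_LINE_START."""
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        # If the next line would begin with prohibited punctuation, end the
        # current line earlier so the preceding character moves down with it.
        while end < len(text) and text[end] in NO_LINE_START and end - start > 1:
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


if __name__ == "__main__":
    for line in wrap("今天天气很好,我们去公园。", 6):
        print(line)
```

A naive six-character split of the sample sentence would start the second line with the comma; the sketch shortens the first line by one character instead.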

From: Zhang Kun
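This probably has little to do with typesetting; it is a problem specific to Chinese input methods. Western text separates words with spaces, but Chinese does not, and that can cause problems. For example, 武汉市长江大桥 can be segmented in two ways: 武汉市-长江大桥 ("Wuhan City" + "Yangtze River Bridge") or 武汉市长-江大桥 ("Mayor of Wuhan" + the personal name "Jiang Daqiao"). For the question this person is asking, there is no particularly good mechanism with current technology.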
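
The ambiguity in the example above can be reproduced with a short sketch that enumerates every dictionary-based segmentation of a string. The function name `segmentations` and the four-word `LEXICON` are hypothetical and chosen only to cover this one example; a real segmenter would need a far larger dictionary and some way to rank the candidates.

```python
# Toy lexicon, assumed for illustration only.
LEXICON = {"武汉市", "长江大桥", "武汉市长", "江大桥"}


def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    for words in segmentations("武汉市长江大桥", LEXICON):
        print("-".join(words))
```

Both readings are valid according to the lexicon, so the dictionary alone cannot decide between them.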

--
xiaoqian

Received on Friday, 24 July 2015 10:01:50 UTC
