- From: Liam R E Quin <liam@w3.org>
- Date: Fri, 23 May 2014 19:43:31 -0400
- To: Koji Ishii <kojiishi@gluesoft.co.jp>
- Cc: "www-style@w3.org" <www-style@w3.org>
[tl;dr - angst about hyphens that can be ignored]

On Tue, 2014-05-20 at 04:42 +0000, Koji Ishii wrote:
> If you still have use cases to specify this word is “food thief” and
> that word is “carpet thief”, it looks to me that it’s a semantic issue
> since you don’t want to change the meaning of words when styles were
> changed.

This happens in quite a few languages, and today's word processing and typesetting systems (including TeX) deal with it using a dictionary. It's not an edge case, it's part of doing hyphenation.

A good approach might be to say a user agent should not normally attempt hyphenation of text in languages which that user agent does not support. That way, e.g. a user who reads Swedish will probably have the Swedish locale and dictionary installed, and will see hyphenated text, even if they are not in Sweden, and even if their primary language is (say) Japanese. If they do not have the Swedish locale and dictionary installed, the text will still make sense to them, and another user who sees the text, and perhaps copies and pastes it, won't get hyphenations that could change the meaning.

The spec also needs to be clearer about how a soft hyphen (U+00AD, &shy;) interacts with the user agent -- e.g. copy/paste, search, and what to do if the character is supplied as part of the value of a "content" property.

If hyphenation is under CSS control, how do you allow a word break with no added hyphen after a / in one stylesheet and not in another? If a long word contains a soft hyphen, can the formatter break the word elsewhere? What if it contains a "-"? If the user agent hyphenates automatically, do the inserted hyphens appear in the DOM or not? (This varies between browsers today, I'm told.) And then in-page search is potentially affected.

People are creating Web pages with hyphenation in various incompatible ways today. So, if hyphen is included, we should be clearer about what it means.
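
To make the search concern concrete, here is a minimal sketch in Python, not a description of what any browser does. It assumes the page text carries literal U+00AD characters; the helper names `strip_soft_hyphens` and `find_ignoring_soft_hyphens` are hypothetical and exist only for this illustration.

```python
# Illustration only: how a literal soft hyphen (U+00AD) in text can defeat
# a naive substring search, and one possible way to search around it.

SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed (hypothetical helper)."""
    return text.replace(SOFT_HYPHEN, "")


def find_ignoring_soft_hyphens(haystack: str, needle: str) -> int:
    """Find needle in haystack, ignoring soft hyphens on both sides.

    The returned index refers to the stripped haystack, not the original.
    Returns -1 when there is no match.
    """
    return strip_soft_hyphens(haystack).find(strip_soft_hyphens(needle))


# Text as an author might store it, with soft hyphens marking break points.
page_text = "hy\u00adphen\u00adation matters"

# A naive search for the visible word fails, because the stored string
# contains characters the reader never sees.
print("hyphenation" in page_text)

# Searching on the stripped text finds the word at the start.
print(find_ignoring_soft_hyphens(page_text, "hyphenation"))
```

Whether a user agent should behave like the second search, and what copy/paste should put on the clipboard, is exactly the kind of thing the spec text would need to state.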

The Unicode Line Breaking Algorithm has some examples but they are not sufficient for implementations: we need normative text, not just examples. The Unicode LBA doc has an example of SHY followed by NBHY and examples suggesting what happens for Polish, but not for English.

Maybe it's OK if we work quickly on level 4 text and specifying it more, and if there are good tests. But I fear that the text as written is difficult to write tests for, because it's not sufficiently precise and firm.

But maybe I should just welcome css text 3 hyphenation as a step forward towards documenting what Web browsers do today, and hope for better in the future. Antenna House Formatter (for example) supports at least some of (maybe all of) the XSL-FO 2 hyphenation properties, and any rewrite of a Web browser formatter's line breaking would need to take hyphenation requirements into account.

So I think I've talked myself into accepting css text 3 hyphen, although I think it's crap :-) and trying to improve it for css text 4 (the draft is still pretty minimal).

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

Received on Friday, 23 May 2014 23:43:35 UTC