- From: Thomas A. Fine <fine@head.cfa.harvard.edu>
- Date: Wed, 09 Jan 2013 14:07:44 -0500
- To: www-style mailing list <www-style@w3.org>
A few weeks ago I posted about the need for sentence spacing in CSS. [1] I've thought about the options, and have tried various approaches out. I ended up adding some JavaScript on my Blogger blog to post-process the output (I originally wanted to modify the editor, but Blogger provides no hooks for editor plugins). [2]

The JavaScript relies on finding two spaces between sentences for sentence detection. This eliminates all sorts of ambiguous situations, and it really clarified in my mind what the best approach would be to provide CSS-controlled sentence spacing.

The JavaScript solution worked out to be simpler to implement than an editor for marking sentences (which would use two spaces as a signal anyway). But more importantly, having that JavaScript makes content creation trivial: I just do what I would do normally, typing two spaces between sentences, and voila, I get CSS-controlled sentence spacing.

So I'm proposing this as (part of) a solution for CSS to implement sentence spacing. This scoots around the objection that no one would ever bother with all the work of creating sentence markup. And while using spaces in this way would have been unthinkable a few years ago, HTML has moved on and is no longer SGML-compliant (not that it ever really was). There's no reason why CSS can't directly use the two spaces following terminal punctuation as a reliable way to detect sentences, one which is fully controlled by the user and yet trivial at content creation time.

So here's what I think a complete, full-featured implementation would look like. Two new properties are desired:

sentence-spacing:
    Specifies the amount of EXTRA space to add to the white space between sentences. All the standard units should be available, with the em preferred. (I recall a discussion about a unit relative to the width of a font's space character. I feel that this would also be useful.) This value may be negative. Valid in any element.

sentence-boundary:
    Picks an algorithm to use to select sentences. Values are:

    none
        The sentence-spacing value is ignored, and sentences are not checked. This would be the default setting, unless a sentence tag is added to HTML (see below).

    twospace
        This method relies on two spaces after terminal punctuation (period, question mark, exclamation point, etc.) to find sentence boundaries. This setting would be ignored where white-space values of pre or pre-wrap are in effect.

    auto
        Use a language-dependent sentence boundary algorithm to detect sentences.

    named
        Takes a class or list of classes as its value. When spans are found with any of the listed classes, the elements are sentences, and the space between any of these elements is governed by sentence-spacing.

    tag
        Use the sentence tag in HTML, assuming it exists. If it does exist, this should be the default value for sentence-boundary.

This is a soup-to-nuts approach. The only choice I left out is wrapping the space between sentences in a span; I see no point in including it, since as far as I can see it would not offer an alternative that is useful to anyone.

The "twospace" and "auto" sentence-boundary settings are the most immediately useful in the real world. A significant percentage [2] of content developers now use a two-space habit in their content development, even though it is not visible on the web. It provides easy, unambiguous sentence detection, and the authors most likely to find it useful are already doing this. The "auto" setting provides the same feature to those who might desire wider sentence spacing for older existing content, or content submitted from elsewhere, without the trouble of editing it, and who can live with the occasional boundary detection error.

The "named" and "tag" alternatives are there to provide even more explicit, tag-based control of sentences, including semantic sentence markup.
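
As a rough illustration of the "twospace" detection described above, here is a minimal JavaScript sketch. It is not the script from [2]; the function names, the regular expression, and the "sentence" class are all made up for this example, and it works on a plain string rather than on a live document.

```javascript
// Hypothetical boundary pattern: terminal punctuation, optionally followed by
// closing quotes or brackets, then two or more spaces.
const BOUNDARY = /([.?!]["')\]]*) {2,}/g;

// Split a string into sentences using the two-space convention.
function splitTwoSpaceSentences(text) {
  const sentences = [];
  let start = 0;
  for (const match of text.matchAll(BOUNDARY)) {
    // Keep the punctuation with the sentence it ends.
    const end = match.index + match[1].length;
    sentences.push(text.slice(start, end));
    // Resume after the run of spaces.
    start = match.index + match[0].length;
  }
  const rest = text.slice(start).trim();
  if (rest.length > 0) {
    sentences.push(rest);
  }
  return sentences;
}

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap each detected sentence in a span (the class name is an assumption),
// roughly what the "named" value would then look for.
function wrapSentences(text, className = "sentence") {
  return splitTwoSpaceSentences(text)
    .map((s) => `<span class="${className}">${escapeHtml(s)}</span>`)
    .join(" ");
}

const demo = "Dr. Smith arrived at 5 p.m. on Tuesday.  He was late!  Was anyone surprised?";
console.log(splitTwoSpaceSentences(demo));
// [ 'Dr. Smith arrived at 5 p.m. on Tuesday.', 'He was late!', 'Was anyone surprised?' ]
```

Note that "Dr." and "p.m." are followed by a single space, so they are not treated as sentence ends; that is the ambiguity the two-space signal removes.
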

The "named" and "tag" values take what is in a way a more traditional approach, ignoring whitespace as part of the content creation process, but in regular use they would likely need help from an HTML editor to make sentence detection fast and correct. And honestly, the "named" value is only there because I see an uphill battle in getting a proper sentence tag put in place.

tom

----------
[1] http://lists.w3.org/Archives/Public/www-style/2012Dec/0304.html
[2] http://widespacer.blogspot.com/2012/12/dynamic-sentence-formatting-for-blogger.html
[3] http://lists.w3.org/Archives/Public/www-style/2013Jan/0062.html
Received on Wednesday, 9 January 2013 19:08:12 UTC