
A proposed standard for CSS-controlled sentence spacing

From: Thomas A. Fine <fine@head.cfa.harvard.edu>
Date: Wed, 09 Jan 2013 14:07:44 -0500
Message-ID: <50EDC000.10805@head.cfa.harvard.edu>
To: www-style mailing list <www-style@w3.org>
A few weeks ago I posted about the need for sentence spacing in
CSS. [1] I've thought about the options and tried out various
approaches.  I ended up adding some JavaScript to my Blogger
blog to post-process the output (I originally wanted to modify the
editor, but Blogger provides no hooks for editor plugins). [2]

The JavaScript relies on finding two spaces between sentences for
sentence detection.  This eliminates all sorts of ambiguous situations,
and it really clarified in my mind what the best approach would be
to provide CSS-controlled sentence spacing.  The JavaScript solution
turned out to be simpler to implement than an editor for marking
sentences (which would use two spaces as a signal anyway).  But
more importantly, the JavaScript makes content creation
trivial: I just do what I would normally do, typing two spaces
between sentences, and voilà, I get CSS-controlled sentence spacing.
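For illustration, here is a minimal Python sketch of the two-space rule.  It is not the Blogger JavaScript itself, which isn't reproduced here.  It assumes a sentence ends at a period, question mark, or exclamation point, optionally followed by one closing quote or bracket, and then two or more spaces.  A single space, as in "Dr. Smith", is never treated as a boundary.  The names SENTENCE_END and split_sentences are hypothetical.

```python
import re

# Terminal punctuation, an optional closing quote/bracket, then 2+ spaces.
# Assumption: only literal spaces count (not tabs or newlines).
SENTENCE_END = re.compile(r"""([.?!]['")\]]?) {2,}""")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the two-space convention."""
    sentences = []
    start = 0
    for m in SENTENCE_END.finditer(text):
        # Keep the punctuation (group 1) with its sentence; drop the spaces.
        sentences.append(text[start:m.end(1)])
        start = m.end()
    sentences.append(text[start:])
    return [s for s in sentences if s]

print(split_sentences('Dr. Smith arrived.  He sat down.  "Why?"  Nobody knew.'))
# ['Dr. Smith arrived.', 'He sat down.', '"Why?"', 'Nobody knew.']
```

A real post-processor would then wrap each sentence, or the gap between sentences, in markup that CSS can style.  Deciding what that markup looks like is exactly what the proposal below would make unnecessary.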

So I'm proposing this as (part of) a solution for CSS to implement
sentence spacing.  This sidesteps the objection that no one
would ever bother with all the work of creating sentence markup.
And while using spaces in this way would have been unthinkable a
few years ago, HTML has moved on and is no longer SGML-compliant
(not that it ever really was).  There's no reason why CSS can't
directly use the two spaces following terminal punctuation to
reliably detect sentences in a way that is fully controlled by the
author, and yet trivial at content creation time.

So here's what I think a complete, full-featured implementation would
look like.  Two new properties are needed:
   sentence-spacing:
     Specifies the amount of EXTRA space to add to the white space
     between sentences.  All the standard length units should be
     available, with the em preferred.  (I recall that a unit relative
     to the width of a font's space character has been discussed; I
     feel that this would also be useful.)  This value may be
     negative.  Applies to all elements.

   sentence-boundary:
     Selects the algorithm used to find sentence boundaries.  Values are:
       none
	The sentence-spacing value is ignored, and sentences are
	not detected.  This would be the default setting, unless a
	sentence tag is added to HTML (see below).
       twospace
	This method relies on two spaces after terminal punctuation
	(period, question mark, exclamation point, etc.) to find
	sentence boundaries.  This setting would be ignored where
	white-space values of pre or pre-wrap are in effect.
       auto
	Use a language-dependent sentence boundary algorithm to
	detect sentences.
       named
	Takes a class or list of classes as its argument.  Spans
	with any of the listed classes are treated as sentences,
	and the space between any of these elements is governed
	by sentence-spacing.
       tag
	Use the sentence tag in HTML, assuming it exists.  If it
	does exist, this should be the default value for
	sentence-boundary.
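As a rough model of how the two properties might combine, here is a Python sketch.  The field names, expressing the space width in em, and treating the gap as one collapsed space plus the EXTRA sentence-spacing amount are illustrative choices, not part of the proposal text.  The sketch also does not model the spaces that pre or pre-wrap would preserve.

```python
from dataclasses import dataclass

# Values of the proposed sentence-boundary property.
BOUNDARY_VALUES = {"none", "twospace", "auto", "named", "tag"}

@dataclass
class TextStyle:
    font_size_px: float               # computed font-size
    space_width_em: float             # width of the font's space glyph (font-dependent)
    sentence_spacing_em: float = 0.0  # proposed sentence-spacing in em; may be negative
    sentence_boundary: str = "none"   # proposed sentence-boundary value
    white_space: str = "normal"       # existing CSS white-space value

def extra_sentence_space_px(style: TextStyle) -> float:
    """EXTRA space added between sentences under the proposed rules."""
    if style.sentence_boundary not in BOUNDARY_VALUES:
        raise ValueError(f"unknown sentence-boundary: {style.sentence_boundary}")
    if style.sentence_boundary == "none":
        return 0.0  # sentence-spacing is ignored
    if style.sentence_boundary == "twospace" and style.white_space in ("pre", "pre-wrap"):
        return 0.0  # twospace is ignored where white space is preserved
    return style.sentence_spacing_em * style.font_size_px

def sentence_gap_px(style: TextStyle) -> float:
    """One space's width plus the extra sentence spacing (assumes collapsing)."""
    return style.space_width_em * style.font_size_px + extra_sentence_space_px(style)

print(sentence_gap_px(TextStyle(16, 0.25, 0.5, "twospace")))  # 12.0
print(sentence_gap_px(TextStyle(16, 0.25, 0.5, "none")))      # 4.0
```

For example, with a 16px font whose space glyph is 0.25em wide, sentence-spacing: 0.5em widens the gap between sentences from 4px to 12px.  A negative value such as -0.125em narrows it to 2px.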

This is a soup-to-nuts approach.  The only option I left out is
wrapping the space between sentences in a span, but as far as I can
see that would not provide a useful alternative for anyone.

The "twospace" and "auto" sentence-boundary settings are the most
immediately useful in the real world.  A significant percentage [3]
of content developers already type two spaces between sentences,
even though the extra space is not visible on the web (HTML
collapses it).  This habit provides easy, unambiguous sentence
detection, and the authors most likely to find it useful are
already doing it.  The "auto" setting offers the same feature to
those who want wider sentence spacing for existing content, or
content submitted from elsewhere, without the trouble of editing
it, and who can live with the occasional boundary-detection error.
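To make the trade-off concrete, here is a small Python comparison.  The "auto" splitter is a deliberately naive stand-in that breaks after ., ? or ! followed by whitespace and a capital letter.  It is not a real language-dependent algorithm such as the sentence-boundary rules in Unicode UAX #29.  It only shows the kind of error that abbreviations cause and that the two-space rule avoids.

```python
import re

# Naive stand-in for "auto": ., ? or ! followed by whitespace and a capital.
NAIVE_AUTO = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")

# "twospace": ., ? or ! followed by two or more spaces.
TWOSPACE = re.compile(r"(?<=[.?!]) {2,}")

def split_with(pattern: re.Pattern, text: str) -> list[str]:
    """Split text into sentences at every match of pattern."""
    return [s for s in pattern.split(text) if s]

text = "Dr. Smith met Mr. Jones.  They talked."
naive = split_with(NAIVE_AUTO, text)
twospace = split_with(TWOSPACE, text)
print(naive)     # ['Dr.', 'Smith met Mr.', 'Jones.', 'They talked.']
print(twospace)  # ['Dr. Smith met Mr. Jones.', 'They talked.']
```

Real "auto" implementations would use abbreviation lists and other language-specific rules, so they would err far less often than this.  They still cannot match the certainty of an author's explicit two spaces.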

The "named" and "tag" alternatives are there to provide even more
explicit, tag-based control of sentences, including semantic sentence
markup.  In a way it's a more traditional approach, ignoring
whitespace as part of the content creation process, but in regular
use these would likely need help from an HTML editor to make sentence
detection fast and correct.  And honestly, the "named" value is only
there because I expect an uphill battle to get a proper sentence tag
adopted.
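For the "named" value, a user agent would first have to find the spans that carry the listed classes.  Here is a minimal sketch using Python's standard-library html.parser.  The class name SentenceSpanCollector and the example class "s" are hypothetical, and only span elements are considered, as in the description of "named".

```python
from html.parser import HTMLParser

class SentenceSpanCollector(HTMLParser):
    """Collect the text of <span> elements whose class list includes
    any of the classes given to the proposed `sentence-boundary: named`."""

    def __init__(self, sentence_classes):
        super().__init__()
        self.sentence_classes = set(sentence_classes)
        self.sentences = []
        self._depth = 0  # >0 while inside a sentence span (counts nested spans)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside a sentence span
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.sentence_classes.intersection(classes):
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.sentences.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

html = ('<p><span class="s">First one.</span> '
        '<span class="s note">Second <b>bold</b> one.</span> '
        '<span>Not a sentence.</span></p>')
collector = SentenceSpanCollector(["s"])
collector.feed(html)
collector.close()
print(collector.sentences)  # ['First one.', 'Second bold one.']
```

The space between consecutive collected spans is what sentence-spacing would then widen.  Text outside those spans is left alone.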

      tom

----------
[1] http://lists.w3.org/Archives/Public/www-style/2012Dec/0304.html
[2] http://widespacer.blogspot.com/2012/12/dynamic-sentence-formatting-for-blogger.html
[3] http://lists.w3.org/Archives/Public/www-style/2013Jan/0062.html
Received on Wednesday, 9 January 2013 19:08:12 GMT
