Re: A proposed standard for CSS-controlled sentence spacing

On Wed, Jan 9, 2013 at 11:07 AM, Thomas A. Fine
<fine@head.cfa.harvard.edu> wrote:
> A few weeks ago I posted about the need for sentence spacing in
> CSS. [1] I've thought about the options, and have tried various
> approaches out.  I ended up adding some javascript on my Blogger
> blog to post-process the output (I originally wanted to modify the
> editor, but Blogger provides no hooks for editor plugins). [2]
>
> The javascript relies on finding two spaces between sentences for
> sentence detection.  This eliminates all sorts of ambiguous situations,
> and it really clarified in my mind what the best approach would be
> to provide CSS-controlled sentence spacing.  The javascript solution
> worked out to be simpler to implement than an editor for marking
> sentences (which would use two spaces as a signal anyway).  But
> more importantly, having that javascript makes content creation
> trivial: I just do what I would do normally, typing two spaces
> between sentences, and voila, I get CSS-controlled sentence spacing.
>
> So I'm proposing this as (part of) a solution for CSS to implement
> sentence spacing.  This scoots around the objection that no one
> would ever bother with all the work of creating sentence markup.
> And while using spaces in this way would have been unthinkable a
> few years ago, HTML has moved on and is not SGML compliant (not that
> it ever really was).  There's no reason why CSS can't directly use
> the two spaces following terminal punctuation as a way to reliably
> detect sentences in a way which is fully controlled by the user,
> and yet trivial at content creation time.
>
> So here's what I think a complete, full-featured implementation would look
> like.  Two new parameters are desired:
>   sentence-spacing:
>     Specifies the amount of EXTRA space to add to the white space
>     between sentences.  All the standard units should be available,
>     with the em preferred.  (I recall a discussion about having a
>     unit that is relative to a font's space character.  I feel
>     that this would also be useful.)  This value may be negative.
>     Valid in any element.
>
>   sentence-boundary:
>     Picks an algorithm to use to select sentences.  Values are:
>       none
>         The sentence-spacing value is ignored, and sentences are
>         not checked.  This would be the default setting, unless a
>         sentence tag is added to HTML (see below).
>       twospace
>         This method relies on two spaces after terminal punctuation
>         (period, question mark, exclamation point, etc.) to find
>         sentence boundaries.  This setting would be ignored where
>         white-space values of pre or pre-wrap are in effect.
>       auto
>         Use a language-dependent sentence boundary algorithm to
>         detect sentences.
>       named
>         Takes a class or list of classes as its value.  When spans
>         are found with any of the listed classes, the elements are
>         sentences, and space between any of these elements is
>         governed by sentence-spacing.
>       tag
>         Use the sentence tag in HTML, assuming it exists.  If it
>         does exist, this should be the default value for
>         sentence-boundary.
>
> This is a soup-to-nuts approach.  The only choice I left out is
> spanning the space between sentences; as far as I can see, that
> alternative would be of no use to anyone.
>
> The "twospace" and "auto" sentence-boundary settings are the most
> immediately useful in the real world.  A significant percentage [2]
> of content developers now use a two-space habit in their content
> development, even though it is not visible on the web.  It provides
> easy, unambiguous sentence detection, and the authors most likely
> to find it useful are already doing this.  The "auto" setting
> provides the same feature to those who might desire wider sentence
> spacing for older existing content, or content submitted from
> elsewhere, but without the trouble of editing it, and who can live
> with the occasional boundary detection error.
>
> The "named" and "tag" alternatives are there to provide even more
> explicit tag-based control of sentences, including semantic sentence
> markup.  It's in a way a more traditional approach, ignoring
> whitespace as part of the content creation process, but in regular
> use these would likely need help from an HTML editor to make sentence
> detection fast and correct.  And honestly the "named" value is only
> there as I see an uphill battle to getting a proper sentence tag
> put in place.
>
>      tom
>
> ----------
> [1] http://lists.w3.org/Archives/Public/www-style/2012Dec/0304.html
> [2] http://widespacer.blogspot.com/2012/12/dynamic-sentence-formatting-for-blogger.html
> [3] http://lists.w3.org/Archives/Public/www-style/2013Jan/0062.html
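
For readers following the quoted proposal, the "twospace" rule it describes
(terminal punctuation followed by two spaces marks a sentence boundary) can
be sketched roughly as below.  This is only an illustration in Python, not
the JavaScript from the Blogger post; the function names, the "sentence"
class name, and the exact set of terminal punctuation are all assumptions.

```python
import html
import re

# Assumed boundary rule: one of . ? ! followed by two or more spaces.
# Closing quotes or brackets after the punctuation are not handled here.
TWO_SPACE_BOUNDARY = re.compile(r"(?<=[.?!]) {2,}")


def split_sentences(text):
    """Split text wherever terminal punctuation is followed by 2+ spaces."""
    return TWO_SPACE_BOUNDARY.split(text)


def wrap_sentences(text, cls="sentence"):
    """Wrap each detected sentence in a span (hypothetical class name)."""
    spans = (
        f'<span class="{cls}">{html.escape(s)}</span>'
        for s in split_sentences(text)
    )
    # A single space is left between spans; a stylesheet could then add
    # extra spacing between adjacent sentence spans.
    return " ".join(spans)


sentences = split_sentences("Dr. Smith arrived.  He was late!  Why?")
wrapped = wrap_sentences("A.  B.")
```

Because "Dr. Smith" contains only a single space after the period, it is not
treated as a boundary, which is the unambiguity the proposal relies on.  The
sketch does not model the proposed exception for white-space values of pre
or pre-wrap.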


I'm with Hixie for now, in the corresponding thread you've raised in
WHATWG about adding a <sentence> tag to HTML.  This doesn't seem to be
particularly useful: existing markup can handle it, editors can very
easily handle it, and there doesn't seem to be convincing evidence
that sentence spacing is actually much of a contributor to
readability.  While I happen to use two spaces after sentences, it's
mostly a finger tic from my days being taught keyboarding.
Two-spaces-after-a-sentence doesn't appear to be a reliable rule in
modern English typing, and I don't think it's much of one outside of
English either.

I'm also in general very wary of claiming that a simple heuristic can
reliably, across world languages, determine what a "sentence" is.  We
try to do as little as possible in heuristically determining
boundaries in CSS, because of the complexity of world languages; the
few times we have tried (like ::first-letter) still don't work
reliably across browsers.
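
As a rough illustration of the concern (a sketch only, not any proposed or
existing algorithm), consider a naive heuristic that breaks after terminal
punctuation followed by whitespace.  The function name and sample strings
below are made up for the example.

```python
import re


def naive_split(text):
    """Hypothetical heuristic: break after . ? ! followed by whitespace."""
    return re.split(r"(?<=[.?!])\s+", text)


# English abbreviations produce false boundaries: two sentences, four pieces.
english = naive_split("Dr. Smith paid approx. $5 at 3 p.m. Then he left.")

# Japanese uses a different full stop and no inter-sentence space, so the
# same rule finds no boundary at all: two sentences, one piece.
japanese = naive_split("これは文です。これも文です。")
```

Handling cases like these correctly takes language-specific rules, which is
where the complexity comes from.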

~TJ

Received on Thursday, 10 January 2013 21:30:46 UTC