
Proposal for new tag to mark sentences.

From: Thomas A. Fine <fine@head.cfa.harvard.edu>
Date: Mon, 12 Mar 2012 10:16:36 -0400 (EDT)
To: public-html-comments@w3.org
Message-Id: <20120312141636.4708FD4D4C6@bugs.localhost>
I would like to propose a new tag to fill a glaring hole in HTML.

There is no tag for marking sentence structure in HTML.  While we
have always had the <P> tag, possibly the first and most important
tag, used to mark off paragraphs, there is no similar tag for
sentences.

One might argue that sentences are already marked with punctuation,
and are simply a part of the content, not the layout.  However this
is clearly not true.  A period is a piece of punctuation with many
uses, and it may or may not mark a sentence.  And many people are
interested in providing formatting specific to sentences.  Throughout
the 500+ years of printing, printers have mostly chosen to mark
sentences off with additional space.  While I'm not here to take
sides on whether or not there should be additional space, it's clear
that it's a layout feature missing from HTML.

One could argue that there are other options already available
that can accomplish this.  However by far the most commonly recommended
option is &nbsp; which is clearly incorrect, because it wrongly
affects line wrapping, and because mixing a space and a NBSP to
allow for line wrapping can lead to blanks at the beginning of the
line.  There are more appropriate space characters, like &emsp;,
&ensp;, and &thinsp;.  These are appropriate and correct for many
situations, but they do not offer the fully flexible layout control
one should expect in our modern CSS world.  Ideally, those who desire
it should be able to mark off sentences and finely control their
layout using CSS.

This approach is also not necessarily correct in terms of cut-and-paste
behavior: someone who feels that their inter-sentence spacing
should be large, e.g. &emsp;, would probably hope that upon
cut-and-paste this extra space would map to two (or more) space
characters.  However it's clear that &emsp; must translate to a
single character.  A dedicated sentence tag would allow web
designers to offer recommendations on mapping between sentences and
number of spaces, and allow users to override this setting to their
own taste.
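
To make the cut-and-paste mapping concrete, here is a minimal
JavaScript sketch of what such a conversion might look like.  It is
only an illustration: the function name sentencesToPlainText, the
class name "sentence", and the use of a SPAN as a stand-in for a
sentence tag are all assumptions of mine, and the regular expression
only handles sentences with no nested markup.

```javascript
// Hypothetical helper: turn sentence-tagged markup into plain text,
// joining sentences with a configurable number of ordinary spaces.
// Assumes each sentence is marked as <span class="sentence">...</span>
// with no nested tags; a real implementation would walk the DOM instead.
function sentencesToPlainText(html, spacesBetween = 2) {
  const pattern = /<span class="sentence">([\s\S]*?)<\/span>/g;
  const sentences = [];
  for (const match of html.matchAll(pattern)) {
    // Collapse internal runs of whitespace the way HTML rendering would.
    sentences.push(match[1].replace(/\s+/g, " ").trim());
  }
  return sentences.join(" ".repeat(spacesBetween));
}

const marked =
  '<span class="sentence">It was late.</span>\n' +
  '<span class="sentence">Dr. Smith went home.</span>';

// The page author's recommendation (two spaces) ...
console.log(JSON.stringify(sentencesToPlainText(marked, 2)));
// ... and a reader overriding it with a single space.
console.log(JSON.stringify(sentencesToPlainText(marked, 1)));
```

The point of the sketch is that the number of spaces becomes a
setting rather than something baked into the content.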

The approach of using space entities also provides no mechanism for
dynamic control.  Only CSS could allow web page viewers to adjust
the inter-sentence spacing to their own taste.

Proper CSS control can be accomplished to a fair degree with the
<SPAN> tag; however, this is still suboptimal for a couple of reasons.
First, it isn't clear that this is correct behavior.  If you adjust
the padding-right for all of your sentence spans, it isn't clear
if a browser's wrap margin will be shifted because of this when a
sentence ends near the wrap margin.  Second, without a standard,
there is no "hook" for software developers to build sentence handling
into their user interface.  That is, if there were a sentence tag,
then HTML generators could offer help to the user in detecting and
tagging sentences, whereas without a standard this is unlikely to
happen.  Use of the SPAN also does not properly address the
cut-and-paste issue I discussed above.
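
For reference, the SPAN workaround described above can be sketched
roughly as follows.  This is a hedged illustration, not a
recommendation: the class name "sentence", the helper names, and the
0.5em value are arbitrary choices of mine.  The sketch also takes the
sentences as already identified, since reliably detecting them is
exactly the part that has no standard support.

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap each already-identified sentence in a SPAN.  The class name
// "sentence" is an assumption for this sketch, not part of any standard.
function wrapSentences(sentences) {
  return sentences
    .map((s) => `<span class="sentence">${escapeHtml(s.trim())}</span>`)
    .join(" ");
}

// Example style rule; 0.5em is an arbitrary illustrative value.
const sentenceStyle = "span.sentence { padding-right: 0.5em; }";

// Periods inside "Dr." and "p.m." do not end a sentence, which is why
// the sentence boundaries are supplied by hand here.
const paragraph = wrapSentences([
  "Dr. Smith arrived at 3 p.m. sharp.",
  "Was anyone there?",
]);

console.log(`<style>${sentenceStyle}</style>`);
console.log(`<p>${paragraph}</p>`);
```

Note that the padding sits inside the inline box, so how a browser
treats it at the end of a wrapped line is the open question raised
above.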

I have a web page formatted using the SPAN tag, along with javascript
used to allow the end-user to adjust sentence spacing:
http://hea-www.harvard.edu/~fine/Tech/html-sentences.html
(I discuss many of these issues on that web page.)
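
The script on that page is not reproduced here, but the general idea
of reader-adjustable spacing can be sketched in a few lines.  The
function name sentenceSpacingRule, the "span.sentence" selector, and
the 0 to 2 em range are assumptions made for this example only.

```javascript
// Build a CSS rule for a reader-chosen inter-sentence spacing, in em
// units.  The selector and the clamping range are assumptions.
function sentenceSpacingRule(extraEm) {
  const value = Number(extraEm);
  if (!Number.isFinite(value)) {
    throw new TypeError("spacing must be a finite number");
  }
  // Keep the value in a sane range so a stray input cannot wreck layout.
  const clamped = Math.min(Math.max(value, 0), 2);
  return `span.sentence { padding-right: ${clamped}em; }`;
}

console.log(sentenceSpacingRule(0.5));
console.log(sentenceSpacingRule(5));

// In a browser, the rule could be applied by assigning it to the
// textContent of a <style> element whenever the reader changes the
// setting, for example:
//   styleElement.textContent = sentenceSpacingRule(slider.value);
```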

Perhaps the biggest reason for adding this tag is a political one.
There is an ongoing debate about whether or not the spacing between
sentences should be different than spacing between words.  It's not
my intention here to take sides.  More importantly, HTML should not
take sides, but the lack of a tag for marking sentence structure
does just that.  Many people naively point at HTML's space collapsing
behavior as some kind of proof that it is wrong to add extra space
between sentences.  But this should be a decision of the web
designers; HTML itself should be agnostic on the issue.  The only
way to do this is to offer a functional mechanism for those who
want to use it.

The record shows that in the early nineties when HTML designers
looked at the issue of space between sentences, they decided to just
use word-spacing, not because it was "correct", but because attempting
to detect sentences was just too much trouble.  At the time, the
goals of HTML were to be small and simple; just a structure for
accessing other document types.  However the purpose of HTML has
changed radically since then.

In summary these are the arguments in favor of a sentence tag.
  * HTML should not take sides on this layout issue.
  * &nbsp; is clearly the wrong solution.
  * &ensp; and other spaces are usable in some cases, but incomplete.
  * Manipulating SPAN does not offer a clearly correct solution.
  * SPAN does not give software developers a standard that could be used
    in user interface design.
  * There is no solution that addresses the cut-and-paste issue.

Thank you,

tom

Thomas A. Fine
Received on Tuesday, 13 March 2012 12:12:02 GMT
