
Re: The missing Sentence tag

From: Thomas A. Fine <fine@head.cfa.harvard.edu>
Date: Thu, 06 Dec 2012 18:38:24 -0500
Message-ID: <50C12C70.5050701@head.cfa.harvard.edu>
To: Lee Kowalkowski <lee.kowalkowski@googlemail.com>
CC: public-html <public-html@w3.org>
On 12/6/12 5:16 PM, Lee Kowalkowski wrote:
> On 6 December 2012 15:46, Thomas A. Fine <fine@head.cfa.harvard.edu> wrote:
>
>
>     But since I have an interest in both formatting and semantics, a tag
>     is the best choice.
>
>
> What would the semantic advantage be?  When I teach HTML, I like to be
> pragmatic.  So I'm talking to budding web developers, what do I tell
> them about this element?  Particularly, when and why should they use it.

Well, obviously there is the ability to style sentences as separate
elements in CSS.  And with a semantic tag, machine translators and reading
software could detect sentence boundaries more easily and accurately, and
use that as an aid.  Software that converts HTML into other formats where
sentences receive particular handling would also benefit.

And formatting and semantics should be combined in one tag because, at
content creation time, marking a sentence is the same problem either way
and shouldn't have two solutions.  And if you only care about one of the
two things (semantics or formatting), you still get the other one for free.
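
As a rough illustration of the formatting half, here is a minimal Python
sketch that wraps sentences the author has already separated in a span and
emits one CSS rule to widen the gap between them.  The class name
"sentence", the 0.67em margin, and the mark_up helper are all assumptions
made for this example; a real sentence tag would take the place of the
span, and nothing here is meant as proposed syntax.

```python
from html import escape

# Hypothetical stand-in for a sentence element: a span with a class.
SENTENCE_CLASS = "sentence"

# Extra gap before every sentence that directly follows another one.
# The 0.67em figure is an assumption, chosen to bring a normal word
# space up to roughly one em.
STYLE = f".{SENTENCE_CLASS} + .{SENTENCE_CLASS} {{ margin-left: 0.67em; }}"


def mark_up(sentences):
    """Wrap each already-separated sentence in its own span."""
    spans = [
        f'<span class="{SENTENCE_CLASS}">{escape(s)}</span>' for s in sentences
    ]
    # A single space between spans keeps the text readable without CSS.
    return "<p>" + " ".join(spans) + "</p>"


if __name__ == "__main__":
    print(f"<style>{STYLE}</style>")
    print(mark_up(["Dr. Smith arrived.", "He was late."]))
```

Because the sentences are marked at content creation time, nothing has to
guess where they end, so the "Dr." in the example is not mistaken for a
sentence boundary.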

>     One of the myths mentioned in the article you link to says that
>     because HTML can't do wide sentence spacing this is proof that wide
>     sentence spacing should not be used.
>
> Seems to be a recursive argument I agree, not very convincing. Although
> the proportional spacing one is. Isn't it?

Since you asked... no. It's wrong on two levels.  First, movable type
printing has always used proportional fonts, and from the mid-1600s
through the 1800s wide sentence spacing was standard (and in some ways,
before that too).  This was even true with handwriting.  So when
typewriters (with monospaced fonts) began to be sold in the 1870s, the
"two space" rule just sort of automatically followed.  There's no hint
in the historical record of this nonsense about monospaced fonts needing
more space; people just did it because that's what they'd always done in
handwriting and in typesetting.  And for 60-80 more years after the
invention of the typewriter and the introduction of monospaced fonts,
wider sentence spacing remained the standard in books and newspapers.
It's pretty clear that the typewriter had nothing to do with the
transition to using a word space between sentences.

Second, it doesn't even make logical sense.  In a monospaced font, a 
space is as big as a letter, while in a proportional font, the standard 
space is about half the size of your average letter.  If anything, this 
is an argument that visually, proportional fonts need to use more space 
characters than monospaced fonts to achieve the same look.  And in fact
this was the standard.  When wide sentence spacing was common practice,
a word space was typically 3-per-em or even 4-per-em (a third or a
quarter of an em), and a sentence space was most often one em, making it
3 to 4 times larger than a word space, not merely twice as large as with
the typewriter convention.  The
issue is even more ridiculous when you consider the size of the period 
in monospaced versus proportional fonts.
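
The ratios above are easy to check.  A small Python sketch, assuming the
word space is exactly 1/3 or 1/4 em and the sentence space exactly one em
(real typesetting varied, and the function name is just for illustration):

```python
from fractions import Fraction

EM = Fraction(1)  # one em, the unit everything else is measured in


def sentence_to_word_ratio(word_space, sentence_space=EM):
    """How many times wider the sentence space is than the word space."""
    return sentence_space / word_space


three_per_em = EM / 3  # three word spaces fit in one em
four_per_em = EM / 4   # four word spaces fit in one em

print(sentence_to_word_ratio(three_per_em))  # 3
print(sentence_to_word_ratio(four_per_em))   # 4

# Typewriter convention: two space characters against one, each the
# width of one character cell.
print(sentence_to_word_ratio(Fraction(1), Fraction(2)))  # 2
```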

>     But of course the reason HTML can't conveniently do wide sentence
>     spacing is more of a historical accident and a matter of laziness,
>     and is the very reason I'm here right now.
>
>
> It could just be a side effect of ignoring extraneous whitespace, true.
>   But the proportional font has been around for longer than HTML, unless
> I'm mistaken.

At one point, one of the early web browsers used a Motif library option
to autodetect sentences and render them with extra space.  After some
discussion it was decided that this didn't work in all cases and should
not be used.  It was also decided that HTML should ignore the sentence
spacing issue, as it wasn't worth the effort for HTML, which at that time
was intended to be a simple document framework for accessing other
document types.  It's there somewhere in the www-talk archives.

     tom
Received on Thursday, 6 December 2012 23:38:55 GMT
