
Re: don't collapse two spaces at the end of a sentence

From: Lee Passey <lee@novonyx.com>
Date: Mon, 17 Dec 2001 12:12:37 -0700
Message-ID: <3C1E43A5.355DBCE7@novonyx.com>
To: "Todd M. Lewis" <utoddl@email.unc.edu>
CC: html-tidy@w3.org
My undergraduate degree is in a foreign language, so I find this type of
discussion extremely fascinating, although it is really off-topic.

Earlier, Gerhard Scholz suggested that two spaces could be placed at the
end of sentences by using "dot&nbsp;space".  This will work, and there
should be nothing in tidy that will prevent or alter this usage.  (If
there is, please send me an example, and I will work on fixing it.)  But
should tidy have an option to fix this up where it is not present?
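
To see why "dot&nbsp;space" survives rendering, here is a minimal sketch
(Python, not Tidy code) of the whitespace collapsing a browser applies to
ordinary text.  It assumes only the simplified rule that runs of ASCII
whitespace become a single space, while U+00A0 (the character &nbsp;
decodes to) is left alone; the function name collapse_whitespace is
hypothetical.

```python
import html
import re

# Runs of ASCII whitespace (space, tab, LF, FF, CR) collapse to one space.
# U+00A0, the character that &nbsp; decodes to, is deliberately NOT in this class.
_ASCII_WS = re.compile(r"[ \t\n\f\r]+")

def collapse_whitespace(source: str) -> str:
    """Decode entities, then collapse ASCII whitespace, roughly as
    rendered HTML text does under the default white-space handling."""
    text = html.unescape(source)
    return _ASCII_WS.sub(" ", text)

plain = collapse_whitespace("End of sentence.  Next one.")
padded = collapse_whitespace("End of sentence.&nbsp; Next one.")
print(repr(plain))   # 'End of sentence. Next one.'
print(repr(padded))  # 'End of sentence.\xa0 Next one.'
```

The two typed spaces collapse to one, but the non-breaking space plus the
ordinary space still shows as two, which is why the trick needs no help
from tidy.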

More to the point, can tidy fix this up where it is not present?  

The problem is, what is a sentence?  From a lexical standpoint, my
first reaction would be to define the end of a sentence as a
non-whitespace character, followed by a period, an exclamation mark, or a question
mark, followed by whitespace.  But what about abbreviations like Mr.,
Ms. or Dr., which should be followed by a single space?  And what about
sentences "where the punctuation is encapsulated in quotation marks?" 
And what would be the effect of phrase elements and font style elements
(e.g. <em> or <i>)?  And I seem to recall from my typing class that
there should be two spaces after colons as well.  Should we include
rules for that too?
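
The naive lexical rule above fits in a few lines, which also makes its weak
points concrete.  This is a minimal sketch only, assuming plain text input
(no markup) and a tiny hand-picked abbreviation list; the names
SENTENCE_END, ABBREVIATIONS, sentence_ends and add_nbsp_after_sentences are
hypothetical and nothing like this exists in tidy.

```python
import re

# Naive rule: non-whitespace, then ., ! or ?, then whitespace.
# The lookahead keeps the following whitespace out of the match.
SENTENCE_END = re.compile(r"(\S+)([.!?])(?=\s)")

# Hypothetical, deliberately tiny exception list; a real one would be long
# and language-dependent.
ABBREVIATIONS = {"Mr", "Ms", "Dr"}

def _is_sentence_end(match: re.Match) -> bool:
    """Reject a period that follows a known abbreviation."""
    return not (match.group(2) == "." and match.group(1) in ABBREVIATIONS)

def sentence_ends(text: str) -> list:
    """Return the words (with punctuation) the rule treats as sentence ends."""
    return [m.group(0) for m in SENTENCE_END.finditer(text) if _is_sentence_end(m)]

def add_nbsp_after_sentences(text: str) -> str:
    """Apply the 'dot&nbsp;space' convention after each detected sentence end."""
    def repl(m: re.Match) -> str:
        return m.group(0) + "&nbsp;" if _is_sentence_end(m) else m.group(0)
    return SENTENCE_END.sub(repl, text)

sample = 'Dr. Smith arrived. Was it "late?" He sat, e.g. quietly.'
print(sentence_ends(sample))  # ['arrived.', 'e.g.']
print(add_nbsp_after_sentences("Mr. Lee left. Bye now."))
```

Even this toy version shows the problems: "late?" is missed because the
quotation mark sits between the question mark and the space, "e.g." is
wrongly taken as a sentence end, and elements such as <em> are not handled
at all.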

I conclude that the problem is more appropriately discussed in forums on
natural language processing or artificial intelligence.  I think that
tidy should do nothing to prevent "two spaces or die" bigots from
creating HTML which reflects their bias, but I don't think that it can
or should "fix" text in which double spacing is not already present.

"Todd M. Lewis" wrote:
> "Richard A. O'Keefe" wrote:
> >
> > In this mailing list, we're NOT talking about how the text ends up being
> > presented.  We're talking about how the HTML source form is tidied, and
> > arguments from "modern" typography (really based on mediaeval scribes'
> > desire to cram as many words as they could onto their extremely expensive
> > writing medium) are entirely beside the point.
> Doesn't TeX do something more involved with end-of-sentence spacing?
> How much bloat would it add to Tidy to make it smarter about punctuation
> at the end of sentences, like TeX?  For the record, I used to be a "two
> spaces or die" bigot, but I got over it.  Still, it would be nice if
> Tidy allowed as much stylistic choice as possible in the internal
> layout...
> --
>    +------------------------------------------------------------+
>   / Todd_Lewis@unc.edu              http://www.unc.edu/~utoddl /
>  /(919) 962-5273               Lord, give me patience... Now! /
> +------------------------------------------------------------+
Received on Monday, 17 December 2001 14:08:03 UTC