- From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
- Date: Tue, 18 Dec 2001 15:12:03 +1300 (NZDT)
- To: Todd_Lewis@unc.edu, html-tidy@w3.org, lee@novonyx.com
I understood the original problem to be that when Tidy rewraps raw blocks of text, it doesn't do the two-space two step.

The problem is not that it doesn't _add_ double-spacing, but that it doesn't _preserve_ double-spacing that is already there.

All the issues you brought up about how to determine the end of sentences (in various languages no less) have been worked out for years in TeX, and the code is free for the taking.

Since HTML 4 and XHTML are based on Unicode, it may be relevant to note that the Unicode standard includes a method for determining sentence boundaries. It's not claimed to be perfect, but it works pretty well for a wide range of languages and scripts.

If it were important enough to some coder to preserve his two spaces (or "correct" it in HTML from other authors/sources), then he could take the appropriate part of TeX's code and incorporate it into Tidy, therefore doubling its size (or thereabouts -- I'm guessing).

A very wild guess indeed. A better guess would be 0.5%.
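
The distinction between adding and preserving double spacing can be illustrated with a minimal sketch. This is not Tidy's code, and the function name `rewrap_preserving` and its `width` parameter are hypothetical. It assumes a simple greedy rewrap and does no sentence detection at all: it only remembers where the input already had two or more spaces between words on the same line.

```python
import re


def rewrap_preserving(text, width=68):
    """Greedy rewrap that keeps a double space already present in the input.

    Hypothetical helper, not part of Tidy. It never adds double spacing;
    it only carries over gaps of two or more plain spaces.
    """
    # Each token is a word plus whatever whitespace followed it in the source.
    tokens = re.findall(r"(\S+)(\s*)", text)
    lines = []
    current = ""
    pending_gap = " "
    for word, gap in tokens:
        if not current:
            current = word
        elif len(current) + len(pending_gap) + len(word) <= width:
            current += pending_gap + word
        else:
            # The gap is dropped at a line break, so no trailing spaces remain.
            lines.append(current)
            current = word
        # Two or more plain spaces become exactly two; any other gap
        # (single space, newline, tab) becomes a single space.
        pending_gap = "  " if re.fullmatch(r" {2,}", gap) else " "
    if current:
        lines.append(current)
    return "\n".join(lines)
```

A rewrapper along these lines needs no knowledge of sentence boundaries, which is why preserving is a much smaller job than adding. Deciding where double spacing should be inserted is the part that would need TeX-style or Unicode-style sentence rules.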
Received on Monday, 17 December 2001 21:12:17 UTC