- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Wed, 2 Jan 2002 19:16:35 -0500
- To: "'Richard A. O'Keefe'" <ok@atlas.otago.ac.nz>, html-tidy@w3.org
Interesting points. In particular, your previous point about the rendering of HTML _source_ vs. the document itself is well taken.

A couple of questions:

1) Can you give us a link to current TeX sources? I'll bet these will be generally useful.

2) Can you give us a reference to the Unicode sentence break algorithm? I searched at www.unicode.org, but didn't see it. I did find line break algorithms, but that is something else.

3) Can you give some guidance on where, within the TeX sources, you would find the sentence end detection code (and, by implication, how you arrive at your size estimate for sentence end support)?

In the end, I think it boils down to priorities. I get the impression that decent HTML handling is more important than source niceties. For example, I would guess that decent Asian language support is more important than handling two spaces after sentences.

Patches are always welcome, however.

take it easy,
Charlie

-----Original Message-----
From: Richard A. O'Keefe [mailto:ok@atlas.otago.ac.nz]
Sent: Monday, December 17, 2001 9:12 PM
To: Todd_Lewis@unc.edu; html-tidy@w3.org; lee@novonyx.com
Subject: Re: don't collapse two spaces at the end of a sentence

I understood the original problem to be that when Tidy rewraps raw blocks of text, it doesn't do the two-space two step.

The problem is not that it doesn't _add_ double-spacing, but that it doesn't _preserve_ double-spacing that is already there.

All the issues you brought up about how to determine the end of sentences (in various languages no less) have been worked out for years in TeX, and the code is free for the taking.

Since HTML 4 and XHTML are based on Unicode, it may be relevant to note that the Unicode standard includes a method for determining sentence boundaries. It's not claimed to be perfect, but it works pretty well for a wide range of languages and scripts.

If it were important enough to some coder to preserve his two spaces (or "correct" it in HTML from other authors / sources), then he could take the appropriate part of TeX's code and incorporate it into Tidy, therefore doubling its size (or thereabouts -- I'm guessing).

A very wild guess indeed. A better guess would be 0.5%.
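
To make the "preserve, don't add" distinction concrete, here is a minimal Python sketch. It is not Tidy's code, and it is neither TeX's nor the Unicode standard's sentence-boundary method. The function names are hypothetical, and the rule it uses (sentence-ending punctuation, optionally followed by closing quotes or brackets, followed directly by two spaces in the source) is a simplifying assumption that will misfire on abbreviations and on many scripts.

```python
import re

_WS_RUN = re.compile(r"\s+")
_SENTENCE_END = ".!?"   # assumption: punctuation that may end a sentence
_CLOSERS = "\"')]"      # assumption: closers allowed after that punctuation


def _follows_sentence_end(text, i):
    """Heuristic: does the text just before index i end a sentence?"""
    # Step back over any closing quotes or brackets first.
    while i > 0 and text[i - 1] in _CLOSERS:
        i -= 1
    return i > 0 and text[i - 1] in _SENTENCE_END


def collapse_preserving_double_space(text):
    """Collapse whitespace runs to one space, but keep a double space
    that already follows a sentence end. Never adds a double space
    where the source had only one."""
    out = []
    pos = 0
    for m in _WS_RUN.finditer(text):
        out.append(text[pos:m.start()])
        keep_two = (m.group().startswith("  ")
                    and _follows_sentence_end(text, m.start()))
        out.append("  " if keep_two else " ")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out).strip()
```

With this sketch, "One.  Two. Three" comes back unchanged: the existing double space after "One." is kept and the single space after "Two." is not widened. A run of spaces in the middle of a sentence is still collapsed to one.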
Received on Wednesday, 2 January 2002 19:16:42 UTC