- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Wed, 2 Jan 2002 19:16:35 -0500
- To: "'Richard A. O'Keefe'" <ok@atlas.otago.ac.nz>, html-tidy@w3.org
Interesting points. In particular, your previous point about the rendering
of HTML _source_ vs. the document itself is well taken.
A couple of questions:
1) Can you give us a link to current TeX sources? I'll bet these will be
generally useful.
2) Can you give us a reference to the Unicode sentence break algorithm? I
searched at www.unicode.org, but didn't see it. I did find line break
algorithms, but that is something else.
3) Can you give some guidance on where, within the TeX sources, you would
find the sentence end detection code (and, by implication, how you arrive at
your size estimate for sentence end support)?
In the end, I think it boils down to priorities. I get the impression that
decent HTML handling is more important than source niceties. For example, I
would guess that decent Asian language support is more important than
handling two spaces after sentences. Patches are always welcome, however.
take it easy,
Charlie
-----Original Message-----
From: Richard A. O'Keefe [mailto:ok@atlas.otago.ac.nz]
Sent: Monday, December 17, 2001 9:12 PM
To: Todd_Lewis@unc.edu; html-tidy@w3.org; lee@novonyx.com
Subject: Re: don't collapse two spaces at the end of a sentence
I understood the original problem to be that when Tidy
rewraps raw blocks of text, it doesn't do the two-space
two step.
The problem is not that it doesn't _add_ double-spacing,
but that it doesn't _preserve_ double-spacing that is already there.
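
To illustrate the add/preserve distinction, here is a minimal sketch in Python (not Tidy's actual code, which is C). The function name and the greedy wrapping strategy are assumptions made for illustration only. It rewraps a block of text and keeps a double space between two words only where the source already had one, so it needs no sentence-end detection at all. A sentence that ends exactly at a source line break gets a single space, since there is nothing to preserve there.

```python
import re

# A token is one word plus the whitespace run that followed it in the source.
_TOKEN = re.compile(r"(\S+)(\s*)")


def rewrap_preserving_double_space(text, width=68):
    """Greedily rewrap text, keeping "  " only where the source had it.

    Hypothetical helper for illustration; it never adds double spacing,
    it only refrains from collapsing double spacing that is already there.
    """
    lines = []
    current = ""       # the output line being built
    pending_sep = ""   # separator to emit before the next word
    for match in _TOKEN.finditer(text):
        word, gap = match.group(1), match.group(2)
        if current and len(current) + len(pending_sep) + len(word) > width:
            # The word does not fit: close the line and drop the separator.
            lines.append(current)
            current = word
        elif current:
            current = current + pending_sep + word
        else:
            current = word
        # Two or more literal spaces in the source are kept as exactly two;
        # any other whitespace (single space, newline, tab) becomes one space.
        pending_sep = "  " if gap.startswith("  ") else " "
    if current:
        lines.append(current)
    return "\n".join(lines)
```

With a wide enough margin, "First sentence.  Second one is here. Third." comes back unchanged, double space included. With a narrow margin the double space simply disappears wherever it coincides with a line break.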
All the issues you brought up about how to determine the
end of sentences (in various languages no less) have been
worked out for years in TeX, and the code is free for the
taking.
Since HTML 4 and XHTML are based on Unicode, it may be relevant to note that
the Unicode standard includes a method for determining sentence boundaries.
It's not claimed to be perfect, but it works pretty well for a wide range of
languages and scripts.
If it were important enough to some coder to preserve his
two spaces (or "correct" it in HTML from other authors /
sources), then he could take the appropriate part of TeX's
code and incorporate it into Tidy, therefore doubling its
size (or thereabouts -- I'm guessing).
A very wild guess indeed. A better guess would be 0.5%.
Received on Wednesday, 2 January 2002 19:16:42 UTC