- From: Andy Quick <ac.quick@sympatico.ca>
- Date: Sat, 3 Jun 2000 16:36:12 -0400
- To: <html-tidy@w3.org>
I assume that you mean the character 0x0D (ie. '\r') when
you say "^M" because tidy processes "^M" like text.
The line end character for HTML/XML is 0x0A ('\n'). Tidy
strips out control characters (other than '\t' and '\n') from
the input stream. There is no option to treat '\r' like white
space or line-end.
I suggest that you either preprocess your HTML, or put a
hack in StreamInImpl.readChar.
Andy Quick
----- Original Message -----
From: Bernice Maslan <Bernice.Maslan@activeindexing.com>
To: <html-tidy@w3.org>
Sent: May 22, 2000 5:14 PM
Subject: JTidy new line processing
> Hello,
>
> I am running the Java version of HtmlTidy. When the Html input looks
> like the one below , Tidy replaces the ^M with nothing, resulting in two
> separate words being combined (see Tidy output below also). I do not
> know what product was used to create the offending Html. I tried
> setting Word2000 and Clean to yes, but there was no change. Is there
> anything I can configure to make Tidy substitute a space for the ^M?
>
> Thanks,
> Bernice
Received on Saturday, 3 June 2000 16:56:52 UTC