W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2000

Re: JTidy new line processing

From: Rzepa, Henry <h.rzepa@ic.ac.uk>
Date: Wed, 14 Jun 2000 18:25:26 +0100
Message-ID: <3947C006.82B4A51F@ic.ac.uk>
To: html-tidy@w3.org, g.gkoutos@ic.ac.uk


> 
> >>I am running the Java version of HtmlTidy.  When the Html input looks
> >>like the one below , Tidy replaces the ^M with nothing, resulting in two
> >>separate words being combined (see Tidy output below also).  I do not
> >>know what product was used to create the offending Html. 

We appear to have a simular, but I suspect bug of different origin

When JTidy is called through  its main method i.e

java org.w3c.tidy.Tidy -asxml  file.html

the output is fine.

when it is called through its parser method i.e

Tidy tidy = new Tidy();
          tidy.setMakeClean(true);
//	  tidy.setXmlTags(true);
          tidy.setXHTML(true);
tidy.parse(in, out);


the output has deleted spaces,

i.e 

density was calculatedboth for anion A and B and the most


instead of 

density was calculated both for anion A and B and the most

This space is NOT at the end of line markers, its in the middle.
In a file of say about  10,000 characters, it appears perhaps  50-100
spaces will be deleted. We think its only spaces are lost, and not
other characters.

If  tidy.setXmlTags(true); is uncommented 
then it produces extra line throws
but does not delete gaps.

I might add that other versions of Tidy process the same file with no 
problem.

since calculatedboth  has a different meaning from calculated both,
we consider this a serious problem.

Any comments on the origin of the problem?
Received on Wednesday, 14 June 2000 13:25:30 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT