Need help with Tidy configuration file

Hello,



I’m using Tidy to create well-formed HTML output from loosely structured HTML files. The files have many missing closing tags. Here’s my sample HTML input:



*HTML Input:*

<p class="0">A

<p class="1"><em class="bf">ACCOUNTING BASIS</em>

<p class="2">Taxation, <cite class="section">3.3.3

<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3

<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em>

<p class="2">Taxation, <cite class="section">3.3.5

<p class="1"><em class="bf">ANCILLARY SERVICES</em>

<p class="2">Reimbursement

<p class="3">Payment methodology

<p class="4">Covered ancillary services, <cite class="section">5.1.2.2

<p class="1"><em class="bf">ANESTHESIOLOGY</em>

<p class="2">Anti-kickback statute

<p class="3">Case law and other guidance, <cite class="section">2.4.6.4



I’ve defined the following parameters in my tidy.config file:



*Config File:*



add-xml-decl:true
#output-xhtml:true
doctype:omit
hide-comments:yes
preserve-entities:yes
uppercase-tags:0
# DO NOT specify input encoding here unless it never,ever changes.
output-encoding:utf8
word-2000:false
# bare: replaces nbsps with regular spaces as a side-effect
# these nbsps are needed for clues so bare should be left false.
bare:true
enclose-text:yes
numeric-entities:yes
# clean: strips surplus tags from ms word originating docs.
# clean consolidates similar styles and uses references to them.
# trades document size for ease of parsing it -- leave this false.
clean:true
hide-comments:true
# wrap: zero if you want to disable line wrapping
wrap:0
# quote-nbsp: output non-breaking space characters as entities
quote-nbsp:false
show-warnings:false
#





*My output looks like this:*



<p class="0">A</p>

<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>

<p class="2">Taxation, <cite class="section">3.3.3</cite></p>

<p class="1"><cite class="section"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">AFFILIATED SERVICES GROUPS</em></cite></p>

<p class="2"><cite class="section">Taxation, <cite class="section">3.3.5</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">ANCILLARY SERVICES</em></cite></p>

<p class="2"><cite class="section">Reimbursement</cite></p>

<p class="3"><cite class="section">Payment methodology</cite></p>

<p class="4"><cite class="section">Covered ancillary services, <cite class="section">5.1.2.2</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">ANESTHESIOLOGY</em></cite></p>

<p class="2"><cite class="section">Anti-kickback statute</cite></p>

<p class="3"><cite class="section">Case law and other guidance, <cite class="section">2.4.6.4</cite></cite></p>



As you can see, Tidy is adding unwanted <cite> tags around the data.
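The likely cause is that none of the <cite> elements in the input is ever closed. Tidy appears to carry an unclosed inline element such as <cite> over into each following paragraph and re-open it there, which would explain the extra <cite class="section"> wrappers. Rather than relying on a config option, one possible workaround is to close the open <cite> tags before Tidy sees the file. The Python below is a minimal sketch under stated assumptions: the function name close_open_cites and the script name close_cites.py are hypothetical, and it assumes every paragraph starts with a <p tag and that a <cite> never legitimately spans more than one paragraph.

```python
import re
import sys


def close_open_cites(html: str) -> str:
    """Append a closing </cite> for every <cite> left open in a paragraph.

    Assumption (hypothetical): each paragraph begins with '<p' and no
    <cite> is meant to continue into the next paragraph.
    """
    # Split just before every '<p ' or '<p>' tag, keeping the tag with its
    # paragraph (zero-width split needs Python 3.7+).
    chunks = re.split(r"(?=<p[\s>])", html)
    fixed = []
    for chunk in chunks:
        # Count opening tags (attributes may follow on the next line)
        # and closing tags within this paragraph only.
        opened = len(re.findall(r"<cite[\s>]", chunk))
        closed = len(re.findall(r"</cite\s*>", chunk))
        missing = max(opened - closed, 0)
        # Insert the missing end tags before any trailing whitespace so the
        # original line layout is preserved.
        body = chunk.rstrip()
        trailing = chunk[len(body):]
        fixed.append(body + "</cite>" * missing + trailing)
    return "".join(fixed)


if __name__ == "__main__":
    # Hypothetical usage: python close_cites.py input.html | tidy -config tidy.config
    # Assumes the input file is UTF-8 encoded.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            sys.stdout.write(close_open_cites(f.read()))
```

With the <cite> tags closed inside each paragraph, Tidy should no longer have an open inline element to carry forward, so its output should be much closer to the required output shown below. The paragraph-by-paragraph counting keeps the fix local: a paragraph that already closes its <cite> is left untouched.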



I want the output to look like this:

*Required Output:*



<p class="0">A</p>

<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>

<p class="2">Taxation, <cite class="section">3.3.3</cite></p>

<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3</cite></p>

<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em></p>

<p class="2">Taxation, <cite class="section">3.3.5</cite></p>

<p class="1"><em class="bf">ANCILLARY SERVICES</em></p>

<p class="2">Reimbursement</p>

<p class="3">Payment methodology</p>

<p class="4">Covered ancillary services, <cite class="section">5.1.2.2</cite></p>

<p class="1"><em class="bf">ANESTHESIOLOGY</em></p>

<p class="2">Anti-kickback statute</p>

<p class="3">Case law and other guidance, <cite class="section">2.4.6.4</cite></p>



Please advise what changes to the config file would produce the required output above.

Thanks in advance for your help!!





Regards,

Nilesh Chavan.


*Cell:   +1 (937) 301 0575*

Received on Tuesday, 21 June 2011 22:45:09 UTC