RE: XML Tidy? from Reitzel, Charlie on 2001-06-20 (html-tidy@w3.org from April to June 2001)

From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
Date: Wed, 20 Jun 2001 17:59:58 -0400
To: "'Ignacio Vazquez-Abrams'" <ignacio@openservices.net>, html-tidy@w3.org
Message-ID: <B5C79DDBC655D311B6BD0008C7E64D76013C1634@exchange.arrakisplanet.com>

Looks good.  Looks like the pretty printer theory worked out ok.  I put this
into the "Feature request" tracker on the SourceForge project (and attached
your diff), so it's in the queue.

For all interested parties, you can view the feature list Terry has culled
from this mailing list at 
http://sourceforge.net/tracker/index.php?group_id=27659&atid=390966

The more feedback we get about these items, the better (including priority,
sanity checking, potential conflicts, refinements, "You want it to do
what?", etc. etc.).

Question: given the input
-- INPUT -- 
<html> 
<head> 
<title>Foo!</title> 
</head> 
<body> 
<p>Bar!</p> 
</body> 
</html> 
------------ 

Do you want a)
-- OUTPUT A -- 
<p>Bar!</p> 
-------------- 

or b)
-- OUTPUT B -- 
<body>
<p>Bar!</p> 
</body>
--------------

Your diffs look like you want b).  If so, why?  I'd think you'd want to drop
the <body> itself so that the document contents, for example, could be
dropped into a cell in a layout table.

Charlie

-----Original Message-----
From: Ignacio Vazquez-Abrams [mailto:ignacio@openservices.net]
Sent: Wednesday, June 20, 2001 10:13 AM
To: html-tidy@w3.org
Subject: RE: XML Tidy?

On Tue, 19 Jun 2001, Reitzel, Charlie wrote:

> Great idea.  I'd try a somewhat different tack, tho.  Rather than mess w/
> the parser _or_ your input, I'd just feed the prospective <body> contents
> to Tidy as is.  Let the parser merrily add
> <head><title></title></head><body> your stuff! </body>.  Then, if the
> --body-only option is set, call a new PPrintContent() function in pprint.c.
> This function will call PPrintTree() for each member of the body->content
> node list.
>
> The important point here is you can safely add functionality without
> messing w/ the inner workings of the parser.  Even better, it will report
> the line numbers correctly for any errors/warnings it emits.
>
> take it as easy as you can stand it (which is qualified by noting that I am
> an uptight east coaster.  It's one of those "do as I say, not as I do" kind
> of things),
>
> Charlie

This one should do it then. No more segfaults, but I did have to muck with
the parser in order to suppress the "no title" error.

A nice side bonus is that it will gladly take a normal HTML page and chop
out everything but the body, which means I can get rid of the part in the
application that does it manually when importing a file.

-- 
Ignacio Vazquez-Abrams  <ignacio@openservices.net>

Received on Wednesday, 20 June 2001 17:59:19 UTC