- From: <vojtech.toman@emc.com>
- Date: Tue, 1 Feb 2011 06:24:07 -0500
- To: <xproc-dev@w3.org>
If you are in a *nix environment, you can also try using p:exec in combination with the lesspipe.sh script (a preprocessor filter for less capable of basic HTML "rendering").

But maybe I have completely misunderstood your requirement.

Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

> -----Original Message-----
> From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On
> Behalf Of Alex Muir
> Sent: Tuesday, February 01, 2011 11:12 AM
> To: mozer
> Cc: XProc Dev
> Subject: Re: Are there any open source tools that work with xproc that
> convert html to well formatted text?
>
> I'm working with some html documents the style of which looks like say
> a straight forward word document which when I tried saving as text
> from firefox looked a lot like the HTML version in terms of the
> spacing of the text content, except some tables which were garbage.
> So a subsection in the HTML was still easily determined to be a
> subsection in the text because the presentational formatting specified
> in the HTML was preserved in the text output.
>
> I've found more success thus far identifying the different textual
> elements of a text document than HTML perhaps because HTML has so many
> possibilities of layouts whereas text is pretty simple thing to parse
> out and identify where a table is or where a section, subsection is...
>
> Does that make sense regarding the well formatted?
>
> Alex
>
>
> On Tue, Feb 1, 2011 at 9:54 AM, mozer <xmlizer@gmail.com> wrote:
> > oups read too fast : I read "well formed"
> >
> > What do you mean by well formatted text representation ?
> >
> > Xmlizer
> >
> > On Tue, Feb 1, 2011 at 10:53 AM, mozer <xmlizer@gmail.com> wrote:
> >> p:unescape-markup
> >> or
> >> p:http-request should do that
> >>
> >> Xmlizer
> >>
> >> On Tue, Feb 1, 2011 at 10:49 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
> >>> Hi,
> >>>
> >>> I'm interested to have a step in a pipeline that converts HTML to a
> >>> well formatted text representation.
> >>>
> >>> Are there any open source tools that do that that fit into xproc?
> >>>
> >>> Thanks
> >>>
> >>> --
> >>> Alex
> >>> -----
> >>> Currently:
> >>> Freelance Software Engineer 6+ yrs exp
> >>>
> >>> Previously:
> >>> https://sites.google.com/a/utg.edu.gm/alex/
> >>>
> >>>
> >>> A Bafila, is two rivers flowing together as one:
> >>> http://www.facebook.com/pages/Bafila/125611807494851
> >>>
> >>>
> >>
> >
> >
>
> --
> Alex
> -----
> Currently:
> Freelance Software Engineer 6+ yrs exp
>
> Previously:
> https://sites.google.com/a/utg.edu.gm/alex/
>
>
> A Bafila, is two rivers flowing together as one:
> http://www.facebook.com/pages/Bafila/125611807494851
>
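
For illustration, the p:exec suggestion at the top of this message could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the processor implements the optional p:exec step and permits running external commands, lesspipe.sh is on the PATH (some processors may require an absolute path) and has an HTML helper installed, and input.html is a placeholder file name that does not come from the thread.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <!-- The plain-text rendering comes out wrapped in c:result. -->
  <p:output port="result"/>

  <!-- Assumption: lesspipe.sh reads the file named in args, so
       nothing is sent to the command's standard input. -->
  <p:exec command="lesspipe.sh"
          args="input.html"
          source-is-xml="false"
          result-is-xml="false"
          wrap-result-lines="true">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:exec>
</p:declare-step>
```

With result-is-xml="false" the command's standard output is treated as text, and wrap-result-lines="true" places each output line in its own c:line element, which may make it easier to pick out sections or tables in later steps.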
Received on Tuesday, 1 February 2011 11:24:59 UTC