Re: Are there any open source tools that work with xproc that convert html to well formatted text?

From: mozer <xmlizer@gmail.com>
Date: Tue, 1 Feb 2011 11:20:02 +0100
Message-ID: <AANLkTi=vdypZNkOH3yXKyrZnb+Wjwo8MctdLjKbhnWAQ@mail.gmail.com>
To: Alex Muir <alex.g.muir@gmail.com>
Cc: XProc Dev <xproc-dev@w3.org>
This looks like an XSLT transformation to me.

Xmlizer
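
For illustration, the kind of conversion described in the quoted message below (plain text that keeps the block structure of the HTML) can be sketched outside XProc with Python's standard html.parser module. This is only a sketch under assumptions: BLOCK_TAGS, TextExtractor, and html_to_text are hypothetical names, not part of XProc or of any tool mentioned in this thread, and tables get no special layout.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags treated as block-level; adjust for real documents.
BLOCK_TAGS = {
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
    "li", "ul", "ol", "table", "tr", "br",
}


class TextExtractor(HTMLParser):
    """Collect text, starting a new block at every block-level tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks of text
        self.current = []  # text fragments of the block being built

    def _flush(self):
        # Collapse runs of whitespace inside a block, much as a browser does.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Note: script and style content is not filtered out in this sketch.
        self.current.append(data)


def html_to_text(html):
    """Return the text of `html`, one blank line between blocks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any text that follows the last block tag
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Some   <b>bold</b>\n text.</p>"
    print(html_to_text(sample))
```

Inline tags such as b are ignored, so their text stays in the surrounding paragraph, while each block-level tag ends the current block. Within a pipeline, an XSLT stylesheet with a text output method could play the same role.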

On Tue, Feb 1, 2011 at 11:12 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
> I'm working with some HTML documents styled much like a
> straightforward Word document. When I saved one as text from
> Firefox, the result looked a lot like the HTML version in terms of
> the spacing of the text content, except for some tables, which came
> out as garbage. So a subsection in the HTML was still easily
> identified as a subsection in the text, because the presentational
> formatting specified in the HTML was preserved in the text output.
>
> So far I've had more success identifying the different textual
> elements of a text document than of an HTML one, perhaps because HTML
> allows so many possible layouts, whereas text is a pretty simple thing
> to parse to work out where a table, section, or subsection is.
>
> Does that explain what I mean by "well formatted"?
>
> Alex
>
>
> On Tue, Feb 1, 2011 at 9:54 AM, mozer <xmlizer@gmail.com> wrote:
>> Oops, I read too fast: I read "well formed".
>>
>> What do you mean by a well-formatted text representation?
>>
>> Xmlizer
>>
>> On Tue, Feb 1, 2011 at 10:53 AM, mozer <xmlizer@gmail.com> wrote:
>>> p:unescape-markup
>>> or
>>> p:http-request should do that
>>>
>>> Xmlizer
>>>
>>> On Tue, Feb 1, 2011 at 10:49 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I'd like to have a step in a pipeline that converts HTML to a
>>>> well-formatted text representation.
>>>>
>>>> Are there any open source tools that do that and fit into XProc?
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Alex
>>>
>>
>
>
>
> --
> Alex
> -----
> Currently:
> Freelance Software Engineer 6+ yrs exp
>
> Previously:
> https://sites.google.com/a/utg.edu.gm/alex/
>
>
> A Bafila, is two rivers flowing together as one:
> http://www.facebook.com/pages/Bafila/125611807494851
>
Received on Tuesday, 1 February 2011 10:20:35 GMT
