Re: Are there any open source tools that work with xproc that convert html to well formatted text?

Okay, thanks very much.
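
For reference, here is a minimal sketch of what such an XSLT transformation might look like. It is only an illustration under stated assumptions, not a tested solution: it assumes the input is well-formed XHTML in the http://www.w3.org/1999/xhtml namespace, the element coverage is deliberately small, and the tab-separated table layout is an arbitrary choice. How best to wire it into a pipeline step is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes well-formed XHTML input in the XHTML namespace. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- Emit plain text rather than XML. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Drop content that is never shown to the reader. -->
  <xsl:template match="h:head | h:script | h:style"/>

  <!-- Headings and paragraphs: collapsed text followed by a blank line. -->
  <xsl:template match="h:h1 | h:h2 | h:h3 | h:h4 | h:h5 | h:h6 | h:p">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- List items: one indented bullet per line. -->
  <xsl:template match="h:li">
    <xsl:text>  * </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Table rows: one line per row, cells separated by tabs
       (an arbitrary layout chosen for this sketch). -->
  <xsl:template match="h:tr">
    <xsl:for-each select="h:th | h:td">
      <xsl:if test="position() &gt; 1">
        <xsl:text>&#9;</xsl:text>
      </xsl:if>
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Text outside the elements above: trim surrounding whitespace so that
       source indentation does not leak into the output. Limitation: text
       mixed with inline elements directly inside e.g. a div loses the
       spaces between the pieces. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

Run standalone with an XSLT 1.0 processor, for example `xsltproc html2text.xsl input.xhtml` (both file names are placeholders). Nested lists and tables with spanning cells would need extra templates.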

On Tue, Feb 1, 2011 at 10:20 AM, mozer <xmlizer@gmail.com> wrote:
> This looks like a job for an XSLT transformation to me.
>
> Xmlizer
>
> On Tue, Feb 1, 2011 at 11:12 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>> I'm working with some HTML documents that are styled much like a
>> straightforward Word document. When I saved one as text from Firefox,
>> the result looked a lot like the HTML version in terms of the spacing
>> of the text content, except for some tables, which came out as garbage.
>> A subsection in the HTML was still easy to identify as a subsection in
>> the text, because the presentational formatting specified in the HTML
>> was preserved in the text output.
>>
>> So far I've had more success identifying the different textual
>> elements of a text document than of an HTML one, perhaps because HTML
>> allows so many possible layouts, whereas plain text is pretty simple to
>> parse: it is easy to tell where a table, section, or subsection is.
>>
>> Does that clarify what I mean by "well formatted"?
>>
>> Alex
>>
>>
>> On Tue, Feb 1, 2011 at 9:54 AM, mozer <xmlizer@gmail.com> wrote:
>>> Oops, I read too fast: I read "well formed".
>>>
>>> What do you mean by a "well formatted" text representation?
>>>
>>> Xmlizer
>>>
>>> On Tue, Feb 1, 2011 at 10:53 AM, mozer <xmlizer@gmail.com> wrote:
>>>> p:unescape-markup
>>>> or
>>>> p:http-request should do that
>>>>
>>>> Xmlizer
>>>>
>>>> On Tue, Feb 1, 2011 at 10:49 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'd like to have a step in a pipeline that converts HTML to a
>>>>> well formatted text representation.
>>>>>
>>>>> Are there any open source tools that do this and fit into XProc?
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> Alex
>>>>> -----
>>>>> Currently:
>>>>> Freelance Software Engineer 6+ yrs exp
>>>>>
>>>>> Previously:
>>>>> https://sites.google.com/a/utg.edu.gm/alex/
>>>>>
>>>>>
>>>>> A Bafila, is two rivers flowing together as one:
>>>>> http://www.facebook.com/pages/Bafila/125611807494851
>>>>>
>>>>>
>>>>
>>>
>>
>



-- 
Alex

Received on Tuesday, 1 February 2011 10:48:41 UTC