
Re: Unix cmd line utility for Multibyte PDF -> Text

From: Michael Monaghan <Michael.Monaghan@Sun.COM>
Date: Mon, 23 Oct 2006 12:01:11 -0400
To: cstrobbe <Christophe.Strobbe@esat.kuleuven.be>
Cc: www-international@w3.org
Message-id: <453CE747.1070003@sun.com>

FYI: PDFBox did not work for me, but I was then referred to 'xpdf' at
http://www.foolabs.com/xpdf/, and it works very nicely.
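
For anyone scripting the same conversion, here is a minimal sketch of how
xpdf's pdftotext tool might be called with an explicit output encoding, so
that non-ASCII text comes out as UTF-8. The helper names and the file names
in the usage note are hypothetical; the only pdftotext option relied on is
-enc, and as far as I recall xpdf falls back to Latin1 when it is omitted.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path, encoding="UTF-8"):
    """Return the argument list for xpdf's pdftotext (hypothetical helper).

    The -enc option selects the encoding of the extracted text; UTF-8 is
    passed explicitly so non-ASCII characters are not squeezed into a
    single-byte encoding.
    """
    return ["pdftotext", "-enc", encoding, pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path, encoding="UTF-8"):
    """Run pdftotext, assuming it is installed and on the PATH.

    Raises subprocess.CalledProcessError if pdftotext exits non-zero.
    """
    command = build_pdftotext_command(pdf_path, txt_path, encoding)
    subprocess.run(command, check=True)
```

Calling pdf_to_text("input.pdf", "output.txt") would run the equivalent of
"pdftotext -enc UTF-8 input.pdf output.txt" from the shell.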

Thanks,

~mm

cstrobbe wrote:
> Hi Michael,
> 
> 
> Quoting Michael Monaghan <Michael.Monaghan@Sun.COM>:
> 
> 
>>Hi,
>>
>>I need a pdf -> text command line utility for Unix/Solaris that
>>won't corrupt non-ASCII characters.
> 
> 
> 
> A few years ago I used PDFBox, a Java PDF library, to extract text from 
> PDF (http://www.pdfbox.org/). I seem to remember that it also worked 
> for non-ASCII characters.
> 
> Best regards,
> 
> Christophe
> 
Received on Monday, 23 October 2006 16:01:43 GMT
