some recent links

Greetings – I’m assuming there are some folks who have subscribed without joining the group. Welcome; introduce yourselves.  If you’re just browsing the archives, consider joining. 

There seem to be a number of projects focused on scraping data out of PDF files, but the process is necessarily heuristic and incomplete.  The connection to this group is that we’re looking for a way of stuffing the results of this kind of analysis back into the PDF file in a way that IS machine readable.


For example, https://docparser.com/blog/getting-started-docparser/


I’ve gotten mixed opinions about whether we need a new PDF profile (akin to PDF/X, PDF/UA, PDF/E); call it PDF/D, “PDF with data”.
A file can be both PDF/D and PDF/UA (and even all three with PDF/A-3).  

So you could use DocParser (or some other process) and generate PDF/D versions.

PDF/D would have several optional features: “Text Available” (Yes / No), “Tables” (with named units/interpretations), and possibly images.
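
To make the feature list a bit more concrete, here is a rough sketch of what a machine-readable description of those features might look like. Everything in it is an assumption for illustration: the names PdfDManifest, Table and Column, the field names, and the choice of JSON are all hypothetical, and nothing here is part of any existing PDF standard or tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Column:
    """One table column (hypothetical structure, not from any spec)."""
    name: str
    unit: str            # named unit, e.g. "USD"; free-form in this sketch
    interpretation: str  # what the values in this column mean


@dataclass
class Table:
    """An extracted table plus the metadata needed to interpret it."""
    page: int
    columns: List[Column]
    rows: List[List[str]]


@dataclass
class PdfDManifest:
    """Hypothetical summary of the optional PDF/D features in one file."""
    text_available: bool                               # "Text Available" (Yes / No)
    tables: List[Table] = field(default_factory=list)  # "Tables"
    images: List[str] = field(default_factory=list)    # possibly images

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    manifest = PdfDManifest(
        text_available=True,
        tables=[
            Table(
                page=1,
                columns=[Column("amount", "USD", "invoice total")],
                rows=[["120.00"]],
            )
        ],
    )
    print(manifest.to_json())
```

A tool such as DocParser (or some other process) could produce something along these lines as its output, and the result could then be carried inside the PDF itself. How it would actually be embedded is left open here.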

 

Received on Saturday, 27 August 2016 06:28:28 UTC