Re: Get the good stuff out of PDFs

Hello,

On 21 May 2013 09:15, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:

> Do you have tools that may help us to extract information from PDFs?
> send us an email so that we can include them in the hackathon.

[...]
> Would you like to have XML/RDF for scholarly PDFs? What if you could
> have access to the actual content of the PDF for supporting the Web of
> Data?
>
  Have a look at BioInterchange: http://www.biointerchange.org

  One of the supported formats is output from PDFx
(http://pdfx.cs.man.ac.uk/), which we turn into RDF N-Triples. We make use
of Dublin Core and the Semanticscience Integrated Ontology. Geraint (CC'd
here) did the actual implementation.
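
  To give a rough idea of what such an RDFization step looks like, here is a
minimal Python sketch. It is not the BioInterchange code. It assumes that
the PDFx XML contains `article-title` and `abstract` elements somewhere in
the tree (an assumption about the schema, so please check it against real
PDFx output), and the function names `escape_literal` and `pdfx_to_ntriples`
as well as the example.org document URI are made up for illustration. Only
Dublin Core terms are used here; the SIO modelling is left out.

```python
import xml.etree.ElementTree as ET

# Dublin Core (DCMI Metadata Terms) predicates used in this sketch.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_ABSTRACT = "http://purl.org/dc/terms/abstract"


def escape_literal(text):
    """Escape backslash, double quote, LF and CR for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def pdfx_to_ntriples(xml_text, doc_uri):
    """Turn title and abstract of a PDFx-like XML document into N-Triples.

    Assumption: the elements are named 'article-title' and 'abstract'.
    Elements that are missing or empty are skipped.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for tag, predicate in (("article-title", DC_TITLE),
                           ("abstract", DC_ABSTRACT)):
        element = root.find(".//" + tag)
        if element is None:
            continue
        # Flatten nested markup and collapse runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            triples.append('<%s> <%s> "%s" .'
                           % (doc_uri, predicate, escape_literal(text)))
    return "\n".join(triples)


if __name__ == "__main__":
    # Hypothetical, hand-written input; not real PDFx output.
    sample = ("<pdfx><article><front><title-group>"
              "<article-title>An example paper</article-title>"
              "</title-group><abstract>Short   abstract.</abstract>"
              "</front></article></pdfx>")
    print(pdfx_to_ntriples(sample, "http://example.org/doc/1"))
```

  The sketch writes one triple per line with the document URI as subject, so
the output can be concatenated across documents and loaded into a triple
store as-is.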

  We are currently writing up a paper on BioInterchange. I can send you a
draft if you are considering it as a framework for RDFization.

Best wishes,
Joachim

Received on Tuesday, 21 May 2013 13:32:42 UTC