Re: Get the good stuff out of PDFs from Joachim Baran on 2013-05-21 (public-semweb-lifesci@w3.org from May 2013)

From: Joachim Baran <joachim.baran@gmail.com>
Date: Tue, 21 May 2013 09:32:11 -0400
To: Alexander Garcia Castro <alexgarciac@gmail.com>, casey.mclaughlin@cci.fsu.edu
Cc: W3C HCLSIG hcls <public-semweb-lifesci@w3.org>, Geraint Duck <geraint.duck@postgrad.manchester.ac.uk>
Message-ID: <CAObSwHWgVh6wiAQFhxr91AWXgRUY+Wq8bXo5ULVV4p3dmXXRfg@mail.gmail.com>

Hello,

On 21 May 2013 09:15, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:

> Do you have tools that may help us to extract information from PDFs?
> Send us an email so that we can include them in the hackathon.

[...]
> Would you like to have XML/RDF for scholarly PDFs? What if you could
> have access to the actual content of the PDF for supporting the Web of
> Data?
>
  Have a look at BioInterchange: http://www.biointerchange.org

  One of the supported input formats is the output of PDFx
(http://pdfx.cs.man.ac.uk/), which we turn into RDF N-Triples. We make use
of Dublin Core and the Semanticscience Integrated Ontology (SIO). Geraint
(CC'd here) did the actual implementation.
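
  To give a rough idea of what such a conversion involves, here is a
minimal sketch in Python. It is not BioInterchange's code: the XML element
name (article-title), the sample document, the document URI and the helper
names are all assumptions made for illustration, and only a Dublin Core
title triple is produced.

```python
import xml.etree.ElementTree as ET

# Dublin Core Terms property used for the document title.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def title_triples(xml_text, document_uri):
    """Return one N-Triples line per non-empty <article-title> element.

    The element name is an assumption about the input XML, not a
    statement about the actual PDFx schema.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for node in root.iter("article-title"):
        # Collapse runs of whitespace left over from the XML layout.
        title = " ".join("".join(node.itertext()).split())
        if title:
            triples.append('<%s> <%s> "%s" .' % (
                document_uri, DCTERMS_TITLE, escape_literal(title)))
    return triples


# Hypothetical, simplified input; real PDFx output is richer than this.
SAMPLE = """<pdfx><article><front><title-group>
<article-title>An "example" title</article-title>
</title-group></front></article></pdfx>"""

for line in title_triples(SAMPLE, "http://example.org/doc/1"):
    print(line)
```

  Each printed line is one subject/predicate/object statement terminated
by " .", which is what makes N-Triples easy to stream and concatenate.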

  We are currently writing up a paper on BioInterchange. I can send you a
draft if you are considering it as a framework for RDFization.

Best wishes,
Joachim

Received on Tuesday, 21 May 2013 13:32:42 UTC