Hello,
On 21 May 2013 09:15, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:
> Do you have tools that may help us to extract information from PDFs?
> send us an email so that we can include them in the hackathon.
[...]
> Would you like to have XML/RDF for scholarly PDFs? What if you could
> have access to the actual content of the PDF for supporting the Web of
> Data?
>
Have a look at BioInterchange: http://www.biointerchange.org
One of the supported formats is the output of PDFx
(http://pdfx.cs.man.ac.uk/), which we turn into RDF N-Triples. We make use
of Dublin Core and the Semanticscience Integrated Ontology. Geraint (CC'd
here) did the actual implementation.
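
To give a rough idea of what "turning it into N-Triples" means, here is a
minimal sketch in Python. It is not BioInterchange's code and does not
reflect its real vocabulary mapping: the function names, the flat metadata
dictionary, and the example.org URI are assumptions made up for
illustration, and only Dublin Core terms are used (no SIO).

```python
# Illustrative sketch only: not BioInterchange's actual code or mapping.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(doc_uri, metadata):
    """Serialize a flat metadata dict (hypothetical input shape) as N-Triples.

    Keys are assumed to be Dublin Core term names such as "title" or
    "creator"; values are a string or a list of strings.
    """
    lines = []
    for key, values in metadata.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append('<%s> <%s%s> "%s" .'
                         % (doc_uri, DCTERMS, key, escape_literal(value)))
    return lines


if __name__ == "__main__":
    # Hypothetical metadata, as it might be pulled from an extracted PDF.
    example = {"title": "An example paper", "creator": ["A. Author", "B. Author"]}
    for line in to_ntriples("http://example.org/paper/1", example):
        print(line)
```

Each metadata value becomes one triple with the document URI as subject, so
a paper with one title and two authors yields three lines.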
We are currently writing up a paper on BioInterchange. I can send you a
draft of it if you are considering using it as a framework for RDFization.
Best wishes,
Joachim