- From: Alexander Garcia Castro <alexgarciac@gmail.com>
- Date: Mon, 6 Oct 2014 11:19:13 -0700
- To: Kingsley Idehen <kidehen@openlinksw.com>
- Cc: Linking Open Data <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CALAe=OJBA8+oCRxdP_uO2AmFY_yE=v14sOWgt+Pd1WAE_YvZTA@mail.gmail.com>
Querying PDFs is NOT simple: it requires a lot of work and usually produces lots of errors. Querying only the metadata is not enough. As I said before, I understand the PDF as something that gives me a uniform layout. That is fine and necessary, but it is not sufficient in the context of the web of data and scientific publications.

I would like to have the content readily available for mining purposes. If I pay for the publication, I should get access to it in every format in which it is available. The content should be presented in a way that makes sense within the web of data; if that means the full content of the paper represented in RDF or XML, fine. I would also like to have well-annotated content. Annotation is simple and could quite easily become part of existing publication workflows. It could also be part of the guidelines for authors, for instance to identify and annotate rhetorical structures.

On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen <kidehen@openlinksw.com> wrote:

> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>
>> It's not hard to query PDFs with SPARQL. All you have to do is extract
>> the metadata from the document and turn it into RDF, if needed. Lots of
>> programs extract and display this metadata already.
>
> Peter,
>
> Having had 200+ {some-non-rdf-doc} to RDF document transformers built
> under my direct guidance, there are issues with your claim above:
>
> 1. The extractors are platform specific -- AWWW is about platform
> agnosticism (I don't want to mandate an OS for experiencing the power of
> Linked Open Data transformers / rdfizers)
>
> 2. It isn't solely about metadata -- we also have raw data inside these
> documents confined to Tables, paragraphs of sentences
>
> 3. If querying a PDF was marginally simple, I would be demonstrating that
> using a SPARQL results URL in response to this post :-)
>
> Possible != Simple and Productive.
>
> We want to leverage the productivity and simplicity that AWWW brings to
> data representation, access, interaction, and integration.
>
> --
> Regards,
>
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog 1: http://kidehen.blogspot.com
> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
> Twitter Profile: https://twitter.com/kidehen
> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

--
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac
Received on Monday, 6 October 2014 18:20:00 UTC