- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 06 Oct 2014 15:14:47 -0400
- To: Linking Open Data <public-lod@w3.org>
- CC: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <5432EA27.4010508@openlinksw.com>
On 10/6/14 2:19 PM, Alexander Garcia Castro wrote:
> Querying PDFs is NOT simple; it requires a lot of work and usually
> produces lots of errors.

Yes, I believe I indicated that in my response to Peter, i.e., it isn't simple or productive.

> Just querying metadata is not enough.

Yes, I said that too, i.e., we want access to raw data.

> As I said before, I understand the PDF as something that gives me a
> uniform layout. That is OK and necessary, but not sufficient
> within the context of the web of data and scientific publications. I
> would like to have the content readily available for mining purposes.
> If I pay for the publication, I should get access to the publication in
> every format it is available in. The content should be presented in a way
> that makes sense within the web of data. If it is the full
> content of the paper represented in RDF or XML, fine. Also, I would
> like to have well-annotated content; this is simple and something that
> could quite easily be part of existing publication workflows. It may
> also be part of the guidelines for authors, for instance: identify and
> annotate rhetorical structures.

Modulo any confusing typos in my earlier posts, I don't see where we disagree :-)

Kingsley

> On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen
> <kidehen@openlinksw.com> wrote:
>
>> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>>
>>> It's not hard to query PDFs with SPARQL. All you have to do
>>> is extract the metadata from the document and turn it into
>>> RDF, if needed. Lots of programs extract and display this
>>> metadata already.
>>
>> Peter,
>>
>> Having had 200+ {some-non-rdf-doc}-to-RDF document transformers
>> built under my direct guidance, I see issues with your claim
>> above:
>>
>> 1. The extractors are platform-specific, whereas AWWW is about platform
>> agnosticism (I don't want to mandate an OS for experiencing the
>> power of Linked Open Data transformers / rdfizers).
>>
>> 2. It isn't solely about metadata; we also have raw data inside
>> these documents, confined to tables and paragraphs of sentences.
>>
>> 3. If querying a PDF were even marginally simple, I would be
>> demonstrating that using a SPARQL results URL in response to this
>> post :-)
>>
>> Possible != Simple and Productive.
>>
>> We want to leverage the productivity and simplicity that AWWW
>> brings to data representation, access, interaction, and integration.
>>
>> --
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog 1: http://kidehen.blogspot.com
>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>> Twitter Profile: https://twitter.com/kidehen
>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>> Personal WebID:
>> http://kingsley.idehen.net/dataspace/person/kidehen#this
>
> --
> Alexander Garcia
> http://www.alexandergarcia.name/
> http://www.usefilm.com/photographer/75943.html
> http://www.linkedin.com/in/alexgarciac
Received on Monday, 6 October 2014 19:15:17 UTC