Re: scientific publishing process (was Re: Cost and access)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 06 Oct 2014 15:14:47 -0400
Message-ID: <5432EA27.4010508@openlinksw.com>
To: Linking Open Data <public-lod@w3.org>
CC: "semantic-web@w3.org" <semantic-web@w3.org>

On 10/6/14 2:19 PM, Alexander Garcia Castro wrote:
> Querying PDFs is NOT simple and requires a lot of work, and usually 
> produces lots of errors.

Yes, I believe I indicated that in my response to Peter, i.e., it isn't 
simple or productive.

> Just querying metadata is not enough.

Yes, I said that too, i.e., we want access to raw data.
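
To make the metadata-only route concrete, here is a minimal sketch in 
Python. It assumes the metadata has already been pulled out of the PDF 
by some extractor and handed over as a plain dictionary; the document 
URI, the field values, and the function names are hypothetical 
placeholders, and Dublin Core terms are just one plausible choice of 
predicates.

```python
# Sketch only: turn already-extracted PDF metadata into N-Triples.
# The document URI and the metadata values below are hypothetical.

DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def metadata_to_ntriples(doc_uri: str, metadata: dict) -> list:
    """Emit one triple per metadata field, with Dublin Core predicates."""
    triples = []
    for field, value in sorted(metadata.items()):
        literal = escape_literal(value)
        triples.append(f'<{doc_uri}> <{DCTERMS}{field}> "{literal}" .')
    return triples


# Assumed extractor output: only document-level fields, no body content.
example_metadata = {"title": "An Example Paper", "creator": "A. N. Author"}

for line in metadata_to_ntriples("http://example.org/paper.pdf", example_metadata):
    print(line)
```

Only the fields present in the dictionary become queryable with SPARQL. 
Nothing from the tables or paragraphs in the body of the paper reaches 
the triples, which is why access to the raw data still matters.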

> As I said before, I understand the PDF as something that gives me a 
> uniform layout. That is OK and necessary, but not sufficient 
> within the context of the web of data and scientific publications. I 
> would like to have the content readily available for mining purposes. 
> If I pay for the publication, I should get access to the publication in 
> every format in which it is available. The content should be presented 
> in a way that makes sense within the web of data. If it is the full 
> content of the paper represented in RDF or XML, fine. Also, I would 
> like to have well-annotated content; this is simple and something that 
> could quite easily be part of existing publication workflows. It may 
> also be part of the guidelines for authors, for instance: identify and 
> annotate rhetorical structures.

Modulo any confusing typos in my earlier posts, I don't see where we are 
disagreeing :-)


Kingsley
>
> On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen 
> <kidehen@openlinksw.com> wrote:
>
>     On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>
>         It's not hard to query PDFs with SPARQL.  All you have to do
>         is extract the metadata from the document and turn it into
>         RDF, if needed. Lots of programs extract and display this
>         metadata already.
>
>
>     Peter,
>
>     Having had 200+ {some-non-rdf-doc}-to-RDF document transformers
>     built under my direct guidance, I see issues with your claim
>     above:
>
>     1. The extractors are platform specific -- AWWW is about platform
>     agnosticism (I don't want to mandate an OS for experiencing the
>     power of Linked Open Data transformers / rdfizers)
>
>     2. It isn't solely about metadata -- we also have raw data inside
>     these documents, confined to tables and paragraphs of sentences
>
>     3. If querying a PDF were even marginally simple, I would be
>     demonstrating that using a SPARQL results URL in response to this
>     post :-)
>
>     Possible != Simple and Productive.
>
>     We want to leverage the productivity and simplicity that AWWW
>     brings to data representation, access, interaction, and integration.
>
>
>     -- 
>     Regards,
>
>     Kingsley Idehen
>     Founder & CEO
>     OpenLink Software
>     Company Web: http://www.openlinksw.com
>     Personal Weblog 1: http://kidehen.blogspot.com
>     Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>     Twitter Profile: https://twitter.com/kidehen
>     Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>     LinkedIn Profile: http://www.linkedin.com/in/kidehen
>     Personal WebID:
>     http://kingsley.idehen.net/dataspace/person/kidehen#this
>
>
>
>
>
> -- 
> Alexander Garcia
> http://www.alexandergarcia.name/
> http://www.usefilm.com/photographer/75943.html
> http://www.linkedin.com/in/alexgarciac
>


-- 
Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this


Received on Monday, 6 October 2014 19:15:11 UTC
