Re: scientific publishing process (was Re: Cost and access)

On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>> It's not hard to query PDFs with SPARQL.  All you have to do is extract the
>> metadata from the document and turn it into RDF, if needed. Lots of programs
>> extract and display this metadata already.
>
> Peter,
>
> Having had 200+ {some-non-rdf-doc} to RDF document transformers built under my
> direct guidance, there are issues with your claim above:

Huh?  Every single PDF reader that I use can extract the PDF metadata and 
display it.  The metadata that I see in PDF documents uses a core set of 
properties that are easy to transform into RDF.  Of course, this core set is 
very small (title, author, and a few other things) so you don't get all that 
much out of the core set.
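
A minimal sketch of that transformation, assuming the core fields have already 
been pulled out of the PDF into a plain dictionary.  The choice of Dublin Core 
terms as the target vocabulary, the example IRI, and the names PDF_TO_DC and 
metadata_to_ntriples are illustrative assumptions, not anything a particular 
tool defines.

```python
# Assumed mapping from core PDF metadata keys to Dublin Core terms.
DC = "http://purl.org/dc/terms/"
PDF_TO_DC = {
    "Title": DC + "title",
    "Author": DC + "creator",
    "CreationDate": DC + "created",
}


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def metadata_to_ntriples(doc_iri, info):
    """Emit one N-Triples line per known, non-empty metadata field."""
    lines = []
    for key, prop in PDF_TO_DC.items():
        value = info.get(key)
        if value:
            lines.append('<%s> <%s> "%s" .'
                         % (doc_iri, prop, escape_literal(value)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical document IRI and field values.
    print(metadata_to_ntriples(
        "http://example.org/paper.pdf",
        {"Title": "Cost and access", "Author": "A. N. Other"},
    ))
```

The output can be loaded into any triple store and queried with SPARQL, but 
it only ever contains the handful of fields listed in the mapping.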

>
> 1. The extractors are platform specific -- AWWW is about platform agnosticism
> (I don't want to mandate an OS for experiencing the power of Linked Open Data
> transformers / rdfizers)

Well, the extractors would be specific to PDF, but that's hardly surprising, I 
think.

> 2. It isn't solely about metadata  -- we also have raw data inside these
> documents confined to Tables, paragraphs of sentences

Well, sure, but is extracting information directly from the figures or tables 
or text being considered here?  I sure would like this to be possible.  How 
would it work in an HTML context?

> 3. If querying a PDF was marginally simple, I would be demonstrating that
> using a SPARQL results URL in response to this post :-)

I'm not saying that it is so simple.  You do have to find the metadata block 
in the PDF and then look for the /Title, /Author, ... stuff.
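
Roughly, that lookup can be sketched as below.  This is a deliberately naive 
scan, and it assumes the metadata is stored uncompressed and unencrypted as 
literal (parenthesized) strings with no unescaped nested parentheses; hex 
strings, UTF-16 values, object streams, and XMP packets are not handled.  The 
name scan_info_fields and the sample bytes are hypothetical.

```python
import re

# Matches e.g. "/Title (Some text)" and allows backslash escapes in the value.
FIELD = re.compile(
    rb"/(Title|Author|Subject|Keywords)\s*\(((?:\\.|[^\\()])*)\)"
)


def scan_info_fields(pdf_bytes):
    """Return the first literal-string value found for each core field."""
    found = {}
    for match in FIELD.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        # Undo the backslash escapes for parentheses and the backslash itself.
        value = re.sub(rb"\\([()\\])", rb"\1", match.group(2))
        # Assumption: the value is close enough to Latin-1 to decode that way.
        found.setdefault(key, value.decode("latin-1"))
    return found


if __name__ == "__main__":
    sample = (b"1 0 obj\n"
              b"<< /Title (Cost and access) /Author (A. N. Other \\(ed.\\)) >>\n"
              b"endobj\n")
    print(scan_info_fields(sample))
```

A real extractor would parse the file's object structure rather than 
pattern-match on raw bytes, which is part of why this is not entirely simple.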

> Possible != Simple and Productive.

Yes, but there are lots of tools that display PDF metadata, so there are some 
who believe that the benefit is greater than the cost.

> We want to leverage the productivity and simplicity that AWWW brings to data
> representation, access, interaction, and integration.

Sure, but the additional costs, if any, on paper authors, reviewers, and 
readers have to be considered.  If these costs are eliminated or at least 
minimized, then that benefit is much more likely to be realized.

peter

Received on Monday, 6 October 2014 18:50:28 UTC