Re: scientific publishing process (was Re: Cost and access) from Norman Gray on 2014-10-07 (public-lod@w3.org from October 2014)

From: Norman Gray <norman@astro.gla.ac.uk>
Date: Tue, 7 Oct 2014 10:39:10 +0100
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-lod@w3.org
Message-Id: <EBDBB2CF-A2BC-4830-BACA-E509AE69DB13@astro.gla.ac.uk>

Kingsley and all, hello.

On 2014 Oct 7, at 02:18, Kingsley Idehen <kidehen@openlinksw.com> wrote:

> On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:
>> 
>> 
>> On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
>>> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>>>> It's not hard to query PDFs with SPARQL.  All you have to do is extract the
>> 
>> Huh?  Every single PDF reader that I use can extract the PDF metadata and display it.
> 
> Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I think the question _is_ all about metadata.  The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well ... dammit, it's Better.

_One_ thing it would be better for is supporting the sort of full-scale RDF-everything view that you've described so eloquently.  But if that's your goal, then lexing the source text is really going to be the least of your problems.

A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles.  That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation.  That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put @profile attributes in the right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread is about what RDF you can in practice extract from the materials that authors can in practice be induced or obliged to submit to conference proceedings.

That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'.  I believe that the demo in my earlier message undermines that claim as far as RDF goes.

>>> 1. The extractors are platform specific -- AWWW is about platform agnosticism
>>> (I don't want to mandate an OS for experiencing the power of Linked Open Data
>>> transformers / rdfizers)
>> 
>> Well, the extractors would be specific to PDF, but that's hardly surprising, I think.

[I've lost track of whose comment this is...]

The extractor I demoed wasn't PDF-specific.

>>> We want to leverage the productivity and simplicity that AWWW brings to data
>>> representation, access, interaction, and integration.
>> 
>> Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered.  If these costs are eliminated or at least minimized then this good is much more likely to be realized.
> 
> With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done.  The demo didn't use any Adobe API, either to put the XMP into the PDF or to extract the RDF from it.
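
For readers wondering how an extractor can avoid both PDF-specific parsing and Adobe's APIs, here is a minimal sketch of one common approach: scanning the raw bytes of a file for the XMP packet wrapper.  This is not the attached extract-xmp.c, and the function names find_bytes and find_xmp_packet are hypothetical.  It assumes the packet is stored uncompressed in an ASCII-compatible encoding such as UTF-8, and is delimited by the usual "<?xpacket begin=" and "<?xpacket end=" processing instructions; a compressed metadata stream or a UTF-16 packet would not be found this way.

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of needle (nlen bytes) in hay (hlen bytes).
 * A plain byte search is used because the file may contain NUL bytes,
 * so the str* functions are not suitable. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    if (nlen == 0 || hlen < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    }
    return NULL;
}

/* Locate the first XMP packet in buf (len bytes).
 * On success, return a pointer to the start of the "<?xpacket begin="
 * instruction and store in *out_len the packet length, up to and
 * including the "?>" that closes the "<?xpacket end=" instruction.
 * Return NULL if no complete packet is found. */
static const char *find_xmp_packet(const char *buf, size_t len,
                                   size_t *out_len)
{
    static const char begin_marker[] = "<?xpacket begin=";
    static const char end_marker[]   = "<?xpacket end=";
    static const char pi_close[]     = "?>";

    const char *start = find_bytes(buf, len,
                                   begin_marker, sizeof begin_marker - 1);
    if (start == NULL)
        return NULL;

    /* Search for the end marker only in the bytes after the start. */
    size_t remaining = len - (size_t)(start - buf);
    const char *end = find_bytes(start, remaining,
                                 end_marker, sizeof end_marker - 1);
    if (end == NULL)
        return NULL;

    /* The packet finishes at the "?>" closing the end instruction. */
    remaining = len - (size_t)(end - buf);
    const char *close = find_bytes(end, remaining,
                                   pi_close, sizeof pi_close - 1);
    if (close == NULL)
        return NULL;

    *out_len = (size_t)(close - start) + (sizeof pi_close - 1);
    return start;
}
```

A caller would read the whole file into memory, pass the buffer to find_xmp_packet, and hand the returned bytes to any RDF/XML parser.  Nothing in the scan depends on the container being a PDF, which is the sense in which this kind of extractor is format-agnostic.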

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Attachments

application/octet-stream attachment: extract-xmp.c

Received on Tuesday, 7 October 2014 09:39:27 UTC