Re: scientific publishing process (was Re: Cost and access) from Kingsley Idehen on 2014-10-07 (public-lod@w3.org from October 2014)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 07 Oct 2014 07:53:17 -0400
To: public-lod@w3.org
Message-ID: <5433D42D.4080408@openlinksw.com>
On 10/7/14 5:39 AM, Norman Gray wrote:
> Kingsley and all, hello.
>
> On 2014 Oct 7, at 02:18, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:
>>>
>>> On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
>>>> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>>>>> It's not hard to query PDFs with SPARQL.  All you have to do is extract the
>>> Huh?  Every single PDF reader that I use can extract the PDF metadata and display it.
>> Again, this isn't about metadata.
> With all respect to the larger goal of having fully semanticked-up documents, I think the question _is_ all about metadata.

It can't be. The metadata focus is a subtle misconception. We need 
access to all of the data in the document.

>    The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well ... dammit, it's Better.

The initial gripe (as I've always seen it) is that we are trying to tell 
the world about Linked Open Data virtues while rarely putting them to 
use (instinctively) ourselves. It just so happens that conferences 
provide an example that most have experienced in some capacity.

>
> _One_ thing it would be better for is supporting the sort of full-scale RDF-everything view that you've described so eloquently.  But if that's your goal, then lexing the source text is really going to be the least of your problems.
>
> A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles.

Yes, or just make references to RDF sources relevant to the paper, but 
on the basis that those references (to the degree possible) resolve. 
This is also about the data represented in tabular form (as tables) and 
the data behind the tables, so to speak.

>   That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation.  That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put @profile attribute in the right places.
>
> So no, not just about 'metadata' in the narrow sense, but I think this thread is about what RDF you can in practice extract from the materials that authors can in practice be induced or obliged to submit to conference proceedings.

For those conferences associated with themes such as Linked Open Data 
and the Semantic Web, RDF should be the norm for structured data 
representation. If that isn't possible, then what are we saying to the 
world about RDF with regard to structured data representation and data 
de-silo-fication?


>
> That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'.

They are linked :-)

>    I believe that the demo in my earlier message undermines that claim as far as RDF goes.
>
>>>> 1. The extractors are platform specific -- AWWW is about platform agnosticism
>>>> (I don't want to mandate an OS for experiencing the power of Linked Open Data
>>>> transformers / rdfizers)
>>> Well, the extractors would be specific to PDF, but that's hardly surprising, I think.
> [I've lost track of whose comment this is...]
>
> The extractor I demoed wasn't PDF-specific.

"Platform" in the context of my comments really relates to operating 
systems, i.e., most PDF extractors are operating-system specific. That's 
why I mentioned the massive opportunity for Adobe (and 3rd parties too, 
as Mike Bergman added) in providing Web Services for accessing and 
indexing PDF document content.

>
>>>> We want to leverage the productivity and simplicity that AWWW brings to data
>>>> representation, access, interaction, and integration.
>>> Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered.  If these costs are eliminated or at least minimized then this good is much more likely to be realized.
>> With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs.
> I forgot to attach the extractor I wrote -- done.  The demo didn't use any Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.

You forgot the extractor demo link :)

>
> All the best,
>
> Norman
>
>


-- 
Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Received on Tuesday, 7 October 2014 11:53:40 UTC