- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Tue, 07 Oct 2014 07:53:17 -0400
- To: public-lod@w3.org
- Message-ID: <5433D42D.4080408@openlinksw.com>
On 10/7/14 5:39 AM, Norman Gray wrote:
> Kingsley and all, hello.
>
> On 2014 Oct 7, at 02:18, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:
>>>
>>> On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
>>>> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>>>>> It's not hard to query PDFs with SPARQL. All you have to do is extract the
>>> Huh? Every single PDF reader that I use can extract the PDF metadata and display it.
>> Again, this isn't about metadata.
> With all respect to the larger goal of having fully semanticked-up documents, I think the question _is_ all about metadata.

It can't be. The metadata focus is a subtle misconception. We need access to all of the data in the document.

> The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well ... dammit, it's Better.

The initial gripe (as I've always seen it) is that we are trying to tell the world about the virtues of Linked Open Data while rarely putting them to use (instinctively) ourselves. It just so happens that conferences provide an example that most have experienced in some capacity.

> _One_ thing it would be better for is supporting the sort of full-scale RDF-everything view that you've described so eloquently. But if that's your goal, then lexing the source text is really going to be the least of your problems.
>
> A more modest goal, which is still valuable and _much_ more achievable, is to get at least some RDF out of submitted articles.

Yes, or just make references to RDF sources relevant to the paper, on the basis that those references (to the degree possible) resolve. This is also about the data represented in tabular form (as tables) and the data behind the tables, so to speak.

> That practically means metadata, plus perhaps some document structure, plus, if you're keen and can get the authors to invest their effort, some argumentation. That's available for free (and right now) from LaTeX authors, and available from XHTML authors depending on how hard it would be to get them to put @profile attribute in the right places.
>
> So no, not just about 'metadata' in the narrow sense, but I think this thread is about what RDF you can in practice extract from the materials that authors can in practice be induced or obliged to submit to conference proceedings.

For those conferences associated with themes such as Linked Open Data and the Semantic Web, RDF should be the norm for structured data representation. If that isn't possible, then what are we saying to the world about RDF with regard to structured data representation and data de-silo-fication?

> That original lament has overlapped with a parallel lament that PDF is a dead-end format -- it's not 'webby'.

They are linked :-)

> I believe that the demo in my earlier message undermines that claim as far as RDF goes.
>
>>>> 1. The extractors are platform specific -- AWWW is about platform agnosticism
>>>> (I don't want to mandate an OS for experiencing the power of Linked Open Data
>>>> transformers / rdfizers)
>>> Well, the extractors would be specific to PDF, but that's hardly surprising, I think.
> [I've lost track of whose comment this is...]
>
> The extractor I demoed wasn't PDF-specific.

"Platform" in the context of my comments really relates to operating systems, i.e., most PDF extractors are operating system specific. That's why I mentioned the massive opportunity for Adobe (and 3rd parties too, as Mike Bergman added) with regard to providing Web Services for accessing and indexing PDF document content.

>>>> We want to leverage the productivity and simplicity that AWWW brings to data
>>>> representation, access, interaction, and integration.
>>> Sure, but the additional costs, if any, on paper authors, reviewers, and readers have to be considered. If these costs are eliminated or at least minimized then this good is much more likely to be realized.
>> With some help from Adobe we can have the best of all worlds here. I am going to take a look at their latest cloud offerings and associated APIs.
> I forgot to attach the extractor I wrote -- done. The demo didn't use any Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.

You forgot the extractor demo link :)

> All the best,
>
> Norman

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Tuesday, 7 October 2014 11:53:40 UTC