Re: Publication of scientific research

Just an additional 2 cents (hm, that should be 2 pennies, because I am currently in London).

First of all, I am on the same committee as Daniel (IW3C2), so I can only agree with every word he said. The fact of the matter is that HTML authoring tools are still extremely poor (regardless of publication), and I am a bit afraid that the situation will not improve, because most web content is created through systems like WordPress or Drupal, i.e., not by writing HTML directly. This means that the economic incentive to build a really user-friendly HTML authoring tool is moderate. There are only a handful of tools (e.g., BlueGriffon), and some of them are very expensive.

As for the metadata: I think even Turtle is too complicated for many (sorry, Kingsley). I am not talking about the average reader of this list; I am talking about authors in other disciplines. But if we bite the bullet and say that papers are submitted in PDF, we could at least require authors to include the metadata in the PDF file. After all, PDF can carry metadata in XMP format, which is (a slightly ugly and restricted version of) RDF/XML. It is ugly, but we have enough tools around to turn it into Turtle, JSON-LD, or whatever.

The XMP content can be extracted easily; I have had, for ages, a small Python program and a service doing that [1], and I know there are similar tools in Java. (And if you look at the Python file [2], you will see that it is easy.) The only stumbling block is how easy it is to get the XMP into the file in the first place. AFAIK, if you start with Word (do not laugh, lots of people do that :-), the document information is converted into the PDF metadata, i.e., XMP; I do not know what happens in the LaTeX production pipeline.
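To illustrate why extraction is easy, here is a minimal standard-library sketch. It is not the xmp.py program referenced in [2]; it simply scans the raw PDF bytes for XMP packets. It assumes the metadata stream is stored uncompressed and encoded as UTF-8, and that the conventional "x:" prefix is used; a packet inside a compressed stream will not be found this way. The helper names are hypothetical.

```python
import re

# An XMP packet is an XML fragment rooted at <x:xmpmeta ...> ... </x:xmpmeta>;
# the "x:" prefix is the conventional one, not a guarantee.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp_packets(pdf_bytes):
    """Return every XMP packet that appears uncompressed in the raw PDF bytes."""
    return [m.group(0).decode("utf-8", errors="replace")
            for m in XMP_PACKET.finditer(pdf_bytes)]


def extract_xmp_from_file(path):
    """Read a PDF from disk and scan it for XMP packets (hypothetical helper)."""
    with open(path, "rb") as pdf:
        return extract_xmp_packets(pdf.read())
```

A list is returned rather than a single string because a PDF may contain more than one packet, for example metadata attached to embedded images in addition to the document-level packet.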

Not ideal. _Very_ far from it. In my ideal world, publishing would happen in HTML, with metadata included in some syntax (RDFa, microdata, embedded Turtle). And we should, collectively, build better tools for this. Until that happens, the scientific community will have difficulty moving.
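For the "embedded Turtle" option, the W3C Turtle specification describes placing Turtle inside a `<script type="text/turtle">` element of an HTML page. The sketch below, which uses only Python's built-in `html.parser`, shows how a consumer could pull such blocks out of a paper published in HTML. The class and function names are hypothetical, and the result is raw Turtle text that still has to be handed to an RDF parser.

```python
from html.parser import HTMLParser


class TurtleScriptExtractor(HTMLParser):
    """Collect the text of every <script type="text/turtle"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "text/turtle":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append
        if self._inside:
            self.blocks[-1] += data


def extract_turtle(html):
    """Return the embedded Turtle blocks of an HTML document, whitespace-trimmed."""
    parser = TurtleScriptExtractor()
    parser.feed(html)
    parser.close()
    return [block.strip() for block in parser.blocks]
```

Ordinary JavaScript `<script>` elements are ignored, so the metadata can live alongside the rest of the page without interfering with it.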

Ivan

P.S. I wonder whether moving collectively to EPUB would not make more sense. EPUB is a packaged HTML5 site that may include all the images in the package; it may also include metadata. Alas, the authoring issue is just as bad, so it is difficult to imagine it being an alternative _today_. But it may become one in the future.
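As an illustration of where that metadata lives: an EPUB is a ZIP archive whose `META-INF/container.xml` points to a package document (OPF), and the OPF `<metadata>` element carries Dublin Core properties. The sketch below reads the title and creators with the standard library only; the function name `epub_metadata` is hypothetical, and it ignores EPUB 3 refinements and every other metadata element.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}


def epub_metadata(epub_bytes):
    """Return the Dublin Core title and creators of an EPUB (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as epub:
        # container.xml names the package document via rootfile/@full-path
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(epub.read(opf_path))
    metadata = package.find("opf:metadata", OPF_NS)
    return {
        "title": metadata.findtext("dc:title", default="", namespaces=OPF_NS),
        "creators": [c.text for c in metadata.findall("dc:creator", OPF_NS)],
    }
```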



[1] http://ivan-herman.name/xmp-extraction-service/
[2] http://dev.w3.org/cvsweb/~checkout~/2004/PythonLib-IH/xmp.py



On Apr 24, 2013, at 21:20 , Kingsley Idehen <kidehen@openlinksw.com> wrote:

> On 4/24/13 3:39 PM, Daniel Schwabe wrote:
>> On the other hand, efforts continue to at least provide metadata in RDF, which has been surprisingly harder to produce year after year without requiring hand coding and customization each time. But we will get there, I hope.
>> Just my 2c...
>> 
> Very important 2 cents :-) 
> 
> Fear of crafting RDF by hand is the root of many problems. A long time ago, it was a major problem thanks to RDF/XML. Today, thanks to Turtle, writing RDF by hand is a shortcut to demonstrating the fundamental benefits of RDF-based Linked Data. 
> 
> Scribbling [1] is a pattern that's vital to the Web in general; sadly, you can't really scribble HTML, and you absolutely couldn't scribble RDF/XML. Another important, somewhat forgotten issue is the file create, save, and share pattern. 
> 
> In my experience, with our use of RDF-based Linked Data in a dog-food manner [2][3], the combination of crafting Turtle-based RDF documents by hand and the file create, save, and publish pattern works wonders for streamlining data publication, especially when you also factor in ACLs based on URIs that denote Agents (e.g., WebID). 
> 
> To conclude, it is a new day and a new time for RDF. The problems that RDF/XML introduced should really be behind us at this stage. Unfortunately, many still conflate RDF/XML with RDF and in the process end up dismissing its utility in general. 
> 
> For scientific publications, event papers, etc., PDFs should at the very least be accompanied by RDF-based metadata. 
> 
> 
> Links:
> 
> 1. http://bit.ly/Zgw83a -- circa 2007 presentation where TimBL talks about the scribbling pattern 
> 2. http://virtuoso.openlinksw.com/data/turtle/ -- collection of Turtle documents 
> 3. http://kingsley.idehen.net/about/html/http/virtuoso.openlinksw.com/data/turtle/Virtuoso7Offers.ttl -- HTML based view of the RDF descriptions from one of the Turtle docs
> 4. http://bit.ly/RJzd9S -- old post about why Turtle is so important to RDF-based Linked Data (we need a simple notation that covers programmer and non-programmer profiles, i.e., scribblers). 
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen	      
> Founder & CEO 
> OpenLink Software     
> Company Web: 
> http://www.openlinksw.com
> 
> Personal Weblog: 
> http://www.openlinksw.com/blog/~kidehen
> 
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: 
> https://plus.google.com/112399767740508618350/about
> 
> LinkedIn Profile: 
> http://www.linkedin.com/in/kidehen


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Thursday, 25 April 2013 06:05:49 UTC