Re: Publication of scientific research from Sarven Capadisli on 2013-04-25 (public-lod@w3.org from April 2013)

From: Sarven Capadisli <info@csarven.ca>
Date: Thu, 25 Apr 2013 15:48:52 +0200
To: public-lod@w3.org
Message-ID: <51793444.1020603@csarven.ca>
On 04/25/2013 08:05 AM, Ivan Herman wrote:
> Just an additional 2 cents (hm, should be 2 pennies, because I am
> currently in London)
>
> First of all, I am in the same committee as Daniel (IW3C2) so I can
> only agree with every word he said. The fact of the matter is that
> HTML authoring tools are still extremely poor (regardless of
> publication) and, I am a bit afraid that the situation will not
> improve because most of the web content is created through systems
> like Wordpress or Drupal, ie, not by directly creating HTML. Which
> means that the economic incentives to have a really user friendly
> authoring tool for HTML are moderate. There are only a handful of
> tools (eg, BlueGriffon) and some tools are very expensive.
>
> As for the metadata: I think even turtle is too complicated for many
> (sorry Kingsley). I am not talking about the average readers of this
> list; I am talking about authors in other disciplines. But, if we
> bite the bullet and we say that papers are submitted in PDF, we could
> at least require to include the metadata in the PDF file. After all,
> the metadata is included in PDF in XMP format, which is (a slightly
> ugly and restricted version of) RDF/XML. It is ugly, but we have
> enough tools around to turn it into Turtle, or JSON-LD, or whatever.
>
> The XMP content can be extracted easily; I have had, for ages, a
> small Python program and a service doing that[1], and I know there
> are similar tools in Java. (And, if you look at the Python file[2],
> it is easy.) The only stumbling block is how easy it is to get the
> XMP into the file. AFAIK, if you start with Word (do not laugh, a lot
> of people do that:-), the information is converted into PDF info, ie,
> XMP; I do not know what happens with the LaTeX production pipeline.
>
> Not ideal. _Very_ far from it. In my ideal world, publishing should
> happen in HTML, with metadata included in some syntax (RDFa,
> microdata, embedded turtle). And we should, collectively, make better
> tools for this. Until that happens, the scientific community will
> have difficulties moving.
>
> Ivan
>
> P.S. I wonder whether moving collectively to EPUB would not make more
> sense. EPUB is a packaged HTML5 site which may include, as a package,
> all the images; it may also include metadata. Alas!, the authoring
> issue is just as bad, so it is difficult to imagine that being an
> alternative _today_. But it may become one in future.

I'm concerned with the final output, i.e., (X)HTML+RDFa, because:

* Content is readable like an "ordinary" webpage
* Machine readable (and all the benefits from Linked Data)
* With appropriate print stylesheets, it can be printed to paper or PDF

People are going to use whatever tool or format they feel comfortable
with to get the data in. That is difficult to control, in my opinion.
However, we can control the output by way of transformations if
necessary.
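
To make the transformation idea concrete, here is a minimal sketch,
assuming the author's input has already been reduced to a title, a list
of creators and a list of plain-text paragraphs. The function name
to_html_rdfa, its parameters, the example.org URI and the choice of
Dublin Core terms are assumptions for illustration only, not an
existing tool.

```python
from html import escape

# Dublin Core terms namespace, declared as an RDFa prefix below.
DCTERMS = "http://purl.org/dc/terms/"


def to_html_rdfa(paper_uri, title, creators, paragraphs):
    """Render simple metadata and body text as an HTML+RDFa fragment.

    Hypothetical helper: the input shape is an assumption, and only
    title and creator are marked up as RDFa properties.
    """
    parts = [
        '<article prefix="dcterms: {}" about="{}">'.format(
            DCTERMS, escape(paper_uri, quote=True)
        ),
        '  <h1 property="dcterms:title">{}</h1>'.format(escape(title)),
    ]
    # One literal dcterms:creator statement per author name.
    for name in creators:
        parts.append(
            '  <p property="dcterms:creator">{}</p>'.format(escape(name))
        )
    # Body paragraphs are ordinary HTML with no extra annotations.
    for text in paragraphs:
        parts.append("  <p>{}</p>".format(escape(text)))
    parts.append("</article>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        to_html_rdfa(
            "http://example.org/paper/1",
            "Publishing research as Linked Data",
            ["Jane Doe"],
            ["First paragraph of the paper."],
        )
    )
```

Whatever the input tool was, the same kind of step could emit the
human-readable page and the machine-readable statements from one
source.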

As I asked in another thread: are we sure that it is the tooling
that's a blocker? Are we still stuck at producing HTML output in a
friendly manner? How difficult is this task for the SW/LD authors? For
plain text editor users, is a change from /paragraph to <p> too much to
ask? Or is it that the majority needs, e.g., a WYSIWYG editor? Is it the
HTML publishing platform that's lacking?

I'm not trying to pigeon-hole abilities here, but trying to learn the
real difficulties as reported. If we don't know for sure, then a survey
that captures today's challenges is in order.

A strong minus one from me with regard to hacking around PDF to move
things forward.

-Sarven
Received on Thursday, 25 April 2013 13:49:25 UTC