Re: scientific publishing process (was Re: Cost and access) from Sarven Capadisli on 2014-10-04 (semantic-web@w3.org from October 2014)

From: Sarven Capadisli <info@csarven.ca>
Date: Sat, 04 Oct 2014 11:21:26 +0200
To: Daniel Schwabe <dschwabe@inf.puc-rio.br>, SW-forum Web <semantic-web@w3.org>, Linking Open Data <public-lod@w3.org>
CC: Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Bernadette Hyland <bhyland@3roundstones.com>, Fabien Gandon <Fabien.Gandon@inria.fr>
Message-ID: <542FBC16.9080709@csarven.ca>

On 2014-10-04 04:14, Daniel Schwabe wrote:
> As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before.
> Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions.
> Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since.

Hi Daniel, here is my long reply as usual and I hope you'll give it a
shot :)

I've offered *a* solution that is compatible with the existing workflow
without asking for any extra work from the OC/PCs, with the exception
that the Web-native technologies for the submissions are officially
encouraged. They will get their PDF in the end to cater to the existing
pipeline. In the meantime, the community retains higher-quality research
documents.

> And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually "semantic" in some sense is still, in my view, a research topic, not a routine reality.
> Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing.

I disagree that we don't have sufficient or robust tools to author and
publish "web pages". I find it ironic that we are still debating this
issue as if we were in the early-to-mid 90s, or as if we could ignore
[2], or the possibility of using a service that offers [3], to publish
(pardon me for saying) a friggin' web page.

If it is about "coding", I find it unreasonable or unprofessional to
think that a Computer/Web Scientist in 2014 who is publicly funded to do
their academic endeavors is incapable of grokking HTML. But somehow
LaTeX is presumed to be okay for the new post-graduate who's coming in.
Really? Or is the real reason that no one is asking them to do otherwise?

They can randomly pick a WYSIWYG editor tool or an existing publishing
service. No one is forcing anyone to hand-code anything. Just as no one
is forced to hand-code LaTeX.

We have the tools and even services to help us do all of that, both from
within and outside of SW. We have had them for a long time. What was
lacking was a continuous green light to use them. That light stopped
flashing, as you've mentioned.

But again, our core problems are not technical in nature.

> I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences.

I disagree. The fact that workshops or tracks on "linked science" or
"semantic publishing" didn't deliver is a clear sign that they have the
wrong process at the root. When those workshops ask for submissions to
be in PDF, that's the definition of irony. There are no "useful"
machine-friendly research objects! Opportunity lost at every single CfP.

Yet, we eloquently describe hypothetical systems or tools that will "one
day" do all the magic for us instead of taking a good look at what's
right in front of us.

So, let's talk about putting the cart before the horse. A lot of time and
energy (e.g., public funding) could have been better used simply by
actually *having the data*, and then figuring out how to utilize it.
There is no data, so what's there to analyze or learn from? Some
research trying to figure out what to do with trivial and limited
metadata e.g., title, abstract, authors, subjects? Is
data.semanticweb.org ("dog food") the best we can show for our
"dogfooding" ability?

I can't search/query for research knowledge on topic T, that used
variables X, Y, which implemented a workflow step S, that's cited by or
used those exact parameters, that happens to use the datasets that I'm
planning to use in my research.

Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?
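
To make the kind of query described above concrete, here is a minimal
sketch in Python. It assumes papers exposed their claims as simple
subject-predicate-object statements. Every identifier in it (paper:A,
hasTopic, usesVariable, implementsStep, usesDataset, ...) is
hypothetical and not taken from any existing vocabulary or dataset, and
the "cited by" part of the query is left out for brevity.

```python
# Hypothetical statements that papers might expose if they were published
# as machine-readable, Web-native documents. All names are made up.
TRIPLES = {
    ("paper:A", "hasTopic", "topic:T"),
    ("paper:A", "usesVariable", "var:X"),
    ("paper:A", "usesVariable", "var:Y"),
    ("paper:A", "implementsStep", "step:S"),
    ("paper:A", "usesDataset", "dataset:D"),
    ("paper:B", "hasTopic", "topic:T"),
    ("paper:B", "usesVariable", "var:X"),
    ("paper:B", "usesDataset", "dataset:D"),
}


def find_papers(triples, required):
    """Return the subjects that have every (predicate, object) pair in `required`."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all((s, p, o) in triples for p, o in required)
    )


# Topic T, variables X and Y, workflow step S, and the dataset I plan to use.
QUERY = [
    ("hasTopic", "topic:T"),
    ("usesVariable", "var:X"),
    ("usesVariable", "var:Y"),
    ("implementsStep", "step:S"),
    ("usesDataset", "dataset:D"),
]

matches = find_papers(TRIPLES, QUERY)
print(matches)
```

With statements like these, the lookup itself is trivial. With PDFs,
there is nothing to run it against.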

> Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).

Nothing is stopping us from doing things in parallel, and in fact we are.
There are close-by efforts, from workshops to force11, public-dwbp-wg,
public-digipub-ig, .. to recommendations e.g., PROV-O, OPMW, SIO, SPAR,
besides the whole SW/LD stack, all of which benefit scientific research
communication and advancement.

The fundamental question is, if we have all of that going on, why are we
not taking the *minimal* step to put it to use where it matters most? If
the answer depends on making it comfortable and rewarding for the very
few, then I disagree on our priorities.

So, it is *especially difficult* when conferences or journals about
(Semantic) "Web" using WWW, "Semantic Web", "Linked Something" in their
title do not encourage their own technologies towards communicating
research output.

Net result: we continue to make it difficult to mine our own information.

When conferences and supervisors do not encourage the SW researcher to
eat their own dogfood but use something archaic for knowledge sharing on
the Web, well, is it any surprise that we are not going faster than we
could, and that I can't do my silly queries to find papers or use
previously declared/discussed variables in my research?

We can speed up Web Science, attract or create funding opportunities
simply by having a better understanding of our own data. What good is it
exactly that the print output can be in high quality and that it has an
arbitrary length or fixed view? Oh right, that's where the publisher
comes in.

No sensible data, no fun.

Again, I think something along the lines of:

http://csarven.ca/call-for-linked-research
https://github.com/csarven/linked-research

is "good enough" to proceed for those that wish to cover both cases
(retaining semantics and complying with conference/publisher
requirements). We don't have to wait it out to see whether the next best
thing comes along (e.g., like I said, the workshops on SW/LD scientific
publishing are not even doing it). We can figure that out as we go.

If you have read this far, thank you! :)

[1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0325.html
[2] http://en.wikipedia.org/wiki/Comparison_of_HTML_editors
[3] http://en.wikipedia.org/wiki/List_of_content_management_systems

-Sarven
http://csarven.ca/#i

Received on Saturday, 4 October 2014 09:22:02 UTC