Re: scientific publishing process (was Re: Cost and access)

On 2014-10-04 04:14, Daniel Schwabe wrote:
> As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before.
> Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions.
> Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since.

Hi Daniel, here is my long reply as usual and I hope you'll give it a 
shot :)

I've offered *a* solution that is compatible with the existing workflow 
without asking for any extra work from the OC/PCs, with the exception 
that the Web-native technologies for the submissions are officially 
encouraged. They will get their PDF in the end to cater to the existing 
pipeline. In the meantime, the community retains higher quality research 
documents.

> And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated ones (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually "semantic" in some sense is still, in my view, a research topic, not a routine reality.
> Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing.

I disagree that we don't have sufficient or robust tools to author and 
publish "web pages". I find it ironic that we are still debating this 
issue as if we were in the early-to-mid 90s, ignoring [2], or the 
possibility of using a service which offers [3], to publish (pardon me 
for saying) a friggin' web page.

If it is about "coding", I find it unreasonable or unprofessional to 
think that a Computer/Web Scientist in 2014 who is publicly funded to do 
their academic endeavors is incapable of grokking HTML. But, somehow 
LaTeX is presumed to be okay for the new post-graduate that's coming in. 
Really? Or is the real reason that no one is asking them to do otherwise?

They can randomly pick a WYSIWYG editor tool or an existing publishing 
service. No one is forcing anyone to hand-code anything. Just as no one 
is forced to hand-code LaTeX.

We have the tools and even services to help us do all of that, both from 
within and outside of SW. We have had them for a long time. What was 
lacking was a continuous green light to use them. That light stopped 
flashing, as you've mentioned.

But again, our core problems are not technical in nature.

> I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences.

I disagree. The fact that workshops or tracks on "linked science" or 
"semantic publishing" didn't deliver is a clear sign that they have the 
wrong process at the root. When those workshops ask for submissions to 
be in PDF, that's the definition of irony. There are no "useful" 
machine-friendly research objects! Opportunity lost at every single CfP.

Yet, we eloquently describe hypothetical systems or tools that will "one 
day" do all the magic for us instead of taking a good look at what's 
right in front of us.

So, let's talk about putting the cart before the horse. A lot of time and 
energy (e.g., public funding) could have been better used simply by 
actually *having the data*, and then figuring out how to utilize that. 
There is no data, so what's there to analyze or learn from? Some 
research trying to figure out what to do with trivial and limited 
metadata e.g., title, abstract, authors, subjects? Is 
data.semanticweb.org ("dog food") the best we can show for our 
"dogfooding" ability?

I can't search/query for research knowledge on topic T, that used 
variables X, Y, which implemented a workflow step S, that's cited by or 
used those exact parameters, that happens to use the datasets that I'm 
planning to use in my research.

Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?
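
To make the kind of query described above concrete, here is a minimal 
sketch of what it could look like if papers carried machine-readable 
statements. Everything in it is an assumption for illustration: the 
paper identifiers, the predicate names (hasTopic, usesVariable, 
implementsStep, usesDataset, cites) and the sample data are hypothetical 
and not taken from any real vocabulary or dataset.

```python
# Hypothetical (subject, predicate, object) statements. The identifiers and
# predicate names are made up for illustration; they are not a real vocabulary.
TRIPLES = {
    ("paperA", "hasTopic", "T"),
    ("paperA", "usesVariable", "X"),
    ("paperA", "usesVariable", "Y"),
    ("paperA", "implementsStep", "S"),
    ("paperA", "usesDataset", "D1"),
    ("paperB", "hasTopic", "T"),
    ("paperB", "usesVariable", "X"),
    ("paperB", "usesDataset", "D2"),
    ("paperC", "cites", "paperA"),
}


def objects(subject, predicate):
    """Return every object stated for a (subject, predicate) pair."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def find_papers(topic, variables, step, datasets):
    """Find papers on a topic that use all of the given variables, implement
    the given workflow step, and share at least one of the given datasets."""
    subjects = {s for s, _, _ in TRIPLES}
    return sorted(
        s
        for s in subjects
        if topic in objects(s, "hasTopic")
        and set(variables) <= objects(s, "usesVariable")
        and step in objects(s, "implementsStep")
        and objects(s, "usesDataset") & set(datasets)
    )


# Papers on topic T, using variables X and Y, implementing step S,
# that use the dataset I plan to use (D1).
matches = find_papers("T", ["X", "Y"], "S", ["D1"])

# Papers that cite any of the matches.
cited_by = sorted(s for s, p, o in TRIPLES if p == "cites" and o in matches)

print(matches, cited_by)
```

With real Linked Data this would more likely be a SPARQL query over 
published RDF; the sketch only illustrates that the statements have to 
exist before anything like it can be asked.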

> Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).

Nothing is stopping us from doing things in parallel, and in fact we 
are. There are close-by efforts, from workshops to force11, 
public-dwbp-wg, public-digipub-ig, .. to recommendations e.g., PROV-O, 
OPMW, SIO, SPAR, besides the whole SW/LD stack, all of which benefit 
scientific research communication and advancement.

The fundamental question is, if we have all of that going on, why are we 
not taking the *minimal* step to put it to use where it matters most? If 
the answer depends on making it comfortable and rewarding for the very 
few, then I disagree on our priorities.

So, it is *especially difficult* when conferences or journals about 
(Semantic) "Web" using WWW, "Semantic Web", "Linked Something" in their 
title do not encourage their own technologies towards communicating 
research output.

Net result: we continue to make it difficult to mine our own information.

When conferences and supervisors do not encourage the SW researcher to 
eat their own dogfood but use something archaic for knowledge sharing on 
the Web, well, is it any surprise that we are not going faster than we 
could, and that I can't do my silly queries to find papers or use 
previously declared/discussed variables in my research?

We can speed up Web Science, attract or create funding opportunities 
simply by having a better understanding of our own data. What good is it 
exactly that the print output can be in high quality and that it has an 
arbitrary length or fixed view? Oh right, that's where the publisher 
comes in.

No sensible data, no fun.

Again, I think something along the lines of:

http://csarven.ca/call-for-linked-research
https://github.com/csarven/linked-research

is "good enough" to proceed for those that wish to cover both cases 
(retaining semantics and complying with conference/publisher 
requirements). We don't have to wait it out and see how the next best 
thing comes along (e.g., like I said, the workshops on SW/LD scientific 
publishing are not even doing it). We can figure that out as we go.

If you have read this far, thank you! :)

[1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0325.html
[2] http://en.wikipedia.org/wiki/Comparison_of_HTML_editors
[3] http://en.wikipedia.org/wiki/List_of_content_management_systems

-Sarven
http://csarven.ca/#i

Received on Saturday, 4 October 2014 09:22:08 UTC