Re: scientific publishing process (was Re: Cost and access) from Hugh Glaser on 2014-10-04 (semantic-web@w3.org from October 2014)

From: Hugh Glaser <hugh@glasers.org>
Date: Sat, 4 Oct 2014 12:14:05 +0100
To: Daniel Schwabe <dschwabe@inf.puc-rio.br>
Cc: SW-forum Web <semantic-web@w3.org>, Linking Open Data <public-lod@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Bernadette Hyland <bhyland@3roundstones.com>
Message-Id: <1E33FBED-1C64-4060-8F31-A5E15009D5D1@glasers.org>

Executive summary:
1) Bring up an ePrints repository for “our” conferences, and a myExperiment instance, or equivalents;
2) Start to contribute to the Open Source community.

Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone.

Longer version.
I too have a deep sense of deja vu all over yet again :-)

But I have learned something - no-one seems to collaborate with people outside the techie world.
Most documents for me start as a (set of) collaborative Google Doc(s) (unmentioned so far) or a Word or OpenOffice document (not mentioned much) on Dropbox.
And the collaborators couldn’t possibly help me build a LaTeX document or even any interesting HTML.

Anyway…
I see quite a few different things in this discussion, and all of them deeply important for the research publishing world at the moment.
a) Document format;
b) Metadata about the publication, both superficial and deep;
c) Data, systems and workflow about the research.

But starting almost everything from scratch (with just the existing standards and a few tools) is rarely the way to go in this webby world.

There is real stuff out there (as I have said more than once before), that could really benefit from the sort of activity that Bernadette describes.
I know about a number of things, but there will be others.

(a) and (b) Repositories (because that is what we are talking about)
http://eprints.org is an Open Source Linked Data publishing platform for publications that handles the document (in any format) and the shallow metadata, but could easily handle deep metadata as well if people generated it.
E.g. http://eprints.soton.ac.uk/id/eprint/271458
I even have an existing endpoint with all the ePrints RDF in it - http://foreign.rkbexplorer.com, currently with 24G & 182854666 triples - so such software can be used.
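
As an illustration of how such Linked Data can be consumed, here is a minimal Python sketch. It assumes the repository honours HTTP content negotiation on its /id/eprint/ URIs and returns RDF/XML when asked for application/rdf+xml. That behaviour, and the helper names build_rdf_request and fetch_rdf, are assumptions for illustration rather than anything confirmed above.

```python
from urllib.request import Request, urlopen

# Example record URI taken from the message above.
EPRINT_URI = "http://eprints.soton.ac.uk/id/eprint/271458"


def build_rdf_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF representation.

    Hypothetical helper: whether the server honours the Accept header
    is an assumption about the repository's configuration.
    """
    return Request(uri, headers={"Accept": accept})


def fetch_rdf(uri, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only inspect the request here; call fetch_rdf(EPRINT_URI) to actually retrieve data.
    request = build_rdf_request(EPRINT_URI)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Building the request separately from sending it keeps the sketch checkable offline; the returned text could then be handed to whatever RDF parser one prefers.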

What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all of them, or one for each conference or series?
And require the authors to enter their data into the site - it’s not hard, and there is existing documentation of what to do.
It is mature technology with 100s of person-years invested.
And perhaps most importantly, it has the buy in of the library and similar communities, and has been field tested with users.
It would certainly be more maintainable than the DogFood site - and it would be a trivialish task to move the great DogFood efforts over to it. DogFood really is something of a silo - exactly what Linked Data is meant to avoid.
And “we” might actually contribute to the wider community by adding Linked Data enhancements to the Open Source project that were useful out there!
Or a more challenging thing would be to make http://www.dspace.org do what we want (https://wiki.duraspace.org/display/DSPACE/Linked+Open+Data+for+DSpace)!

(c) Workflows and Datasets
I have mentioned http://www.myexperiment.org before, but can’t remember if I have mentioned http://www.wf4ever-project.org as well.
Again, these are Linked Data platforms for publishing; in this case workflows and datasets etc.
They are seriously mature, certainly compared with what we might build - see, for example, https://github.com/wf4ever/ro
And exactly the same applies as for the Repositories.

What would be wrong with bringing up such a repository for SemWeb/Web conferences, one for all of them, or one for each conference or series?
…ditto…
Who knows, maybe the Crawl, as well as the Challenge entries, might be able to usefully describe what they did using these ontologies etc.?

Please, please, let’s not build anything ourselves - if we are to do anything, then let’s choose and join suitable existing activity and make it better for everyone.

Hugh

> On 4 Oct 2014, at 03:14, Daniel Schwabe <dschwabe@inf.puc-rio.br> wrote:
> 
> As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before.
> Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions.
> Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since.
> And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated parts (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually "semantic" in some sense is still, in my view, a research topic, not a routine reality.
> Until we have robust tools that make it as easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing.
> I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences.
> Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).
> Just my personal 2c
> Daniel
> 
> 
> On Oct 3, 2014, at 12:50  - 03/10/14, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
>> In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display.  In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF.  If a document is available in both HTML and PDF I almost always choose to view it in PDF.  This is the case even though I have particular preferences in how I view documents.
>> 
>> If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me.  If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me.
>> 
>> I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers.  So go ahead, create one.  But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose.
>> 
>> So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML.  If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available?
>> 
>> peter
>> 
>> 
>> 
>> 
>> 
>> On 10/03/2014 08:02 AM, Phillip Lord wrote:
>> [...]
>>> 
>>> As it stands, the only statement that the semantic web community are
>>> making is that web formats are too poor for scientific usage.
>> [...]
>>> 
>>> Phil
>>> 
> 
> Daniel Schwabe                      Dept. de Informatica, PUC-Rio
> Tel:+55-21-3527 1500 r. 4356        R. M. de S. Vicente, 225
> Fax: +55-21-3527 1530               Rio de Janeiro, RJ 22453-900, Brasil
> http://www.inf.puc-rio.br/~dschwabe
> 
> 
> 
> 
> 

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Received on Saturday, 4 October 2014 11:14:32 UTC