
Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli?

From: Johannes Wilm <mail@johanneswilm.org>
Date: Wed, 6 Sep 2017 22:25:57 +0200
Message-ID: <CABkgm-RjdDR-GetAEZuN0vYRbD+f=NQaSeC076=xpWc=NHHwJg@mail.gmail.com>
To: Robin Berjon <robin@berjon.com>
Cc: Peter Murray-Rust <pm286@cam.ac.uk>, W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
On Wed, Sep 6, 2017 at 9:32 PM, Robin Berjon <robin@berjon.com> wrote:

> On 06/09/2017 10:32 , Johannes Wilm wrote:
> > So does this mean that Scholarly HTML effectively is no longer existent?
> > Or do Tzviya, Ivan, Robin and/or others plan on continuing with this? If
> > yes, is the idea that this could eventually replace RASH, Dokieli, etc.
> > or is there another goal with Scholarly HTML than the other ones?
> SH still exists as much as it ever did. The science.ai platform uses it
> throughout.
> The goal behind SH is simple: bring interoperability to scientific
> content (primarily articles at this stage) by doing nothing more than
> using the platform as it exists. One driving concern is that if you show
> it to someone who just knows the basic Web platform but who knows little
> about the specifics of scholarly publishing it should "make sense" to
> them. It should also just work with the existing infrastructure that you
> can expect Web people to be familiar with and to use.
> So it really is "just" HTML+schema.org used sensibly. There is, as
> always, a certain amount of arbitrariness (eg. choosing between RDFa and
> Microdata is not a clear decision) that requires a specification for
> people to interoperate from, but it really isn't much.
> One aspect that may come across as counter-intuitive given the above is
> that it is not, however, meant to be an authoring format.

You have mentioned this before, but I am not sure what consequence the
distinction should have. In the internals of our editor, we use a structure
of objects and arrays (JSON). We need this while the document is open, to
order communications between clients simultaneously connected to the same
document. We change this format between releases if needed, without having
to discuss it with anyone, as we always know that everyone looking at our
app simultaneously will be running the same JS of our app in their browser.
For storing to the database, we initially stored things in HTML, but for
the past 4 years or so we have been using the same JSON we use internally.
So we don't plan on standardizing this format with others unless someone
presents a good reason why we need to do realtime collaboration between
different editors. Is this what you refer to when you say "authoring
format"?

What we are interested in is the file format that the user gets inside a
zip file together with images and other resources when clicking an "Export
as HTML" button. That is what you call an interchange format, right?

> It is designed
> to be simple for interchange; the constraints upon being simple for
> humans to produce are different.

Or are you talking about a format for people to write in a text editor by
hand? We are fully WYSIWYM and the user doesn't ever get to handle the
"source version" of the text, so maybe that's why I don't get the point of
this distinction.

> Focusing on interchange means less
> optionality and more verbosity — which makes it easier and more reliable
> to process.

Right. If you only allow one way of doing any one thing, that should make
it easier to write converters and other tools. However, if RASH is able to
put the time into writing such tools even though they allow users to
specify everything in 3 different ways, and Scholarly HTML is not able to
do that, then the argument seems a bit void.

> I see another advantage: people can compete on authoring
> solutions, which I think is good because it is insane to believe that
> everyone will agree on what makes a good authoring format. Hell, some
> people even like LaTeX ;-)

Agreed. Although the format will have to have some complexity if it is to
cover all subject areas. If you want to write a 1000 page monograph in the
humanities authored by just one person with footnote or in-text citations,
you'll need to collect other types of metadata than for a field of
computer science where papers are around 1000 words long.

> In all seriousness, I see DS3
> (http://docx.science.ai/) as an authoring format for SH, for people who
> think that Word is cool.

This looks similar to the DOCX import filter that the RASH people provide.
But if I understand the license correctly, the RASH filter does allow
commercial usage, whereas yours does not, right?

> SA plans to release some human-oriented documentation about (our
> implementation of) SH (ie. not a spec) relatively soon, more like a
> friendly update to http://scholarly.vernacular.io/. It won't form a good
> basis for interoperable processors but it should explain all the
> concepts that we use. We can't, at this stage, commit the time to also
> produce a spec-like document but we remain fully supportive of it. Given
> the Vernacular document, the current spec, and that draft I think
> someone interested would have all the basic bits required to put a newer
> spec together (assuming, of course, that that person and the group agree
> that the decisions made there are sensible!).
> This group here has:
>   • A clear and strong level of interest;
>   • An overall good idea of what's required (a Web-based replacement for
> JATS);
>   • The right technical knowledge;
>   • The right community to provide feedback;
>   • All the infrastructure.
> The only thing that's missing here is drive. Anyone with an hour or two
> a week to dedicate to the cause would have this wrapped by the end of
> the year. I'm sorry that person can't be me right now; I hoped I would
> have more time but the life of a small startup is what it is!

I can understand your priorities. Running a startup is extremely tough.

But does this then not leave us in the situation that RASH is being
actively maintained and tools are written for it, whereas Scholarly HTML is

I guess RASH is more tied to specific tools, and from the looks of it, the
format is not governed by any formal decision making process, so it's
basically up to the development team behind it? I mean, I understand. Our
Fidus Writer format is also just what we decide to put into it. But I
wouldn't expect anyone else to adopt it either.

If my suspicion is correct, it sounds like the main difference is that in
RASH, several different ways of doing the same thing are allowed, whereas in
Scholarly HTML, just one way is allowed. (I am leaving Dokieli out of this,
as it sounds like the purpose is somewhat different.) If the tools exist
for RASH but not for Scholarly HTML, could we then not simply choose one of
the various ways to express things in RASH and use that subformat for
interchange? Something like "Strict RASH". And would it not be possible to
continue the development of that under some kind of community (if that is
not the case yet), so that others can have a stake in it as well?

I wouldn't mind helping out a little bit with this, but not if it means
duplicating an effort others are already making for no good reason.

> > For us, the priorities are, first, to follow a standard that follows
> > good practices in terms of standardization as much as possible (having
> > a formal way to influence the process, open discussions, a decision
> > making process, etc.), and second, to work on a format that has a
> > future of some kind because someone else is using it or at least
> > planning on using it in the future.
> That's the plan!
> --
> • Robin Berjon
> • http://berjon.com/
> • @robinberjon

Johannes Wilm
tel: +1 (520) 399 8880
Received on Wednesday, 6 September 2017 20:26:27 UTC
