Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Peter Murray-Rust on 2017-09-10 (public-scholarlyhtml@w3.org from September 2017)

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Sun, 10 Sep 2017 10:03:07 +0100
To: Sarven Capadisli <info@csarven.ca>
Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Message-ID: <CAD2k14NRQ7oPOMxABpF5wBHdrMW2CAQ0sU6AAz31+V2j7BfB6w@mail.gmail.com>

At this stage I think it's critical to ask why the world needs Scholarly HTML
(SH), who is likely to want to use it, and what they are going to do with it.

My personal observation over 25 years of HTML, SVG, MathML, CML is that
people do not adopt standards because they are better. They do it either
because they have real pain (interoperability, modelling, etc.) or because
everyone is doing it (either through convergence or through
industry/government regulation) and the standards help to solve that.

I can see the following places where SH may help some subsections of the
community. But the benefits are local: no one group is likely to want all
of them, and none is likely to be wanted by everyone. This list is not
comprehensive.

1. easier authoring (academic)
2. better authoring (academic)
3. easier ingestion into established publishing processes
   (publishers, reviewers)
4. less friction in editing and production (publishers, publishing
   technology)
5. better final product (publisher)
6. easier re-use of publications (accessibility, machines)

Also there are several subsets of practice:
A. scholarly articles
B. conference papers
C. theses
D. gray literature
E. student papers / notebooks
F. teaching resources.


There's a small subsection of the community that wants all of 1-6. These
are likely to be small journals and certain conferences. It is most
unlikely that major publishers are interested in any of this. Remember
that many broken markets use lock-in and dysfunctionality to generate
income. Also, much of the production is done by third parties who have an
interest in keeping lock-in.

My own interest is 6A, and possibly 6B, 6C and 6D. I want *a* version of
HTML that is likely to still be around in 2 years' time. I don't mind if
it's not widely accepted by mainstream publishers, but I do want a minimal
toolset to be viable.

My current model is to read "all" daily publications (ca. 5000 a day) and
convert them to SH, whatever that turns out to be. I am prepared to ingest
JATS, PDF and cruft-HTML from publishers' sites. I already have a lot in
the system labelled "scholarly.html", which is a basic subset of HTML,
mainly concentrating on the major sections (as given in my previous post).
If SH mutates over the next 2 years I am happy to keep up by editing
XPaths - the other problems are much deeper.
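For readers less familiar with the XPath approach, here is a minimal sketch
of what "editing XPaths" could look like. It assumes a normalized,
well-formed XHTML file in which each major section is a `<section>` element
with a class name; the sample document, the class names and the
`extract_sections` helper are all hypothetical, and are not the actual
scholarly.html conventions or the toolchain described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified scholarly.html; real files may be structured differently.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <section class="abstract"><h1>Abstract</h1><p>We study X.</p></section>
  <section class="methods"><h1>Methods</h1><p>We did Y.</p></section>
  <section class="results"><h1>Results</h1><p>We found Z.</p></section>
</body>
</html>"""

# Map the prefix "h" to the XHTML namespace for ElementTree's XPath subset.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# One XPath per major section (hypothetical class names). If the SH
# conventions mutate, these expressions are the only thing to edit.
SECTION_PATHS = {
    "abstract": ".//h:section[@class='abstract']",
    "methods": ".//h:section[@class='methods']",
    "results": ".//h:section[@class='results']",
}


def extract_sections(xhtml):
    """Return {section name: paragraph text} for each section that is present."""
    root = ET.fromstring(xhtml)
    sections = {}
    for name, path in SECTION_PATHS.items():
        section = root.find(path, NS)
        if section is None:
            continue  # this section is absent from the paper
        # Concatenate the text of the section's direct <p> children,
        # skipping the heading.
        paras = section.findall("h:p", NS)
        sections[name] = " ".join("".join(p.itertext()) for p in paras)
    return sections


if __name__ == "__main__":
    for name, text in extract_sections(SAMPLE).items():
        print(f"{name}: {text}")
```

If the section conventions change, only the `SECTION_PATHS` table needs
updating. Note that much publisher HTML is not well-formed XML, so in
practice it would need a tolerant HTML parser (e.g. lxml.html) before a
step like this.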

So I would urge list readers to think about what they want. "Convert all
scholarly publishing processes to HTML" will take 15 years even if everyone
wanted it - which they don't. I have talked with a lot of publishers and it
is difficult to see where they would immediately benefit. (3) Reviewing and
triaging is a pain point, but it's very publisher-specific.

I'd suggest concentrating on a small subset of the current community where
we think there is need and traction and work towards making something
useful, usable and promotable.


On Sat, Sep 9, 2017 at 11:04 PM, Sarven Capadisli <info@csarven.ca> wrote:

> On 2017-09-09 22:48, Johannes Wilm wrote:
> > The formats that focus on a limited tag-set and have already been
> > developed (RASH and Scholarly HTML) may have just about everything we
> > need.
> It certainly does not, and that's part of the issue here.
>
> Scholarly HTML doesn't set that constraint. RASH has the following 32
> elements:
>
> a, blockquote, body, code, em, figcaption, figure, h1, head, html, img,
> li, link, math, meta, ol, p, pre, q, script, section, span, strong, sub,
> sup, svg, table, td, th, title, tr, ul
>
> Looking at that list, it seems predominantly a *print first* approach,
> not "Web first"! In 2015 it was about 25 elements, and that was
> certainly all one needed. So much for that.
>
> The last thing SH would want to respond to the scholarly community is
> something like "`video`? Sorry that's not allowed. Please align your
> perception of scholarly information on the Web with ours (circa 2017)."
>
> That exact line of reasoning holds true for any given element or
> arbitrary constraint on top of the *living* HTML spec.
>
> Again, authors will want to do things beyond what SH could possibly
> capture, or the CG can plan for. Plenty of skills in this CG, but let's
> not forget that we are only a vocal minority. I suggest that we do not
> prematurely think we got scholarly information covered by way of x
> elements or whatever.
>
> -Sarven
> http://csarven.ca/#i
>
>


-- 
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Received on Sunday, 10 September 2017 09:03:32 UTC