Re: HTML standard for scholarly publications ? from Gareth Oakes on 2022-11-27 (public-scholarlyhtml@w3.org from November 2022)

From: Gareth Oakes <goakes@gpsl.co>
Date: Sun, 27 Nov 2022 22:55:09 +0000
To: Peter Murray-Rust <pm286@cam.ac.uk>
CC: Kaveh <kaveh@rivervalleytechnologies.com>, "Peter (pt) Sefton" <pt@ptsefton.com>, "Morand, Robin" <robin.morand@szh.ch>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>, Bjoern Brembs <bjoern@brembs.net>
Message-ID: <7A11F24B-CE1F-4CCB-9673-5FA02220479E@gpsl.co>
My view on this is that most STM publishers are commercially driven, some more so than others, but they would not want to miss out on an opportunity. At its most basic level that is either a cost saving or a revenue opportunity. We are taking a two-pronged approach: simplified/standardised plumbing via HTML to save costs, and the concept of simpler reuse in order to drive new product revenues. If you start with HTML then all things become easier; IMHO the trick (technically speaking) is to not lose the validation and transformation capabilities of XML along the way.

Of course there are many headwinds and the established traditions do not always align… that is where we are currently at.

-Gareth

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Friday, 25 November 2022 at 18:23
To: Gareth Oakes <goakes@gpsl.co>
Cc: Kaveh <kaveh@rivervalleytechnologies.com>, "Peter (pt) Sefton" <pt@ptsefton.com>, "Morand, Robin" <robin.morand@szh.ch>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>, Bjoern Brembs <bjoern@brembs.net>
Subject: Re: HTML standard for scholarly publications ?

(copying Bjoern Brembs)
This used to be a technical problem.

It is now 100+% sociopolitical.

Simply:
The PublisherAcademicComplex *wants* publications to be 50 years behind the edge of progress. They don't want XML (except internal). Some publishers (MDPI, BMC, Hindawi...) used to display XML prominently on their websites. No longer. There will be widespread unspoken effective resistance to ScholarlyHtml (mainly complete non-engagement).

Why?
<rant>
Publishers now don't sell publications. Their business model is:
* authors pay for prestige (glory)
* publishers run surveillance systems and sell or reuse user data

The current model (the Holy Version of Record PDF, VoR) works wonderfully. This is what authors are judged on. Since publishing is Rich North anglophone sighted (mainly) men, the current model must be kept. The PDF is its medium. It won't change in 20 years (and then climate change will have destroyed complex society).
I use JATS-XML from EuropePMC. It could be a lot better but it works. Do the publishers display it? Of course not. They have to retain their own 500 sites, 30 years out of date, for only one purpose: publisher branding. Readers don't care but they have to read these awful double-column rotated tables because they have no option. Readers don't matter.

It isn't just academia. The world now accepts PDF as the holy way to publish. It destroys scientific knowledge. For example the UN IPCC AR6 report on the future of this planet is 10,000 pages of PDF. In English. It perpetuates Climate Injustice. We have a small group (mainly Indian undergraduates) developing tools to convert this to XML.
Total waste of time. Except it is so important to the world we have to do it.
</rant>

P.


On Thu, Nov 24, 2022 at 10:38 PM Gareth Oakes <goakes@gpsl.co> wrote:
We’re still pushing this idea along in a strategic sense. We’ve recently devised a lossless bidirectional transform between JATS and a ScholarlyHTML-like format to try and gain momentum. I think if anything ScholarlyHTML was too early, I’m certain its time will come.

Perhaps the rise in ML and NLP solutions will help? Perhaps that will lead to a structured peer review process that simplifies and enables an end-to-end structured HTML workflow?

-Gareth

From: Kaveh <kaveh@rivervalleytechnologies.com>
Date: Friday, 25 November 2022 at 07:03
To: "Peter (pt) Sefton" <pt@ptsefton.com>
Cc: "Morand, Robin" <robin.morand@szh.ch>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>
Subject: Re: HTML standard for scholarly publications ?
Resent from: <public-scholarlyhtml@w3.org>
Resent date: Friday, 25 November 2022 at 07:03

Hi Peter

I think you were in the group of my friend and colleague, Peter Murray-Rust, who pushed this excellent idea for many years. It is a shame that structured and semantic HTML is not more prevalent, and we have to use AI to guess at the contents!!

We love structured content, but we keep the structure in XML, with HTML pushed out as needed.

Regards
Kaveh

On Thu, 24 Nov 2022 at 19:57, Peter (pt) Sefton <pt@ptsefton.com> wrote:
Hi Robin,

It's been a long time and this project is definitely abandoned at this point. There was another group who started using the term ScholarlyHTML and we handed the term over to them but I didn't keep track of it.

Sorry I can't be of more help - please let us know how it goes, and maybe this can re-start the standards process?

Cheers
pt

On Fri, 25 Nov 2022 at 04:01, Morand, Robin <robin.morand@szh.ch> wrote:
Hello,

I recently discovered your repository on GitHub which explains the syntax for a scholarly article in HTML. I am in charge of the conversion of documents for our publishing services and I wanted to know if the standards you explain are still up to date? The last update of the repository was 7 years ago so I'm wondering :-)

If it is no longer up to date, would you have a reference to recommend to me?

Best regards,
Robin


--
Peter Sefton +61410326955 pt@ptsefton.com http://ptsefton.com

Gmail, Twitter & Skype name: ptsefton


--
Kaveh Bazargan PhD
Director
River Valley Technologies<http://rivervalley.io> ● Twitter<https://twitter.com/rivervalley1000> ● LinkedIn<https://www.linkedin.com/in/bazargankaveh/> ● ORCID<https://orcid.org/0000-0002-1414-9098> ● @kaveh1000@mastodon.social
Accelerating the Communication of Research


--
"I always retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Yusuf Hamied Department of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-336432
Received on Sunday, 27 November 2022 22:55:26 UTC