Re: Name of the CG from Robin Berjon on 2015-12-01 (public-scholarlyhtml@w3.org from December 2015)

From: Robin Berjon <robin@berjon.com>
Date: Tue, 1 Dec 2015 00:10:56 -0500
To: Raphael Wimmer <Raphael.Wimmer@sprachlit.uni-regensburg.de>
Cc: public-scholarlyhtml@w3.org
Message-ID: <565D2BE0.2030702@berjon.com>

Hi Raphael,

On 26/11/2015 07:58 , Raphael Wimmer wrote:
> I tend to agree with Ivan. I was drawn to this community group because I am
> interested in the practical usability and utility of publishing and
> collaborating using open standards (and HTML is the de-facto standard).
> To me, the description of the group sounded like a place where concrete
> practical aspects of publishing scientific papers written in HTML are
> discussed.

Much agreed.

> FWIW, my primary topics of interest include:

(Thanks, this is a good list!)

> - How to model the current structure of scientific papers (e.g. keywords,
> figures, footnotes, references) in HTML. Most parts seem straightforward, for
> others no HTML equivalent seems to exist.

I think that the problem is actually slightly worse than that. Even for
parts where there does exist a straightforward HTML way of encoding a
construct, there may be *several* ways of doing so. Sometimes, that may
be acceptable, but this variability also introduces drift. Wiggle room
is multiplicative. It's okay in a small domain because it stays
constrained, but once you start working at larger scales, with complete
tooling ecosystems and various communities that don't even communicate
together, it becomes unmanageable.

That is why I think that, at least for the landmark parts of scholarly
content, we should not just identify a correct HTML construct but, as
much as possible, establish a single "correct" one.

> - How to represent references to other works in a way that is robust,
> versatile, and extendable.

That's a hard one, but also one of the more useful. We had a discussion
not long ago in the DPUB IG
(https://lists.w3.org/Archives/Public/public-digipub-ig/2015Sep/0168.html)
about this very topic.

Since then I've been working on my side on a solution for this. I have
an implementation that should make the concept clearer, but it's not yet
completely ready for sharing. Basically the thinking is this:

  • Pretty much all reference conventions were created for print,
typically to save space while conveying relatively clear semantics to
the initiated (sometimes the very initiated). But that's daft: in a
digital world the number of pages you gain at the back of a monograph is
of little practical importance.

  • A reference format should therefore be designed first for human
consumption, and foremost with accessibility in mind. I haven't
user-tested this but I am pretty certain that something like the
Vancouver style, if taken raw (i.e. mostly as visual formatting), is
pretty bad for accessibility.

  • Then, since a) people do have ingrained preferences that it is good
to cater to and b) printing actually does matter, make the format
styleable with CSS so that users (or the nice print people) can tailor
it to their needs.

This leads to a format that has (shockingly) some human bits in it. You
get an output that reads something like:

  "Cryptozoology: A Primer"
  by Kjetil Kjernsmo,
  pages 42-117.
  Published by Dahut Presses in 1997.

The human bits ("by", "in", "pages", "Published by") are marked up
specifically so as to be easily ignorable (in my implementation with
<small>, which is a debatable choice) by being pushed off-screen with
CSS. The rest is heavily marked up with <a>, <cite>, <time>, and a lot
of <span> with RDFa+schema.org.
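
To make that concrete, here is a minimal sketch in Python of markup along
these lines. It is not the implementation mentioned above: the function
names, the "human" class, the off-screen CSS rule, and the choice of
schema.org terms (ScholarlyArticle, pagination, and so on) are all
assumptions made for illustration.

```python
from html import escape

# Assumption: one common CSS idiom for moving text off-screen while
# leaving it available to screen readers. The selector is hypothetical.
OFFSCREEN_CSS = "small.human { position: absolute; left: -10000px; }"


def human(text):
    """Wrap a connective word ("by", "pages", ...) in <small>."""
    return '<small class="human">' + escape(text) + "</small>"


def entity(prop, kind, name):
    """Mark up a named person or organisation as a nested RDFa resource."""
    return (
        f'<span property="{prop}" typeof="{kind}">'
        f'<span property="name">{escape(name)}</span></span>'
    )


def render_reference(title, author, pages, publisher, year, url=None):
    """Return one reference as HTML+RDFa using schema.org terms."""
    cite = f'<cite property="name">{escape(title)}</cite>'
    if url is not None:
        # Optional link to the referenced work.
        cite = f'<a property="url" href="{escape(url)}">{cite}</a>'
    lines = [
        cite,
        human("by") + " " + entity("author", "Person", author) + ",",
        human("pages") + f' <span property="pagination">{escape(pages)}</span>.',
        human("Published by") + " " + entity("publisher", "Organization", publisher),
        human("in") + f' <time property="datePublished" datetime="{year}">{year}</time>.',
    ]
    body = "\n  ".join(lines)
    return f'<p vocab="http://schema.org/" typeof="ScholarlyArticle">\n  {body}\n</p>'


print(render_reference("Cryptozoology: A Primer", "Kjetil Kjernsmo",
                       "42-117", "Dahut Presses", 1997))
```

With OFFSCREEN_CSS applied, only the title, names, pages and year stay
visible, and a stylesheet can then rearrange them into whatever citation
style a reader or publisher prefers.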

I still need to test how flexible to styling the result is. I know it's
flexible, I don't know if it's flexible enough under highly variable
conditions.

> - How to make authoring of scientific papers in HTML as easy as possible

I think that this is something that the format should enable rather than
support. If we create a great format that is useful enough for
interchange (and long-term archival) that it starts getting accepted by
a growing number of journals, conferences, etc. then producing authoring
formats that target it for output would be easy (as opposed to the
situation we have today where it isn't even clear what to target).

Of course creating a format that is both great to author and great for
interop would be ideal, but I think that for some aspects (notably
references) that will be at best difficult without sacrificing too much.

> - How to make HTML-only papers archivable and robust against link-rot and
> different/changing browser implementations (e.g., it would be bad if charts
> were displayed differently by different browsers)

Using HTML5 + other major chunks of technology I am confident that we
can get all of that (it's what it was designed for), except for link-rot
protection. That is harder since links are to externally controlled content.

> - How to use the capabilities of the WWW (links, interactivity, collaboration)
> for enhancing scientific communication.

This is a part that can easily come into conflict with your previous
point about archival. I think we can solve it though. One important
aspect is to make it possible to have SH content surrounded by a lot of
unrelated content (site header/footer, navigation, ads) while
guaranteeing that the SH information can still be extracted reliably
(and the rest properly ignored by an SH processor).
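
As a rough illustration of that kind of extraction, here is a small
Python sketch using only the standard library. It assumes, purely for
the example, that the SH content is the element carrying
typeof="ScholarlyArticle"; the class and function names are hypothetical
and this is not an actual SH processor.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collect text found inside the element marked as the article."""

    def __init__(self, marker="ScholarlyArticle"):
        super().__init__(convert_charrefs=True)
        self.marker = marker  # assumed marker value in the typeof attribute
        self.depth = 0        # > 0 while we are inside the marked element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID:
                self.depth += 1
        elif self.marker in (dict(attrs).get("typeof") or "").split():
            self.depth = 1    # entering the marked element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_article_text(page):
    """Return the whitespace-normalised text of the marked article only."""
    parser = ArticleExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = (
    "<header>Site navigation</header>"
    '<article vocab="http://schema.org/" typeof="ScholarlyArticle">'
    "<h1>A Title</h1><p>Body text.</p></article>"
    "<footer>Ads and other noise</footer>"
)
print(extract_article_text(page))
```

The depth counter relies on explicit end tags, so a real processor would
want a proper HTML5 parser that handles implied end tags. The point is
only that a single unambiguous marker lets everything else on the page
be ignored.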

> - How to get conferences / journals to accept submissions in HTML format (a
> topic we are currently looking at).

We're working on that too. We have built a scholarly publishing pipeline
that uses (our proposal for) SH as its internal article format. We'd
like for it to plug into other people's tools so that we can all benefit
from a commonly improved ecosystem.

> Is this the right forum for discussing such questions?

Absolutely!

-- 
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ — intelligent science publishing
Received on Tuesday, 1 December 2015 05:11:30 UTC