RE: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Stian Soiland-Reyes on 2017-10-17 (public-scholarlyhtml@w3.org from October 2017)

From: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Date: Tue, 17 Oct 2017 11:11:05 +0000
To: sebastien <sebastien.ballesteros@gmail.com>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>
Message-ID: <D5780135E58FC940BDB87E7D499910184B4D2D9C@MBXP14.ds.man.ac.uk>
This looks really good, Sebastien!  I agree that using structured RDFa/JSON-LD in free HTML is much preferable to limiting ourselves to a subset of HTML. As we see in this thread, it is hard to reach agreement without also limiting future publication styles.  We should not be aiming to replicate 1960s-style computer science papers with the odd hyperlink as the only enhancement.





I like how you have given full yet clear examples for each concept, and reused JSON-LD and schema.org.  This should be quite compatible with the effort of http://bioschemas.org/ which has a lot of traction in the biology/bioinformatics community (although many of their standards apply to academia in general). Perhaps Publication could be added there based on your effort and then propagate into schema.org? I recommend you get in touch; see http://bioschemas.org/howtojoin/





I think the science.ai approach has a lot of overlap not just with Scholarly HTML, but also with our work on http://www.researchobject.org/ - in particular our Research Object Bundle ( https://w3id.org/bundle/ ), which also has a JSON-LD-based manifest ( https://researchobject.github.io/specifications/bundle/#manifest ). There we didn’t attempt to “deconstruct” the publication, but focused more on the supporting data and software sources that go along with the black-box publication in the RO. Combining this with your approach would allow embedding rich structured metadata that can then easily be extracted (say into separate annotations) using off-the-shelf RDFa/JSON-LD tools.
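To illustrate the extraction point, here is a minimal sketch, assuming the metadata is embedded as a JSON-LD script block in an otherwise unconstrained HTML page. It uses only the Python standard library; the sample page, its identifiers and the helper names (JsonLdExtractor, extract_jsonld) are made up for illustration and are not taken from the science.ai documentation. A real pipeline would more likely use an existing RDFa/JSON-LD toolkit.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Only JSON-LD script blocks are of interest; ordinary scripts are skipped.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list with one parsed JSON-LD document per embedded script block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical article page; all identifiers and values are invented.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
<title>An example article</title>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ScholarlyArticle",
  "@id": "http://example.org/article/1",
  "name": "An example article",
  "author": {"@type": "Person", "name": "A. Author"}
}
</script>
</head>
<body><p>Free-form HTML body, no structural constraints.</p></body>
</html>"""

if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE_HTML):
        print(doc["@type"], doc["name"])
```

The extracted documents are plain dictionaries, so they could be written out as separate annotations or added to a manifest without touching the HTML itself.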



There’s also concurrent work such as eLife’s Reproducible Document Stack https://elifesciences.org/labs/7dbeb390/reproducible-document-stack-supporting-the-next-generation-research-article - although that works with JATS XML as the base format, it has similar archiving considerations, and I’ve been pushing for them to add some kind of Scholarly HTML as an embedded format.



One challenge, as usual, is how to squeeze the structured metadata out of the authors. eLife are working on interactive editors for this. A similar HTML-based approach is of course the previously mentioned https://dokie.li whose WYSIWYG editor allows you to add microdata anywhere (as well as generating structural microdata for paragraphs etc.).







Side-note for manifest people:

I see in https://nightly.science.ai/documentation/archive#graph-content that you have quite a minimal manifest (good!) as a @graph, but without relating the contained resources to the (implied) aggregation. This can make it hard to tell what is part of the aggregation (e.g. what you list directly under @graph) and what is just a sub-resource (like your DataDownload example). Is there a reason why you didn’t use a property to list these? We reused OAI-ORE ore:aggregates for this purpose (mapped through our JSON-LD context). I think your archive is also in effect an ore:Aggregation or even an ro:ResearchObject, so perhaps reusing those would be beneficial.
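A rough sketch of what I mean, assuming a manifest that is a plain JSON-LD @graph: the function below adds an explicit aggregation node that lists every resource found directly under @graph, using an "aggregates" term mapped to ore:aggregates through the context. The sample manifest, its identifiers and the function names are hypothetical; this is neither the actual science.ai archive format nor the exact Research Object Bundle manifest.

```python
import json

# Extra context terms; the ORE namespace is the OAI-ORE terms vocabulary.
ORE_CONTEXT = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "aggregates": {"@id": "ore:aggregates", "@type": "@id"},
}


def graph_node_ids(manifest):
    """Return the @id of every node listed directly under @graph."""
    return [node["@id"] for node in manifest.get("@graph", []) if "@id" in node]


def with_aggregation(manifest, aggregation_id):
    """Return a new manifest with an explicit ore:Aggregation node.

    Only nodes listed directly under @graph are aggregated; nested
    sub-resources (such as a DataDownload inside a Dataset) are not.
    """
    original = manifest.get("@context")
    if original is None:
        context = [ORE_CONTEXT]
    elif isinstance(original, list):
        context = original + [ORE_CONTEXT]
    else:
        context = [original, ORE_CONTEXT]

    aggregation = {
        "@id": aggregation_id,
        "@type": "ore:Aggregation",
        "aggregates": graph_node_ids(manifest),
    }
    return {
        "@context": context,
        "@graph": [aggregation] + list(manifest.get("@graph", [])),
    }


# Hypothetical minimal manifest; all identifiers are invented.
SAMPLE_MANIFEST = {
    "@context": "http://schema.org",
    "@graph": [
        {
            "@id": "http://example.org/archive/article",
            "@type": "ScholarlyArticle",
            "name": "An example article",
        },
        {
            "@id": "http://example.org/archive/dataset",
            "@type": "Dataset",
            "distribution": {
                "@id": "http://example.org/archive/data.csv",
                "@type": "DataDownload",
            },
        },
    ],
}

if __name__ == "__main__":
    result = with_aggregation(SAMPLE_MANIFEST, "http://example.org/archive/")
    print(json.dumps(result["@graph"][0], indent=2))
```

With the aggregation stated explicitly, a consumer no longer has to guess from nesting depth: the article and the dataset are aggregated, while the DataDownload is only reachable as a sub-resource of the dataset.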





Happy to set up a call if you would like to discuss further!



--
Stian Soiland-Reyes, eScience Lab
School of Computer Science, The University of Manchester
http://orcid.org/0000-0001-9842-9718



From: sebastien<mailto:sebastien.ballesteros@gmail.com>
Sent: 16 October 2017 10:15
To: public-scholarlyhtml@w3.org<mailto:public-scholarlyhtml@w3.org>
Subject: Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli?



Hello,

A quick update on the science.ai documentation effort.

As Robin mentioned we have been iterating quite a lot on scholarly
HTML internally. What we learned along the way (working with several
established players in the field) is that trying to standardize or
define constraints at the HTML level is somewhat too constraining (we
are planning to provide more context on that soon).

In our case, agreeing on a vocabulary and using RDFa and / or JSON-LD
to express it (without additional constraints) has proven to be more
productive.  For us, schema.org (and the process in place to extend
it) provides enough basis to make that work. For that reason we are
now mostly focused on exposing and documenting schema.org patterns
that are useful in the context of scholarly publishing.

I will post an updated link when our documentation hits our production
website but in the meantime feel free to check out
https://nightly.science.ai/documentation/archive if you are curious
about what we have been doing since the days of
http://scholarly.vernacular.io/.  If you look, don't pay too much
attention to the archive stuff, but the JSON-LD / RDFa examples should
provide a good idea of the schema.org patterns that we have found
useful in the context of scholarly publishing.

Sebastien
Received on Tuesday, 17 October 2017 11:11:31 UTC