Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Peter Murray-Rust on 2017-09-09 (public-scholarlyhtml@w3.org from September 2017)

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Sat, 9 Sep 2017 13:29:26 +0100
To: Johannes Wilm <johanneswilm@gmail.com>
Cc: Silvio Peroni <silvio.peroni@unibo.it>, Scholarly HTML community group <public-scholarlyhtml@w3.org>, Sarven Capadisli <info@csarven.ca>
Message-ID: <CAD2k14NN2_2WQQDHofgOp41Zh=vhui7QfoBmDo6DxqmQQK8FxQ@mail.gmail.com>

This is a hard problem. We need novel imaginative solutions.

Thousands of committed expert people have built systems for document
structure. As an example, TEI (mainly digital humanities), which
exemplifies the semantics-presentation division, has 500 tags/concepts
(https://en.wikipedia.org/wiki/Text_Encoding_Initiative). If we are to
take a humanities-based approach then we cannot ignore this history.
Similarly JATS (originally biomedical) has ca. 250 tags (at least that is
my current count from downloading actual JATS). Add in computer science,
geoscience, law, theses, grants, and much else and we have 1000 tags.
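
For anyone who wants to reproduce a tag count like the JATS one above, here
is a minimal sketch in Python. It is only an illustration, not the script I
actually used: the function name count_tags is made up, and it assumes the
downloaded articles are already available as well-formed XML strings.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_documents):
    """Count how often each element name occurs across XML documents.

    `xml_documents` is an iterable of XML strings (an assumption; real
    corpora would be read from files or an API).
    """
    counts = Counter()
    for text in xml_documents:
        root = ET.fromstring(text)
        for element in root.iter():  # includes the root element itself
            # Drop a namespace prefix of the form "{uri}name" if present.
            name = element.tag.rsplit("}", 1)[-1]
            counts[name] += 1
    return counts
```

The number of distinct tags in a corpus is then len(count_tags(docs)), and
count_tags(docs).most_common(20) shows which tags carry most of the content.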

So what do we want SH-CG for? My requirement is simple to state. There are
(my own figures from analysing CrossRef over several months) ca 7000
"articles" published a day from 500 publishers and "most" have "HTML".

So what I want is to be able to read this into my machine without having to
have 200+ formats and 200+ semantics. I don't expect the semantics to cover
all nuances of a discipline (I and Henry developed Chemical Markup Language
and even that doesn't capture everything chemists want to publish). So I'd
settle for something with about 3-4 div-types:

HTML
  HEAD - contains metadata - bibliographic, authors, titles, funders,
         acknowledgements, etc. This is a solved problem.
  BODY -
    DIV class="abstract/summary" // this seems to be fairly universally
        required
    DIV class="maintext" - the core of the article
    DIV class="assets" - figures, tables, schemes
  FOOT:
    DIV class="references" (or citations)
    DIV class="publisher" - all the stuff that a publisher considers
        important and I want to strip out
    DIV class="supplemental" - many things "attached" to the article and
        published in parallel - data sets

and then we can add finer markup which can be used or ignored as the
reader wishes.
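
To show how little machinery a reader would need if publishers agreed on
such coarse div-types, here is a rough sketch using only the Python standard
library. The class names are the ones from my outline above and are not an
agreed standard; SectionSplitter and split_sections are hypothetical names,
and the sketch assumes each section is a DIV carrying exactly one of those
classes.

```python
from html.parser import HTMLParser

# Hypothetical class names taken from the outline above.
SECTION_CLASSES = {"abstract", "maintext", "assets",
                   "references", "publisher", "supplemental"}


class SectionSplitter(HTMLParser):
    """Collect the text inside each DIV whose class names a known section."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # class name -> list of text fragments
        self._current = None  # section we are currently inside, if any
        self._depth = 0       # DIV nesting depth within that section

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1  # a nested DIV inside the current section
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in SECTION_CLASSES:
                self._current = name
                self._depth = 1
                self.sections.setdefault(name, [])
                break

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None  # left the section

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def split_sections(html, drop=("publisher",)):
    """Return {section: text}, leaving out the sections listed in `drop`."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts)
            for name, parts in parser.sections.items()
            if name not in drop}
```

Calling split_sections(page) on a conforming page gives the abstract, main
text, references and so on as plain strings, with the publisher material
stripped by default. Finer markup inside each DIV is simply ignored here,
which is the point: a reader can use it or not.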


On Sat, Sep 9, 2017 at 9:40 AM, Johannes Wilm <johanneswilm@gmail.com>
wrote:

> Yes, so anthropology is somewhere between humanities and social sciences.
> History is more clearly humanities. Political sciences or sociology should
> require all the things anthropology requires plus some more and would
> therefore probably be better picks to represent the social sciences.
>
> Do we have someone in those fields? Then I could concentrate on history.
> Open access and a license that allows remixing would be preferable.
>
> On 9 Sep 2017 9:46 am, "Silvio Peroni" <silvio.peroni@unibo.it> wrote:
>
>> Hi Sarven, all,
>>
>> Anyone fancy doing a comparative analysis or even mocking up the same
>> (ideally rather complex) article in ScholarlyHTML, RASH, and anything
>> else we'd care to compare/discuss?
>>
>>
>> Great ideas! We can all pitch in from our respective areas.
>>
>>
>> That's a great idea, indeed! However, please, we should not come out with
>> a huge set of examples, only one per discipline. Just to be more precise, I
>> would suggest to use the categorisation in
>> https://en.wikipedia.org/wiki/List_of_academic_fields, using the first
>> level entities of such taxonomy, i.e.:
>>
>> - Humanities
>> - Social sciences
>> - Natural sciences
>> - Formal sciences
>> - Professions and applied sciences
>>
>> I think it would be enough to have one paper for each of the
>> aforementioned fields for starting – better if the selected papers are Open
>> Access, just to avoid useless discussions with publishers on rights to be
>> shared in another channel by someone that is not the author of the article.
>> I know that it is possible that subfields of each field can have different
>> needs in terms of article content, but we cannot cover the whole literature
>> at this point, can we?
>>
>> I think Sarven and I could cover the “Formal sciences” part – in
>> particular, while selecting a paper in the Computer Science sub-field,
>> since we are actually working there, we need to consider something that
>> includes mathematical formulas, I believe.
>>
>> Could someone else help with the other fields?
>>
>> Have a nice day :-)
>>
>> S.
>>
>>
>>
>>
>> ----------------------------------------------------------------------------
>> Silvio Peroni, Ph.D.
>> Department of Computer Science and Engineering
>> University of Bologna, Bologna (Italy)
>> Tel: +39 051 2095393
>> E-mail: silvio.peroni@unibo.it
>> Web: https://www.unibo.it/sitoweb/silvio.peroni/en
>> Twitter: essepuntato
>>
>>


-- 
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Saturday, 9 September 2017 12:29:53 UTC