Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Johannes Wilm on 2017-09-09 (public-scholarlyhtml@w3.org from September 2017)

From: Johannes Wilm <johanneswilm@gmail.com>
Date: Sat, 9 Sep 2017 16:44:04 +0200
To: Peter Murray-Rust <pm286@cam.ac.uk>
Cc: Scholarly HTML community group <public-scholarlyhtml@w3.org>, Sarven Capadisli <info@csarven.ca>, Silvio Peroni <silvio.peroni@unibo.it>
Message-ID: <CABkgm-TCc3DDq1Wyu6Fjz6-+tYDDRC1GFtQ9HtrYhVEaMfF3mA@mail.gmail.com>
Hey,
just quickly: humanists can probably remember just as much or as little as
people in other sciences. So 200 or 300 tags is, as with JATS, not
something the average scientist can remember; that level of detail must
live somewhere further down in the production chain. I will have to look at
it in detail again for the humanities, but for the social sciences the main
difficulty I have run into is citations, and getting them expressed in
systems made for other sciences. It generally boils down to cases like:

"As Wallström (1945: p. 56) points out... ."
"This has been written about by a number of people (see for example
Plexstein 1956: p. 27, Nielsen 1972: p. 46, Austen 2012: p. 33)."

Using in-text citations rather than footnotes is often just a matter of
style, but it does not work the same way if several works need to be
referenced in the same citation, or if the author's name needs to be part
of the sentence rather than inside the parentheses.

Biblatex and some other systems can handle cases like these, and I would
think that making sure they also work in what we come up with should not be
too difficult. But... let's try it out. Maybe I'm wrong and we need hundreds
of special tags, in which case we can forget about it.
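
To make the two problem cases concrete, here is a minimal Python sketch of a citation model that keeps them distinct: a citation "cluster" can hold several works, and a flag records whether the author's name belongs to the running sentence. The names `CiteItem`, `Citation` and `render` are hypothetical and not part of RASH, Scholarly HTML or Dokieli. The two output forms loosely mirror biblatex's `\textcite` (narrative) and `\parencite` (parenthetical) behaviour, and the author-year-colon-page format is simply copied from the examples above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CiteItem:
    """One cited work inside a citation (hypothetical model)."""
    author: str
    year: int
    locator: Optional[str] = None  # page locator, e.g. "p. 56"


@dataclass
class Citation:
    """A citation cluster: one or more works cited at a single point."""
    items: List[CiteItem]
    narrative: bool = False  # True: the author's name is part of the sentence
    prefix: str = ""         # optional lead text, e.g. "see for example"


def _year_locator(item: CiteItem) -> str:
    # Author-year style with the page after a colon: "1945: p. 56"
    return f"{item.year}: {item.locator}" if item.locator else str(item.year)


def render(citation: Citation) -> str:
    if citation.narrative:
        # Narrative form, e.g. "Wallström (1945: p. 56)"; this sketch
        # only supports a single work in that position.
        if len(citation.items) != 1:
            raise ValueError("narrative citation expects exactly one work")
        item = citation.items[0]
        return f"{item.author} ({_year_locator(item)})"
    # Parenthetical form: all works share one pair of parentheses,
    # separated by commas, with an optional prefix.
    parts = ", ".join(f"{i.author} {_year_locator(i)}" for i in citation.items)
    prefix = f"{citation.prefix} " if citation.prefix else ""
    return f"({prefix}{parts})"
```

For example, `render(Citation([CiteItem("Wallström", 1945, "p. 56")], narrative=True))` gives `Wallström (1945: p. 56)`, and a cluster of the three works with `prefix="see for example"` gives the parenthetical from the second example sentence. The point of the sketch is only that both cases need structure (a list of works plus a mode flag), not hundreds of tags.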

On 9 Sep 2017 2:29 pm, "Peter Murray-Rust" <pm286@cam.ac.uk> wrote:

> This is a hard problem. We need novel imaginative solutions.
>
> Thousands of committed expert people have built systems for document
> structure. As an example, TEI (mainly digital humanities), which
> exemplifies the semantics-presentation division, has 500 tags/concepts
> (https://en.wikipedia.org/wiki/Text_Encoding_Initiative). If we are to
> take a humanities-based approach, we cannot ignore this history.
> Similarly, JATS (originally biomedical) has ca. 250 tags (at least that is
> my current count from downloading actual JATS). Add in computer science,
> geoscience, law, theses, grants and much else, and we have 1000 tags.
>
> So what do we want SH-CG for? My requirement is simple to state. There are
> (my own figures from analysing CrossRef over several months) ca 7000
> "articles" published a day from 500 publishers and "most" have "HTML".
>
> So what I want is to be able to read this into my machine without having
> to handle 200+ formats and 200+ semantics. I don't expect the semantics to
> cover all nuances of a discipline (Henry and I developed Chemical Markup
> Language, and even that doesn't capture everything chemists want to
> publish). So I'd settle for something with about 3-4 div types:
>
> HTML
>   HEAD - contains metadata: bibliographic, authors, titles, funders,
>          acknowledgements, etc. This is a solved problem
>   BODY -
>     DIV class="abstract/summary" // this seems to be fairly universally
>                                  // required
>     DIV class="maintext" - the core of the article
>     DIV class="assets" - figures, tables, schemes
>   FOOT -
>      DIV class="references" (or citations)
>      DIV class="publisher" - all the stuff that a publisher considers
>                              important and I want to strip out
>      DIV class="supplemental" - many things "attached" to the article and
>                                 published in parallel, e.g. data sets
>
> and then we can add finer markup which can be used or ignored as the
> reader wishes.
>
>
> On Sat, Sep 9, 2017 at 9:40 AM, Johannes Wilm <johanneswilm@gmail.com>
> wrote:
>
>> Yes, so anthropology is somewhere between humanities and social sciences.
>> History is more clearly humanities. Political science or sociology should
>> require all the things anthropology requires plus some more, and would
>> therefore probably be better picks to represent the social sciences.
>>
>> Do we have someone in those fields? Then I could concentrate on history.
>> Open access and a license that allows remixing would be preferable.
>>
>> On 9 Sep 2017 9:46 am, "Silvio Peroni" <silvio.peroni@unibo.it> wrote:
>>
>>> Hi Sarven, all,
>>>
>>> Anyone fancy doing a comparative analysis or even mocking up the same
>>> (ideally rather complex) article in ScholarlyHTML, RASH, and anything
>>> else we'd care to compare/discuss?
>>>
>>>
>>> Great ideas! We can all pitch in from our respective areas.
>>>
>>>
>>> That's a great idea, indeed! However, please, we should not come up with
>>> a huge set of examples, only one per discipline. To be more precise, I
>>> would suggest using the categorisation in
>>> https://en.wikipedia.org/wiki/List_of_academic_fields, i.e. the
>>> first-level entries of that taxonomy:
>>>
>>> - Humanities
>>> - Social sciences
>>> - Natural sciences
>>> - Formal sciences
>>> - Professions and applied sciences
>>>
>>> I think one paper for each of the aforementioned fields would be enough
>>> to start with, preferably Open Access papers, to avoid pointless
>>> discussions with publishers about whether someone other than the author
>>> may share the article in another channel. I know that subfields of each
>>> field may have different needs in terms of article content, but we cannot
>>> cover the whole literature at this point, can we?
>>>
>>> I think Sarven and I could cover the “Formal sciences” part. In
>>> particular, when selecting a paper in the Computer Science sub-field,
>>> where we actually work, we should pick something that includes
>>> mathematical formulas, I believe.
>>>
>>> Could someone else help with the other fields?
>>>
>>> Have a nice day :-)
>>>
>>> S.
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------
>>> Silvio Peroni, Ph.D.
>>> Department of Computer Science and Engineering
>>> University of Bologna, Bologna (Italy)
>>> Tel: +39 051 2095393
>>> E-mail: silvio.peroni@unibo.it
>>> Web: https://www.unibo.it/sitoweb/silvio.peroni/en
>>> Twitter: essepuntato
>>>
>>>
>
>
> --
> Peter Murray-Rust
> Reader Emeritus in Molecular Informatics
> Unilever Centre, Dept. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
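
Peter's skeleton in the quoted message above works as a small contract: if publishers emitted a handful of agreed `div` classes, one generic reader could pull each part out instead of maintaining 200+ publisher-specific parsers. Below is a minimal sketch using only Python's standard-library `html.parser`. It assumes the class names from that proposal (`abstract`/`summary`, `maintext`, `assets`, `references`/`citations`, `publisher`, `supplemental`) appear verbatim on `div` elements. Neither those names nor the hypothetical `extract_sections` function are an agreed Scholarly HTML convention.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Section classes taken from the proposed skeleton (assumed, not standardised).
SECTION_CLASSES = {"abstract", "summary", "maintext", "assets",
                   "references", "citations", "publisher", "supplemental"}


class SectionExtractor(HTMLParser):
    """Collects the text inside each recognised section div."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: Dict[str, List[str]] = {}
        self._current = ""  # class of the section we are inside, if any
        self._depth = 0     # div nesting depth within that section

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current:
            self._depth += 1  # nested div: its text belongs to the section
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in SECTION_CLASSES:
                self._current = cls
                self._depth = 1
                self.sections.setdefault(cls, [])
                break

    def handle_endtag(self, tag):
        if tag == "div" and self._current:
            self._depth -= 1
            if self._depth == 0:
                self._current = ""  # left the section

    def handle_data(self, data):
        if self._current and data.strip():
            self.sections[self._current].append(data.strip())


def extract_sections(html: str) -> Dict[str, str]:
    """Map each section class found in the document to its joined text."""
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}
```

A reader that wants to "strip out" the publisher material, as the proposal puts it, can simply drop the `publisher` key from the result. Finer markup inside `maintext` is ignored here by design, which matches the idea that it "can be used or ignored as the reader wishes".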
Received on Saturday, 9 September 2017 14:44:30 UTC