- From: Peter Murray-Rust <pm286@cam.ac.uk>
- Date: Sat, 9 Sep 2017 13:29:26 +0100
- To: Johannes Wilm <johanneswilm@gmail.com>
- Cc: Silvio Peroni <silvio.peroni@unibo.it>, Scholarly HTML community group <public-scholarlyhtml@w3.org>, Sarven Capadisli <info@csarven.ca>
- Message-ID: <CAD2k14NN2_2WQQDHofgOp41Zh=vhui7QfoBmDo6DxqmQQK8FxQ@mail.gmail.com>
This is a hard problem. We need novel imaginative solutions.

Thousands of committed expert people have built systems for document structure. As an example, TEI (mainly digital humanities), which exemplifies the semantics-presentation division, has 500 tags/concepts (https://en.wikipedia.org/wiki/Text_Encoding_Initiative). If we are to take a humanities-based approach then we cannot ignore this history. Similarly JATS (originally biomedical) has ca. 250 tags (at least that is my current count from downloading actual JATS). Add in computer science, geoscience, law, theses, grants, and much else and we have 1000 tags.

So what do we want SH-CG for?

My requirement is simple to state. There are (my own figures from analysing CrossRef over several months) ca. 7000 "articles" published a day from 500 publishers and "most" have "HTML". So what I want is to be able to read this into my machine without having to have 200+ formats and 200+ semantics. I don't expect the semantics to cover all nuances of a discipline (Henry and I developed Chemical Markup Language and even that doesn't capture everything chemists want to publish).

So I'd settle for something with about 3-4 div-types:

HTML
  HEAD - contains metadata - bibliographic, authors, titles, funders, acknowledgements, etc. This is a solved problem
  BODY -
    DIV class="abstract/summary" // this seems to be fairly universally required
    DIV class="maintext" - the core of the article
    DIV class="assets" - figures, tables, schemes
  FOOT:
    DIV class="references" (or citations)
    DIV class="publisher" - all the stuff that a publisher considers important and I want to strip out
    DIV class="supplemental" - many things "attached" to the article and published in parallel - data sets

and then we can add finer markup which can be used or ignored as the reader wishes.

On Sat, Sep 9, 2017 at 9:40 AM, Johannes Wilm <johanneswilm@gmail.com> wrote:

> Yes, so anthropology is somewhere between humanities and social sciences.
> History is more clearly humanities. Political sciences or sociology should
> require all the things anthropology requires plus some more and would
> therefore probably be better picks to represent the social sciences.
>
> Do we have someone in those fields? Then I could concentrate on history.
> Open access and a license that allows remixing would be preferable.
>
> On 9 Sep 2017 9:46 am, "Silvio Peroni" <silvio.peroni@unibo.it> wrote:
>
>> Hi Sarven, all,
>>
>> Anyone fancy doing a comparative analysis or even mocking up the same
>> (ideally rather complex) article in ScholarlyHTML, RASH, and anything
>> else we'd care to compare/discuss?
>>
>>
>> Great ideas! We can all pitch in from our respective areas.
>>
>>
>> That's a great idea, indeed! However, please, we should not come out with
>> a huge set of examples, only one per discipline. Just to be more precise, I
>> would suggest to use the categorisation in
>> https://en.wikipedia.org/wiki/List_of_academic_fields, using the
>> first-level entities of that taxonomy, i.e.:
>>
>> - Humanities
>> - Social sciences
>> - Natural sciences
>> - Formal sciences
>> - Professions and applied sciences
>>
>> I think it would be enough to have one paper for each of the
>> aforementioned fields for starting – better if the selected papers are Open
>> Access, just to avoid useless discussions with publishers on rights to be
>> shared in another channel by someone that is not the author of the article.
>> I know that it is possible that subfields of each field can have different
>> needs in terms of article content, but we cannot cover the whole literature
>> at this point, can we?
>>
>> I think Sarven and I could cover the “Formal sciences” part – in
>> particular, while selecting a paper in the Computer Science sub-field,
>> since we are actually working there, we need to consider something that
>> includes mathematical formulas, I believe.
>>
>> Could someone else help with the other fields?
>>
>> Have a nice day :-)
>>
>> S.
>>
>> ----------------------------------------------------------------------------
>> Silvio Peroni, Ph.D.
>> Department of Computer Science and Engineering
>> University of Bologna, Bologna (Italy)
>> Tel: +39 051 2095393
>> E-mail: silvio.peroni@unibo.it
>> Web: https://www.unibo.it/sitoweb/silvio.peroni/en
>> Twitter: essepuntato
>>

--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Received on Saturday, 9 September 2017 12:29:53 UTC
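
The coarse layout proposed in the message (a handful of DIV classes that a reader can keep or strip) might be consumed roughly as follows. This is only a sketch under stated assumptions, not an agreed Scholarly HTML format: the class names come from the message, "abstract/summary" is read as either of two class values, and `SectionCollector`, `read_article` and the sample document are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser

# Class names taken from the proposal in the message. Treating
# "abstract/summary" and "references (or citations)" as alternative
# class values is an assumption.
SECTION_CLASSES = {
    "abstract", "summary", "maintext", "assets",
    "references", "citations", "publisher", "supplemental",
}


class SectionCollector(HTMLParser):
    """Hypothetical helper: gathers the text found inside each known DIV."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # class name -> list of text chunks
        self._current = None  # section currently being read, if any
        self._depth = 0       # <div> nesting depth inside that section

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            # A nested div (finer markup) stays part of the open section.
            self._depth += 1
            return
        for name in (dict(attrs).get("class") or "").split():
            if name in SECTION_CLASSES:
                self._current = name
                self._depth = 1
                self.sections.setdefault(name, [])
                break

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def read_article(html, drop=("publisher",)):
    """Return {section: text}, stripping the sections listed in `drop`."""
    parser = SectionCollector()
    parser.feed(html)
    parser.close()
    return {
        name: " ".join(chunks)
        for name, chunks in parser.sections.items()
        if name not in drop
    }


# Invented sample document, used only to exercise the sketch.
SAMPLE = """
<html><head><title>Example article</title></head>
<body>
<div class="abstract">We study X.</div>
<div class="maintext"><div class="section">Intro text.</div><p>More text.</p></div>
<div class="assets">Figure 1.</div>
<div class="references">Ref A.</div>
<div class="publisher">Cookie banner.</div>
<div class="supplemental">Dataset link.</div>
</body></html>
"""

if __name__ == "__main__":
    for name, text in read_article(SAMPLE).items():
        print(f"{name}: {text}")
```

Dropping the "publisher" section by default mirrors the wish in the message to strip that material out; nested markup inside a section is simply flattened to text, in line with the idea that finer markup can be used or ignored.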