- From: Peter Murray-Rust <pm286@cam.ac.uk>
- Date: Sun, 29 Nov 2015 11:04:47 +0000
- To: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
- Message-ID: <CAD2k14Puag2JCy7zq=hxKe6Y9R13fA5cnf1CFsZFnUogwSJATg@mail.gmail.com>
Since I and others have been responsible over several years for the name and initiative "ScholarlyHTML", here's some background. Before I start I'll say that I am fully committed to this Community Group and happy to try to work towards a single concept of ScholarlyHTML.

The "ScholarlyHTML" initiative began 4 years ago at a hack meeting in Cambridge UK: http://blogs.ch.cam.ac.uk/pmr/2011/02/09/scholarly-html-hackfest-cambridge-uk-march/ The attendees included scientists, scientific publishers, and tool creators. Martin Fenner has written a history: http://blogs.plos.org/mfenner/2011/03/19/a-very-brief-history-of-scholarly-html/ and we created a web resource: http://scholarlyhtml.org/ (At least Peter Sefton and Martin Fenner have joined the current W3C group and can moderate what I write.)

It's useful to understand the motivations at that time, which are still largely valid today and which we hope can, at least in part, be taken as some of the motivations for the current WG.

In 2010 a widespread concern had arisen about the inadequacy of PDF as a means for communicating scholarship, epitomised by the "BeyondThePDF" meetings run by Phil Bourne and others (Martin Fenner's account: http://blogs.plos.org/mfenner/2010/11/06/beyond-the-pdf-it-is-time-for-a-workshop/ ).

Peter Sefton (at University of Southern Queensland) built a campus-wide authoring and document management tool (ICE) which was widely used for course materials and publishing. It's among the most successful semantic HTML-authoring tools in academia.

My own interest (see http://contentmine.org) is based on reading the scientific literature into semantic form and searching/reusing it. ScholarlyHTML is a central part of this system. We are gearing up to read the daily scientific literature and, where copyright allows, publish HTML transformations of articles.

Currently HTML usage in scholarship is fragmented. Most theses and papers are authored in Word or La/TeX and then converted into PDF for delivery.
Although many publishers also produce HTML, it is often not semantic, not well-formed or schema-valid, and contains large amounts of idiosyncratic behavioural markup (CSS or JS).

Our vision is not technically complex and was aimed at re-use of existing W3C recommendations and other Open protocols. It is based on simplicity:

* it should be possible to author it easily either by humans or machines
* it should be easy for humans and machines to read it
* it should be extensible (for example by use of other XML languages: MathML, SVG, or CML for chemistry)

The primary use has been in contentmine.org for transforming current scholarship into semantic form. This will continue, but contentmine.org is at a sufficiently early stage that it should be possible to adjust the contentmine "ScholarlyHTML" as this WG proceeds. In that way it should be possible to have minimal confusion of terms. And if it works smoothly we should be able to produce significant amounts of valid ScholarlyHTML from existing publications.

P.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Received on Sunday, 29 November 2015 11:05:18 UTC