Historical Background

Since I and others have been responsible over several years for the name
and initiative "ScholarlyHTML", here's some background. Before I start I'll
say that I am fully committed to this Community Group and happy to
work towards a single concept of ScholarlyHTML.

The "ScholarlyHTML" initiative started 4 years ago:
http://blogs.ch.cam.ac.uk/pmr/2011/02/09/scholarly-html-hackfest-cambridge-uk-march/
at a hack meeting in Cambridge UK.  The attendees included scientists,
scientific publishers, and tool creators. Martin Fenner has written a
history:
http://blogs.plos.org/mfenner/2011/03/19/a-very-brief-history-of-scholarly-html/
and we created a web resource
http://scholarlyhtml.org/

(At least Peter Sefton and Martin Fenner have joined the current W3C group
and can moderate what I write.) It's useful to understand the motivations
at that time, which are still largely valid today and which we hope can, at
least in part, be taken as some of the motivations for the current WG. In
2010 a widespread concern had arisen about the inadequacy of PDF as a means
for communicating scholarship, epitomised by "BeyondThePDF" meetings run by
Phil Bourne and others (Martin Fenner's account:
http://blogs.plos.org/mfenner/2010/11/06/beyond-the-pdf-it-is-time-for-a-workshop/
). Peter Sefton (at University of Southern Queensland) built a campus-wide
authoring and document management tool (ICE) which was widely used for
course materials and publishing. It's among the most successful semantic
HTML-authoring tools in academia.

My own interest (see http://contentmine.org) is based on reading the
scientific literature into semantic form and searching/reusing it.
ScholarlyHTML is a central part of this system. We are gearing up to read
the daily scientific literature and - where copyright allows - to publish
HTML transformations of articles.

Currently HTML usage in scholarship is fragmented. Most theses and papers
are authored in Word or La/TeX and then converted into PDF for delivery.
Although many publishers also produce HTML, it is often not semantic, not
well-formed or schema-valid, and contains large amounts of idiosyncratic
behavioural markup (CSS or JS).

Our vision is not technically complex and aims to re-use existing
W3C Recommendations and other Open protocols. It is based on simplicity:
* it should be possible to author it easily either by humans or machines
* it should be easy for humans and machines to read it
* it should be extensible (for example by use of other XML languages -
MathML, SVG, or CML for chemistry)
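
To make the three points above concrete, here is a minimal sketch in Python.
It is only an illustration and not part of any agreed ScholarlyHTML
specification: the sample document, the helper names is_well_formed and
embedded_vocabularies, and the choice of XML-serialised HTML are all my
assumptions. It shows a small document that a human can author by hand, that
a machine can read with a standard XML parser, and that is extended by
embedding MathML in its own namespace.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs for XHTML and MathML.
XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample article: simple enough to write by hand.
DOC = f"""<html xmlns="{XHTML}">
  <head><title>Example article</title></head>
  <body>
    <section>
      <h1>Results</h1>
      <p>The energy is
        <math xmlns="{MATHML}"><mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></math>.
      </p>
    </section>
  </body>
</html>"""


def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def embedded_vocabularies(text):
    """List the non-XHTML namespaces used by elements in the document."""
    root = ET.fromstring(text)
    namespaces = set()
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if element.tag.startswith("{"):
            namespace = element.tag[1:].split("}", 1)[0]
            if namespace != XHTML:
                namespaces.add(namespace)
    return sorted(namespaces)


if __name__ == "__main__":
    print(is_well_formed(DOC))
    print(embedded_vocabularies(DOC))
```

A fragment such as "<p>unclosed<br></p>", which is typical of tag-soup HTML,
fails the well-formedness check, whereas the sample document passes and
reports MathML as its only embedded vocabulary.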

The primary use has been in contentmine.org for transforming current
scholarship into semantic form. This will continue, but contentmine.org is
at a sufficiently early stage that it should be possible to adjust the
contentmine "ScholarlyHTML" as this WG proceeds. In that way it should be
possible to have minimal confusion of terms. And if it works smoothly we
should be able to produce significant amounts of valid ScholarlyHTML from
existing publications.

P.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Sunday, 29 November 2015 11:05:18 UTC