Scholarly paper in HTML+RDF through RASH

From: Silvio Peroni <silvio.peroni@unibo.it>
Date: Fri, 22 May 2015 23:52:08 +0200
Message-ID: <3C240D58-E928-4664-826F-0BDE71DE0893@unibo.it>
To: Linking Open Data Mailing List Data <public-lod@w3.org>, "Semantic Web Mailing List" <semantic-web@w3.org>
Dear all,

Considering the several posts about this topic, I would like to share with you my personal experience in using HTML(+RDF) as a format for preparing/submitting/processing papers in scientific events.

In the past months, I (together with several people in my research group at the University of Bologna and other interested researchers from other institutions) have released a format for writing academic articles called RASH, i.e., Research Articles in Simplified HTML. RASH is a markup language that restricts HTML to only 25 elements for writing academic research articles. It is also possible to include RDFa annotations within any element of the language, as well as other RDF statements in Turtle and JSON-LD format by using the appropriate "script" element. The RASH documentation is available online at [1] and documents RASH version 0.3.5, defined as a RelaxNG grammar [2].
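
As a rough illustration of how RDF statements embedded through the "script" element can be read back out of such a document, here is a small Python sketch that collects JSON-LD blocks from an HTML page. It is only a sketch under assumptions: the sample document, the schema.org vocabulary used in it, and the names JsonLdCollector and extract_jsonld are illustrative and are not taken from the RASH documentation or tools.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed content of every JSON-LD script element.

    Assumption: JSON-LD is embedded with type="application/ld+json".
    """

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        # Script content is delivered as raw text, so it can be buffered as-is.
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.documents.append(json.loads("".join(self.buffer)))
            self.in_jsonld = False


# Hypothetical sample document, not an actual RASH paper.
SAMPLE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "ScholarlyArticle", "name": "An example paper"}
</script>
</head><body><p>Text.</p></body></html>"""


def extract_jsonld(html_text):
    """Return a list with one parsed JSON object per JSON-LD script element."""
    parser = JsonLdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE):
        print(doc["@type"], "-", doc["name"])
```

The sketch relies only on the Python standard library; turning the collected JSON-LD into RDF triples would need a proper RDF library, which is left out here.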

RASH is the core component of a larger framework that includes a set of specifications and writing/conversion/extraction tools for academic articles. All the sources (released with Open Source and Creative Commons Licences) are available on GitHub [3] and have been developed by a group of several people so far. An internal note [4] provides a complete overview of the RASH Framework - for your convenience, the structured abstract of that note is included at the end of this email.

Currently, the RASH Framework includes the following tools:

- a script to enable RASH users to check their documents simultaneously both against the specific requirements in the RASH RelaxNG grammar and also against the full set of HTML checks that the W3C Nu HTML Checker (a.k.a., HTML5 validator) does for all HTML documents (by checking all requirements given in the HTML specification);

- JavaScript scripts (based on Bootstrap and jQuery) and CSS stylesheets (partially based on the Linked Research [5] CSS) implementing the visualisation of RASH documents in the browser. These scripts also add to RASH papers a footer bar with statistics about the paper (i.e., number of words, figures, tables and formulas), a menu to change the layout of the page, the automatic reordering of footnotes and references, the visualisation of the metadata of the paper, etc.;

- XSLT 2.0 files for converting RASH documents into LaTeX according to the ACM ICPS [6] and Springer LNCS [7] styles (other styles to come soon);

- an XSLT 2.0 file to perform conversions from OpenOffice documents into RASH documents;

- a Java application called SPAR Xtractor suite that takes a RASH document as input and returns a new RASH document where all its markup elements have been annotated with their actual (structural) semantics according to the Document Components Ontology (DoCO) [8].
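
To give an idea of the kind of paper statistics mentioned in the list above (number of words, figures and tables), here is a minimal Python sketch that computes them from an HTML document. It is not the framework's JavaScript code. The names PaperStats and paper_stats and the sample document are hypothetical, and the sketch assumes that figures and tables are marked up with the plain HTML elements "figure" and "table", which may differ from the markup RASH actually prescribes.

```python
from html.parser import HTMLParser


class PaperStats(HTMLParser):
    """Counts words, figures and tables in an HTML document.

    Assumption: figures and tables use the plain HTML elements
    "figure" and "table"; RASH itself may mark them up differently.
    """

    def __init__(self):
        super().__init__()
        self.words = 0
        self.figures = 0
        self.tables = 0
        self._skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "figure":
            self.figures += 1
        elif tag == "table":
            self.tables += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Whitespace-separated tokens outside script/style count as words.
        if not self._skip_depth:
            self.words += len(data.split())


# Hypothetical sample document, not an actual RASH paper.
SAMPLE = (
    "<html><body>"
    "<p>RASH restricts HTML to a small set of elements.</p>"
    "<figure><img src='f.png' alt='x'/><figcaption>A caption.</figcaption></figure>"
    "<table><tr><td>1</td></tr></table>"
    "<script>var ignored = true;</script>"
    "</body></html>"
)


def paper_stats(html_text):
    """Return a dict with the word, figure and table counts."""
    parser = PaperStats()
    parser.feed(html_text)
    parser.close()
    return {"words": parser.words, "figures": parser.figures, "tables": parser.tables}


if __name__ == "__main__":
    print(paper_stats(SAMPLE))
```

Counting whitespace-separated tokens is a deliberately crude definition of a word; captions and table cells are included in the count, and formulas are not handled at all.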

In order to experiment with the use of RASH in official venues, it has already been proposed as one of the possible submission formats in three academic events, i.e., the Semantic Publishing Challenge 2015 [9] (that will be held during ESWC 2015), and the workshops SAVE-SD 2015 [10] (held during WWW 2015) and Linking in the Cloud 2015 [11] (that will be held during Hypertext 2015).

In particular, six papers were actually submitted in RASH to the SAVE-SD 2015 Workshop [10] (which I co-organised) - the sources of these papers are available on the workshop program webpage [12]. All the RASH papers also include RDF statements (for a total of about 1300 RDF triples) concerning article metadata, basic article structures (mainly based on DoCO [8]), citation functions (based on CiTO [13]), and even semantic descriptions of figures as in the case of the SAVE-SD 2015 Best RASH Paper [14].

It is worth mentioning that the conversion of the RASH submissions into the ACM format requested by the publisher Sheridan (responsible for the publication of all WWW proceedings, including the workshop proceedings) was handled by us, the workshop organisers, through a semi-automatic process. In particular, we used the aforementioned XSLT files to convert RASH papers into LaTeX files compliant with the official ACM format requested [6], and then we fixed only a few layout misalignments.

I hope that the RASH Framework (together with others, e.g., Linked Research [5] and Scholarly Markdown [15]) and the related initiatives and adoption in academic events can be considered a first concrete step towards the possible adoption of HTML(+RDF) for scientific publications in academic venues.

I'm looking forward to receiving your comments about RASH and its framework and, in case you are already an early adopter of it, please feel free to participate in a 10-minute survey about the use of RASH for writing academic papers, available at http://esurv.org/?u=rash-format.

Please don't hesitate to contact me (email: essepuntato@gmail.com) for comments, suggestions, and further questions.

Have a nice day :-)


# References
1. http://cs.unibo.it/save-sd/rash/documentation/index.html
2. http://cs.unibo.it/save-sd/rash/grammar/rash.rng
3. http://github.com/essepuntato/rash
4. http://www.essepuntato.it/2015/sepublica/rash-sepublica2015.html
5. https://github.com/csarven/linked-research
6. http://www.acm.org/sigs/publications/proceedings-templates
7. http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
8. Constantin, A., Peroni, S., Pettifer, S., Shotton, D., Vitali, F. (in press). The Document Components Ontology (DoCO). To appear in Semantic Web – Interoperability, Usability, Applicability. OA available at http://www.semantic-web-journal.net/content/document-components-ontology-doco-0
9. https://github.com/ceurws/lod/wiki/SemPub2015
10. http://cs.unibo.it/save-sd/2015/index.html
11. http://lc2015.dibris.unige.it/
12. http://cs.unibo.it/save-sd/2015/program.html 
13. Peroni, S., Shotton, D. (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. In Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 17 (December 2012): 33-43. Amsterdam, The Netherlands: Elsevier. http://dx.doi.org/10.1016/j.websem.2012.08.001 
14. Kuhn, T. (2015). Science Bots: A Model for the Future of Scientific Computation? http://cs.unibo.it/save-sd/2015/papers/html/kuhn-savesd2015.html
15. http://scholarlymarkdown.com

# Abstract of [4]
Purpose: this paper introduces the RASH Framework, i.e., a set of specifications and tools for writing academic articles in RASH (a simplified version of HTML).

Design: RASH has been developed in order to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the publishing workflow.

Findings: RASH has been used for papers submitted to the SAVE-SD 2015 workshop. The authors of papers were able to self-learn it by simply referring to its documentation page without facing particular issues. The conversion of the RASH submissions into the format requested by the publisher was handled by the workshop organisers quickly through a semi-automatic process.

Research limitations: additional tools are needed, e.g., for extracting additional RDF statements from RASH documents and to enable additional conversion from/to existing formats.

Practical implications: the RASH Framework is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.

Social implications: RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).

Value: RASH focuses strictly on writing the content of the paper (i.e., organisation of text + semantic annotations) and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within the framework.

Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.peroni@unibo.it
Web: http://www.essepuntato.it
Blog: http://palindrom.es/phd
Twitter: essepuntato
Received on Friday, 22 May 2015 21:52:40 UTC
