Re: Scholarly paper in HTML+RDF through RASH

Hi Marynas,

first of all, thanks for your comments!

Here are a couple of answers explaining why we originally chose not to use the HTML elements you suggested:

> A couple remarks regarding HTML:
> <p class="code"> could be <pre><code>
> http://www.w3.org/TR/html401/struct/text.html#edef-CODE

Basically, this was a deliberate design choice. In RASH we wanted to keep everything as simple as possible, in particular when defining similar behaviour in different contexts (e.g., in inline elements and in block elements). If we followed the full HTML approach you suggested, we would need different tags for marking up code. In particular:

Inline code definition: 
<p>This text contains a <i><code>call to a function in italics</code></i> as an inline element.</p>

Block code definition:
<pre><code>This is a full block of code</code></pre>

As you can see, covering both situations would require at least two additional HTML elements (and thus an extension of RASH). In addition, defining a block of code requires *two* elements together. To keep the same “semantics” without adding elements to the language, and to keep the inline and block cases simple and similar, we thought that using the same “class” on existing RASH elements would be easier to use and remember. For instance, the RASH translation of the HTML code above is:

<p>This text contains a <i class="code">call to a function in italics</i> as an inline element.</p>
<p class="code">This is a full block of code</p>

> <p class="quote"> could be <blockquote>
> http://www.w3.org/TR/html401/struct/text.html#edef-BLOCKQUOTE
> 
> I think that would be more semantic :)

Still, another design choice.

Well, the actual syntax of “blockquote” is that of a container holding one or more paragraphs (and other elements). HTML simply allows you to omit the “p” element if there is only one paragraph within the blockquote, i.e.,

<blockquote>Some quoted text</blockquote>

stands for 

<blockquote><p>Some quoted text</p></blockquote>

The former is a shortcut in HTML, but the actual semantics is expressed by the latter. So, to express this structure correctly in RASH, we would have to introduce an additional element (i.e., blockquote) and create the correct “blockquote > p” structure shown above. We thought that adding a specific class to the paragraph, in order to characterise it as a quotation, was a simpler approach to block quotations.

However, unlike the tag “code”, the tag “q” is included in RASH for inline quotations. The basic reason for that choice is that “q” is well known and widely used, and we didn’t want to use a “span” + “@class” to define this kind of semantics.
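
To illustrate that the two class-based conventions discussed above (class="code" and class="quote") carry the same information as the plain HTML forms, here is a minimal Python sketch that rewrites one into the other. It is not part of the RASH Framework: the function name rash_to_plain_html and the RULES table are hypothetical, and the sketch assumes the input is a well-formed XML fragment (no HTML-only entities such as &nbsp;).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping, for illustration only:
# (RASH tag, RASH class) -> (outer HTML tag, inner HTML tag).
# A RASH tag of None matches any element; an outer tag of None keeps the tag.
RULES = {
    ("p", "code"): ("pre", "code"),        # block code  -> <pre><code>
    ("p", "quote"): ("blockquote", "p"),   # block quote -> <blockquote><p>
    (None, "code"): (None, "code"),        # inline code, e.g. <i class="code">
}


def rash_to_plain_html(fragment: str) -> str:
    """Rewrite class-based RASH markup into the equivalent plain HTML elements."""
    # Wrap the fragment so that several sibling elements parse as one tree.
    root = ET.fromstring(f"<div>{fragment}</div>")
    # Take a snapshot of the elements first, since the tree is modified below.
    for el in list(root.iter()):
        cls = el.get("class")
        rule = RULES.get((el.tag, cls)) or RULES.get((None, cls))
        if rule is None:
            continue
        outer, inner = rule
        del el.attrib["class"]
        # Move the element's text and children into a new inner element.
        wrapper = ET.Element(inner)
        wrapper.text, el.text = el.text, None
        for child in list(el):
            el.remove(child)
            wrapper.append(child)
        el.append(wrapper)
        if outer:
            el.tag = outer
    # Serialise the children of the temporary <div> wrapper only.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


print(rash_to_plain_html('<p class="code">This is a full block of code</p>'))
print(rash_to_plain_html('<p class="quote">Some quoted text</p>'))
```

Since each rule is a one-to-one rewrite, a converter of this kind could emit the "pre > code" and "blockquote > p" structures for tools that expect them, while authors keep writing the simpler class-based form.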

> BTW, shouldn't the JSON-LD media type in section #9 be <script
> type="application/ld+json"> ?
> http://www.w3.org/TR/json-ld/#h3_interpreting-json-as-json-ld

You are totally right, thanks for spotting it. I’ve added an issue in the RASH repo.
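
The media type matters in practice because a consumer that follows the JSON-LD recommendation selects "script" elements by their type, so a block labelled differently would simply be skipped. A minimal Python sketch of such a consumer follows; the class JsonLdExtractor, the function extract_jsonld, and the sample document are hypothetical and only meant as an illustration, not as a tool of the RASH Framework.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Only scripts carrying the JSON-LD media type are considered.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so it is buffered.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical sample page: one JSON-LD block and one ordinary script.
sample = (
    '<html><head>'
    '<script type="application/ld+json">'
    '{"@context": "http://schema.org", "@type": "ScholarlyArticle", "name": "Example"}'
    '</script>'
    '<script type="text/javascript">var x = 1;</script>'
    '</head><body><p>Text</p></body></html>'
)
print(extract_jsonld(sample))
```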

Thanks again and have a nice day :-)

S.

> On Fri, May 22, 2015 at 11:52 PM, Silvio Peroni <silvio.peroni@unibo.it> wrote:
>> Dear all,
>> 
>> Considering the several posts about this topic, I would like to share with you my personal experience in using HTML(+RDF) as a format for preparing/submitting/processing papers in scientific events.
>> 
>> In the past months, I (together with several people in my research group at the University of Bologna plus other interested researchers from other institutions) have released a format for writing academic articles called RASH, i.e., Research Articles in Simplified HTML. RASH is a markup language that restricts the use of HTML elements to only 25 elements for writing academic research articles. It is also possible to include RDFa annotations within any element of the language, as well as other RDF statements in Turtle and JSON-LD format by using the appropriate "script" tag. The RASH documentation is available online at [1] and documents RASH version 0.3.5, defined as a RelaxNG grammar [2].
>> 
>> RASH is the core component of a larger framework that includes a set of specifications and writing/conversion/extraction tools for academic articles. All the sources (released with Open Source and Creative Commons Licences) are available on GitHub [3] and have been developed by a group of several people so far. An internal note [4] provides a complete overview of the RASH Framework - please find attached the structured abstract of such note at the end of this email, for your convenience.
>> 
>> Currently, the RASH Framework includes the following tools:
>> 
>> - a script to enable RASH users to check their documents simultaneously both against the specific requirements in the RASH RelaxNG grammar and also against the full set of HTML checks that the W3C Nu HTML Checker (a.k.a., HTML5 validator) does for all HTML documents (by checking all requirements given in the HTML specification);
>> 
>> - JavaScript scripts (based on Bootstrap and jQuery) and CSS stylesheets (partially based on the Linked Research [5] CSS files) implementing the visualisation of RASH documents in the browser. Such scripts also add to RASH papers a footbar with statistics about the paper (i.e., number of words, figures, tables and formulas), a menu to change the actual layout of the page, the automatic reordering of footnotes and references, the visualisation of the metadata of the paper, etc.;
>> 
>> - XSLT 2.0 files for converting RASH documents into LaTeX according to the ACM ICPS [6] and Springer LNCS [7] styles (other styles to come soon);
>> 
>> - an XSLT 2.0 file to perform conversions from OpenOffice documents into RASH documents;
>> 
>> - a Java application called SPAR Xtractor suite that takes a RASH document as input and returns a new RASH document where all its markup elements have been annotated with their actual (structural) semantics according to the Document Components Ontology (DoCO) [8].
>> 
>> In order to experiment with the use of RASH in official venues, it has already been proposed among the possible submission formats in three academic events, i.e., the Semantic Publishing Challenge 2015 [9] (that will be held during ESWC 2015), and the workshops SAVE-SD 2015 [10] (held during WWW 2015) and Linking in the Cloud 2015 [11] (that will be held during Hypertext 2015).
>> 
>> In particular, six papers were actually submitted in RASH to the SAVE-SD 2015 Workshop [10] (which I have co-organised) - the sources of such papers are available in the workshop program webpage [12]. All the RASH papers also include RDF statements (for a total of about 1300 RDF triples) concerning article metadata, basic article structures (mainly based on DoCO [8]), citation functions (based on CiTO [13]), and even semantic descriptions of figures as in the case of the SAVE-SD 2015 Best RASH Paper [14].
>> 
>> It is worth mentioning that the conversion of the RASH submissions into the ACM format requested by the publisher Sheridan (responsible for the publication of all WWW proceedings, including the workshop proceedings) was handled by us, the workshop organisers, through a semi-automatic process. In particular, we used the aforementioned XSLT files to convert RASH papers into LaTeX files compliant with the official ACM format requested [6], and then we fixed only a few layout misalignments.
>> 
>> I hope that the RASH Framework (together with others, e.g., Linked Research [5] and Scholarly Markdown [15]) and the related initiatives and adoption in academic events can be considered a first concrete step towards the possible adoption of HTML(+RDF) for scientific publications in academic venues.
>> 
>> I'm looking forward to your comments about RASH and its framework and, in case you are already an early adopter of it, please feel free to participate in a 10-minute survey about the use of RASH for writing academic papers, available at http://esurv.org/?u=rash-format.
>> 
>> Please don't hesitate to contact me (email: essepuntato@gmail.com) for comments, suggestions, and further questions.
>> 
>> Have a nice day :-)
>> 
>> S.
>> 
>> 
>> 
>> # References
>> 1. http://cs.unibo.it/save-sd/rash/documentation/index.html
>> 2. http://cs.unibo.it/save-sd/rash/grammar/rash.rng
>> 3. http://github.com/essepuntato/rash
>> 4. http://www.essepuntato.it/2015/sepublica/rash-sepublica2015.html
>> 5. https://github.com/csarven/linked-research
>> 6. http://www.acm.org/sigs/publications/proceedings-templates
>> 7. http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
>> 8. Constantin, A., Peroni, S., Pettifer, S., Shotton, D., Vitali, F. (in press). The Document Components Ontology (DoCO). To appear in Semantic Web – Interoperability, Usability, Applicability. OA available at http://www.semantic-web-journal.net/content/document-components-ontology-doco-0
>> 9. https://github.com/ceurws/lod/wiki/SemPub2015
>> 10. http://cs.unibo.it/save-sd/2015/index.html
>> 11. http://lc2015.dibris.unige.it/
>> 12. http://cs.unibo.it/save-sd/2015/program.html
>> 13. Peroni, S., Shotton, D. (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. In Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 17 (December 2012): 33-43. Amsterdam, The Netherlands: Elsevier. http://dx.doi.org/10.1016/j.websem.2012.08.001
>> 14. Kuhn, T. (2015). Science Bots: A Model for the Future of Scientific Computation? http://cs.unibo.it/save-sd/2015/papers/html/kuhn-savesd2015.html
>> 15. http://scholarlymarkdown.com
>> 
>> 
>> # Abstract of [4]
>> Purpose: this paper introduces the RASH Framework, i.e., a set of specifications and tools for writing academic articles in RASH (a simplified version of HTML).
>> 
>> Design: RASH has been developed in order to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the publishing workflow.
>> 
>> Findings: RASH has been used for papers submitted to the SAVE-SD 2015 workshop. The authors of papers were able to self-learn it by simply referring to its documentation page without facing particular issues. The conversion of the RASH submissions into the format requested by the publisher was handled by the workshop organisers quickly through a semi-automatic process.
>> 
>> Research limitations: additional tools are needed, e.g., for extracting additional RDF statements from RASH documents and to enable additional conversion from/to existing formats.
>> 
>> Practical implications: the RASH Framework is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.
>> 
>> Social implications: RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).
>> 
>> Value: RASH focuses strictly on writing the content of the paper (i.e., organisation of text + semantic annotations) and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within the framework.
>> 
>> 
>> ----------------------------------------------------------------------------
>> Silvio Peroni, Ph.D.
>> Department of Computer Science and Engineering
>> University of Bologna, Bologna (Italy)
>> Tel: +39 051 2094871
>> E-mail: silvio.peroni@unibo.it
>> Web: http://www.essepuntato.it
>> Blog: http://palindrom.es/phd
>> Twitter: essepuntato
>> 
>> 




Received on Tuesday, 26 May 2015 13:17:36 UTC