Re: elements for basic academic articles

I think JohnP's list is a good starting point.

At EuropePMC we have a list of about 20 tags for sections/divs in articles:
http://europepmc.org/ftp/oa/SectionTagger/ . These can be used for
retrospective markup of articles, so they include regexes for identifying
sections, e.g.
https://github.com/ScholarlyHTML/spec/blob/master/sectiontags.md
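A minimal Python sketch of regex-based section tagging of the kind described above, for readers who want a feel for how retrospective markup can work. The tag names and patterns are hypothetical placeholders chosen for illustration; they are not the actual EuropePMC SectionTagger regexes, which live in the files linked above.

```python
import re
from typing import Optional

# Hypothetical tag -> pattern table, for illustration only.
# These are NOT the actual EuropePMC SectionTagger regexes.
# Each pattern allows an optional leading section number such as "2." or "3 ".
_NUM = r"^\s*(?:\d+\.?\s*)?"
SECTION_PATTERNS = {
    "introduction": re.compile(_NUM + r"(?:introduction|background)\b", re.IGNORECASE),
    "methods": re.compile(_NUM + r"(?:materials\s+and\s+methods|methods?|methodology)\b", re.IGNORECASE),
    "results": re.compile(_NUM + r"results?\b", re.IGNORECASE),
    "discussion": re.compile(_NUM + r"discussion\b", re.IGNORECASE),
    "conclusions": re.compile(_NUM + r"conclusions?\b", re.IGNORECASE),
    "references": re.compile(_NUM + r"(?:references|bibliography|literature\s+cited)\b", re.IGNORECASE),
}


def tag_section(heading: str) -> Optional[str]:
    """Return the first tag whose pattern matches the heading text, or None.

    Patterns are tried in dictionary order, so a combined heading such as
    "Results and Discussion" is tagged with the earlier entry ("results").
    """
    for tag, pattern in SECTION_PATTERNS.items():
        if pattern.search(heading):
            return tag
    return None


for h in ["1. Introduction", "2. Materials and Methods", "Results and Discussion", "Acknowledgements"]:
    print(f"{h!r} -> {tag_section(h)}")
```

A real tagger would also need patterns for the remaining tags and some policy for combined headings; first-match-wins is just the simplest choice.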

I would find it useful to have one or more examples of articles actually
marked up in a proto-SH so that we can get a feel for what an average
document would look like. We'll probably be gradually assembling some in
ContentMine as we read the current literature.

On Wed, Dec 2, 2015 at 2:32 AM, Pedersen, John - Hoboken <jpederse@wiley.com> wrote:

> One thought here is that the HTML elements/attributes for scholarly
> content should follow from how best to capture in HTML5 the
> concepts/information/structure that scholarly/academic articles contain.
> That is, rather than jumping straight to HTML5, first enumerate the concepts
> in some language-agnostic way and then see which HTML5 constructs fit best.
> The suggestions so far, both Johannes' below and the entire set of SH and
> friends candidate languages, are likely intended to already reflect the
> benefit of having gone through this exercise, but since there are apparently
> several different conclusions, maybe it would be worth going through the
> analysis explicitly again?
>
> There's no shortage of material to draw on, given that there have been
> years (decades really) of defining the concepts important to
> research/scholarly articles. These are embodied in the many DTDs and other
> schemas, both public and proprietary, that publishers and consumers of such
> content have defined. I'm thinking here of everything from Majour
> headers through commercial publishers' DTDs to PubMed and JATS. If the
> efforts listed so far have already done this analysis, can it be
> shared here? Our own "WileyML" [1] is intended to capture all of the concepts
> that Wiley has found necessary so far for its academic/journal content (but
> we are in flux, anticipating the future). These now extend beyond
> academic/scholarly articles, but there's a pared-down list below that I've
> tried to restrict to concepts relevant to scholarly/academic articles.
>
> As usual the devil is in the details, with a prime example being even
> something as fundamental as paragraphs. The reality is that HTML's <p>,
> even in HTML5, does not fully capture the semantic notion of "paragraph"
> since that can for example contain displayed equations (it's relatively
> common for an equation to be followed by “where x is….”, clearly part of
> the same paragraph, although that's not the only case). However, the proper
> HTML for displayed objects such as equations involves a <div>, which cannot
> be within <p>.
>
> And of course it's not just a matter of specifying which
> elements/atts/values may be needed, but also of the structuring and additional
> rules that may be appropriate; still, the list could be a good start. Is it
> worth us filling this out to agree on all the concepts we want to capture
> for scholarly/academic articles and then specifying the best HTML5
> construct for each? (no doubt the answer for many is <span> with some
> attribute(s)). We could also add restrictions/structuring. RELAX NG and
> Schematron anyone? :)
>
> John Pedersen
> Director, Content Architecture
>
>
> [1] http://vendors.wiley.com/schemas/wileyml3g/
>
>
> *Scholarly/Academic Concepts, not including OASIS tables and MathML*
>
> (Two columns: concept | HTML/structure/constraints; only the concept
> column is filled in so far.)
>
> *Metadata*
>
> *journal level*
> - DOI for journal
> - ISSN (print)
> - ISSN (electronic)
> - id (journal)
> - title (of journal)
> - abbreviated title (of journal)
> - subject (of journal)
>
> *issue level*
> - position in volume
> - DOI for issue
> - title (issue/supplement)
> - copyright owner (issue)
> - copyright line (issue)
> - volume number
> - issue number
> - supplement number
> - editor (for special issue)
> - date issue started
> - date issue completed
> - cover date
> - cover date (display form)
>
> *article level*
> - article type
> - article status
> - position in issue
> - e-locator
> - page total
> - word count
> - access type (open?)
> - ToC heading for article, 1st level
> - ToC heading for article, 2nd level
> - ToC heading for article, 3rd level
> - MedLine PubType
> - MeSH checkword
> - MeSH descriptor
> - MeSH descriptor major topic?
> - MeSH descriptor tree number
> - MeSH descriptor unique ID
> - MeSH qualifier
> - MeSH qualifier major topic
> - MeSH qualifier tree number
> - MeSH qualifier unique ID
> - link to typeset version
> - link to typeset version first page
> - link to plain text version
> - link to author manuscript version
> - embargo end date for author manuscript
> - title - ToC form
> - title - short (running)
> - title - short authors (running)
> - erratum target DOI
> - retraction target DOI
> - subject (article level)
> - subject relevance
> - editorial office ID
> - file ID
> - society ID
> - supplier ID
> - title (main article title)
> - subtitle (article)
> - article category title
> - pageHeading title
> - first page
> - last page
> - article copyright
> - DOI (article)
> - online pub date
> - creator
>   - creator role
>   - affiliation link
>   - current affiliation link
>   - ORCID
>   - honorifics
>   - given names
>   - name prefix (van der)
>   - family name
>   - name suffix (Jr.)
>   - degrees
>   - titles after names
>   - preferred display name
>   - alternative name
>   - job title
>   - biographical info
>   - biographical photo
>   - email (for creator)
>   - website/url
>   - phone
>   - fax
> - manuscript received date
> - manuscript revised date
> - manuscript accepted date
> - funding agency
> - funding grant number
> - funder DOI
> - Fundref name
> - dedication
> - license (legalStatement)
> - supporting information
> - corresponding author info
> - affiliation
>   - country code
>   - orgDiv
>   - orgName
>   - address
>     - street
>     - city
>     - postcode
>     - country part
>   - country
> - header footnotes
> - abstract
> - abstract type
> - abstract language
> - abstract title
> - keyword
> - keyword classification
>
> *Body Content*
> - accession ID (e.g. GenBank ID)
> - appendix of an article
> - bold text
> - block that can float (box, quotation, graphic, pull quote, sidebar, text)
> - block that is fixed (box, dialogue, graphic, poetry, quotation,
>   signature block, text)
> - caption of a figure
> - chemical structure (image and possible description and number)
> - computer code (block of lines)
> - data for a media resource, such as hex coding or TeX
> - definition list (abbreviation list)
> - displayedItem (equation, reaction)
> - email address
> - fixed case text
> - feature that can float
> - feature fixed in place
> - fixed italic text
> - field in a record
> - figure
> - figure part
> - fixed roman text (text that must stay in roman)
> - italic text
> - information asset, such as a chemical name or gene
> - inline graphic
> - label (for an irregularly numbered object)
> - letter (such as a letter to the editor)
> - line (e.g. of computer code or poetry)
> - lineated text (group of lines, possibly numbered)
> - link to another object
> - list (various styles)
> - list item
> - list item pair wrapper
> - paired list (such as for definitions)
> - paired list column header
> - math statement attribution or other detail
> - math statement (theorem, lemma, etc.)
> - mediaResource (binary resource, possibly with MIME type etc.)
> - note (footnote, "marginal", or assigned to a whole object)
> - end notes
> - paragraph (semantic)
> - laboratory protocol
> - protocol materials
> - protocol procedure/recipe
> - protocol section
> - protocol step
> - record similar to a database record (with fields)
> - region of an image
> - salutation in a letter
> - small caps
> - section
> - source for a figure, table, etc.
> - span (for CSS styling, assigning an ID, etc.)
> - subscript
> - sub-article (such as an historical article for commentary)
> - superscript
> - tabular content that can float
> - tabular content fixed in place
> - term
> - term definition
> - title of a section, figure, table, list item, etc.
> - url
>
> *References/Bibliographies* (some of the other elements above can also be
> used in citations)
> - article title in a citation
> - author in a citation
> - bibliographic item (may be several citations)
> - bibliography
> - bibliography section
> - book series title in a citation
> - book title in a citation
> - chapter title in a citation
> - citation (may also occur inline in main body text)
> - corporate or collaborative group name in a citation
> - defendant in a legal citation
> - journal title in a citation
> - title in a citation other than book, journal, article, e.g. dissertation
>   or online resource
> - plaintiff in a legal citation
> - publisher location in a citation
> - publisher name in a citation
> - publication year in a citation
> - statute title in a legal citation
> - volume number in a citation
>
>
> -----Original Message-----
> From: Robin Berjon [mailto:robin@berjon.com]
> Sent: Monday, November 30, 2015 11:25 PM
> To: Johannes Wilm; public-scholarlyhtml@w3.org
> Subject: Re: elements for basic academic articles
>
> Hi Johannes,
>
> thanks for sharing that list; it is useful. I'm just adding below some parts
> that we've found to be needed (not necessarily exhaustive). The list we have
> comes from encoding actual articles into SH.
>
> On 25/11/2015 11:51 , Johannes Wilm wrote:
> > **Block level elements for textual contents**
> >
> > - P
> > - H1-H3
> > - Blockquote
> > - Code
> > - UL/OL
>
> Point of terminology: I got tired of saying the many variations on "block
> content", "blocks such as paragraphs and tables", or "blocks but of text
> not, like, sections and stuff". Instead I minted "hunks", which means
> exactly: the blockish things inside sections, that aren't the title.
>
> We've found a need for pretty much arbitrary heading depth, not just beyond
> h3 but in some cases beyond h6. For that we use h6 with aria-level set to the
> real depth.
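A minimal Python sketch of the heading rule described above: native h1 to h6 up to depth 6, then h6 with aria-level carrying the real depth. The function name `heading` is hypothetical, and using the native elements for shallower levels is an assumption about how this rule combines with ordinary headings.

```python
import html


def heading(depth: int, text: str) -> str:
    """Render a heading at an arbitrary nesting depth (hypothetical helper).

    Depths 1-6 use the native <h1>..<h6> elements; deeper levels fall back
    to <h6> with aria-level carrying the real depth, so assistive technology
    can still report the intended outline level.
    """
    if depth < 1:
        raise ValueError("heading depth must be >= 1")
    label = html.escape(text)
    if depth <= 6:
        return f"<h{depth}>{label}</h{depth}>"
    return f'<h6 aria-level="{depth}">{label}</h6>'


print(heading(3, "Sampling"))
print(heading(8, "Edge cases & caveats"))
```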
>
> Things like code (and images, tables, block equations) we all handle as
> figures (even if without a figcaption, which is fine). Beyond consistently
> making them captionable, this also provides nice common hooks for styling
> (and as a bonus it provides a container inside of which to set up
> horizontal scrolling on small screens, which all of these types can need).
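A minimal Python sketch of the figure-wrapping convention described above: any block (code, image, table, equation) goes inside a `<figure>`, with an optional `<figcaption>`. The helper names `as_figure` and `code_figure`, and the `<pre><code>` markup chosen for code, are illustrative assumptions rather than a published SH API.

```python
import html
from typing import Optional


def as_figure(body_html: str, caption: Optional[str] = None, fig_id: Optional[str] = None) -> str:
    """Wrap an already-rendered block (image, table, equation, code) in <figure>.

    The caption is optional: a <figure> without <figcaption> is valid HTML,
    and every such block still gets the same element to style and scroll.
    """
    id_attr = f' id="{html.escape(fig_id, quote=True)}"' if fig_id else ""
    parts = [f"<figure{id_attr}>", body_html]
    if caption is not None:
        parts.append(f"<figcaption>{html.escape(caption)}</figcaption>")
    parts.append("</figure>")
    return "".join(parts)


def code_figure(source: str, caption: Optional[str] = None) -> str:
    """Escape source code and wrap it as <pre><code> inside a figure."""
    return as_figure(f"<pre><code>{html.escape(source)}</code></pre>", caption)


print(code_figure("if x < 1:\n    x = 1", caption="Listing 1. Clamp x."))
```

Because every such block shares the same wrapper, a single CSS rule such as `figure { overflow-x: auto; }` would be enough to enable horizontal scrolling for all of them on small screens.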
>
> > **Inline text elements**
> >
> > - Links (standard HTML links)
> > - Footnotes (Have to be displayed off to the side or below the text,
> > and need to be able to contain all the things that body elements can
> > contain)
>
> I've listed a few more: http://scholarly.vernacular.io/#inline-elements.
> Of those, the most notable are the ones that enable internationalisation
> (ruby and friends), inline math or code, and simply making it possible to
> hang semantics off an added span.
>
>
> --
> • Robin Berjon - http://berjon.com/ - @robinberjon • http://science.ai/ —
> intelligent science publishing •
>
>
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Wednesday, 2 December 2015 09:39:57 UTC