- From: Peter Murray-Rust <pm286@cam.ac.uk>
- Date: Wed, 2 Dec 2015 09:39:23 +0000
- To: "Pedersen, John - Hoboken" <jpederse@wiley.com>
- Cc: Robin Berjon <robin@berjon.com>, Johannes Wilm <johanneswilm@vivliostyle.com>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>
- Message-ID: <CAD2k14OPPwshnvEBYoR56bLwZ_G94=KVY+3vMfTbYHAReFm5+Q@mail.gmail.com>
I think JohnP's list is a good starting point. At EuropePMC we have a list of about 20 tags for sections/divs in the article: http://europepmc.org/ftp/oa/SectionTagger/ . They can be used for retrospective markup of articles, so they include regexes for identifying sections, e.g. https://github.com/ScholarlyHTML/spec/blob/master/sectiontags.md

I would find it useful to have one or more examples of articles actually marked up in a proto-SH so that we can get a feel for what an average document would look like. We'll probably be gradually assembling some in ContentMine as we read the current literature.

On Wed, Dec 2, 2015 at 2:32 AM, Pedersen, John - Hoboken <jpederse@wiley.com> wrote:

> One thought here is that the HTML elements/attributes for scholarly
> content should come from how best to capture in HTML5 the
> concepts/information/structure that scholarly/academic articles contain.
> That is, rather than jumping to the HTML5, first enumerate the concepts in
> some language-agnostic way and then see what HTML5 best fits. The
> suggestions so far, both Johannes' below and the entire set of SH and
> friends candidate languages, are likely intending to provide already the
> benefit of having gone through this exercise, but since there's apparently
> several different conclusions, maybe it would be worth going through the
> analysis explicitly again?
>
> There's no shortage of material to draw on, given that there have been
> years (decades really) of defining the concepts important to
> research/scholarly articles. These are embodied in the many DTDs and other
> schema both public and proprietary that publishers and consumers of such
> content have defined. I'm thinking here of everything starting from Majour
> headers through commercial publishers' DTDs to PubMed and JATS.
> If the efforts that have been listed already have done this analysis, can
> that be shared here? Our own "WileyML" [1] intends to capture all of the
> concepts that Wiley has found necessary so far for its academic/journal
> content (but we are in flux, anticipating the future). These now extend beyond
> academic/scholarly articles, but there’s a pared-down list below that I’ve
> tried to restrict to concepts relevant to scholarly/academic articles.
>
> As usual the devil is in the details, with a prime example being even
> something as fundamental as paragraphs. The reality is that HTML's <p>,
> even in HTML5, does not fully capture the semantic notion of "paragraph"
> since that can for example contain displayed equations (it's relatively
> common for an equation to be followed by “where x is….”, clearly part of
> the same paragraph, although that’s not the only case). However the proper
> HTML for displayed objects such as equations involves a <div>, which cannot
> be within <p>.
>
> And of course it's not just a matter of specifying which
> elements/atts/values may be needed, but also structuring and additional
> rules that may be appropriate, but the list could be a good start. Is it
> worth us filling this out to agree on all the concepts we want to capture
> for scholarly/academic articles and then specifying the best HTML5
> construct for each? (no doubt the answer for many is <span> with some
> attribute(s)). We could also add restrictions/structuring. RELAX NG and
> Schematron anyone?
> :)
>
> John Pedersen
> Director, Content Architecture
>
> [1] http://vendors.wiley.com/schemas/wileyml3g/
>
> *Scholarly/Academic Concepts, not including OASIS tables and MathML*
>
> *Metadata* | *HTML /structure/constraints*
>
> *journal level*
> - DOI for journal
> - issn (print)
> - issn (electronic)
> - id (journal)
> - title (of journal)
> - abbreviated title (of journal)
> - subject (of journal)
>
> *issue level*
> - position in volume
> - DOI for issue
> - title (issue/supplement)
> - copyright owner (issue)
> - copyright line (issue)
> - volume number
> - issue number
> - supplement number
> - editor (for special issue)
> - date issue started
> - date issue completed
> - cover date
> - cover date (display form)
>
> *article level*
> - article type
> - article status
> - position in issue
> - e-locator
> - page total
> - word count
> - access type (open?)
> - ToC heading for article 1st level
> - ToC heading for article 2nd level
> - ToC heading for article 3rd level
> - MedLine PubType
> - MeSH checkword
> - MeSH descriptor
> - MeSH descriptor major topic?
> - MeSH descriptor tree number
> - MeSH descriptor unique ID
> - MeSH qualifier
> - MeSH qualifier major topic
> - MeSH qualifier tree number
> - MeSH qualifier unique ID
> - link to typeset version
> - link to typeset version first page
> - link to plain text version
> - link to author manuscript version
> - embargo end date for author manuscript
> - title - ToC form
> - title - short (running)
> - title - short authors(running)
> - erratum target DOI
> - retraction target DOI
> - subject (article level)
> - subject relevance
> - editorial office ID
> - file ID
> - society ID
> - supplier ID
> - title (main article title)
> - subtitle (article)
> - article category title
> - pageHeading title
> - first page
> - last page
> - article copyright
> - doi (article)
> - online pub date
> - creator
> - creator role
> - affiliation link
> - current affiliation link
> - ORCID
> - honorifics
> - given names
> - name prefix (van der)
> - family name
> - name suffix (Jr.)
> - degrees
> - titles after names
> - preferred display name
> - alternative name
> - job title
> - biographical info
> - biographical photo
> - email (for creator)
> - website/url
> - phone
> - fax
> - manuscript received date
> - manuscript revised date
> - manuscript accepted date
> - funding agency
> - funding grant number
> - funder DOI
> - Fundref name
> - dedication
> - license (legalStatement)
> - supporting information
> - corresponding author info
> - affiliation
> - country code
> - orgDiv
> - orgName
> - address
> - street
> - city
> - postcode
> - country part
> - country
> - header footnotes
> - abstract
> - abstract type
> - abstract language
> - abstract title
> - keyword
> - keyword classification
>
> *Body Content*
> - accession ID (e.g. GenBank ID)
> - appendix of an article
> - bold text
> - block that can float (box, quotation, graphic, pull quote, sidebar, text)
> - block that is fixed (box, dialogue, graphic, poetry, quotation, signature block, text)
> - caption of a figure
> - chemical structure (image and possible description and number)
> - computer code (block of lines)
> - data for a media resource, such as hex coding or TeX
> - definition list (abbreviation list)
> - displayedItem (equation, reaction)
> - email address
> - fixed case text
> - feature that can float
> - feature fixed in place
> - fixed italic text
> - field in a record
> - figure
> - figure part
> - fixed roman text (text that must stay in roman)
> - italic text
> - information asset, such as a chemical name or gene
> - inline graphic
> - label (for an irregularly numbered object)
> - letter (such as a letter to the editor)
> - line (e.g. of computer code or poetry)
> - lineated text (group of lines, possibly numbered)
> - link to another object
> - list (various styles)
> - list item
> - list item pair wrapper
> - paired list (such as for definitions)
> - paired list column header
> - math statement attribution or other detail
> - math statement (theorem, lemma, etc.)
> - mediaResource (binary resource, possibly with MIME type etc.)
> - note (footnote, "marginal", or assigned to a whole object)
> - end notes
> - paragraph (semantic)
> - laboratory protocol
> - protocol materials
> - protocol procedure/recipe
> - protocol section
> - protocol step
> - record similar to a database record (with fields)
> - region of an image
> - salutation in a letter
> - small caps
> - section
> - source for a figure, table, etc.
> - span (for CSS styling, assigning an ID, etc.)
> - subscript
> - sub-article (such as an historical article for commentary)
> - superscript
> - tabular content that can float
> - tabular content fixed in place
> - term
> - term definition
> - title of a section, figure, table, list item, etc.
> - url
>
> *References/Bibliographies* (some of the other elements above can also be
> used in citations)
> - article title in a citation
> - author in a citation
> - bibliographic item (may be several citations)
> - bibliography
> - bibliography section
> - book series title in a citation
> - book title in a citation
> - chapter title in a citation
> - citation (may also occur inline in main body text)
> - corporate or collaborative group name in a citation
> - defendant in a legal citation
> - journal title in a citation
> - title in a citation other than book, journal, article, e.g. dissertation or online resource
> - plaintiff in a legal citation
> - publisher location in a citation
> - publisher name in a citation
> - publication year in a citation
> - statute title in a legal citation
> - volume number in a citation
>
>
> -----Original Message-----
> From: Robin Berjon [mailto:robin@berjon.com]
> Sent: Monday, November 30, 2015 11:25 PM
> To: Johannes Wilm; public-scholarlyhtml@w3.org
> Subject: Re: elements for basic academic articles
>
> Hi Johannes,
>
> thanks for sharing that list, it is useful. I'm just adding some parts
> that we've seen needed below (not necessarily exhaustive). The list we have
> comes from encoding actual articles into SH.
>
> On 25/11/2015 11:51 , Johannes Wilm wrote:
> > **Block level elements for textual contents**
> >
> > - P
> > - H1-H3
> > - Blockquote
> > - Code
> > - UL/OL
>
> Point of terminology: I got tired of saying the many variations on "block
> content", "blocks such as paragraphs and tables", or "blocks but of text
> not, like, sections and stuff". Instead I minted "hunks", which means
> exactly: the blockish things inside sections, that aren't the title.
>
> We've found a need for pretty much arbitrary header depth, not just beyond
> h3 but in cases beyond h6. For that we use h6 with aria-level set to the
> real depth.
>
> Things like code (and images, tables, block equations) we all handle as
> figures (even if without a figcaption, which is fine). Beyond consistently
> making them captionable, this also provides nice common hooks for styling
> (and as a bonus it provides a container inside of which to set up
> horizontal scrolling on small screens, which all of these types can need).
>
> > **Inline text elements**
> >
> > - Links (standard HTML links)
> > - Footnotes (Have to be displayed off to the side or below the text,
> > and need to be able to contain all the things that body elements can
> > contain)
>
> I've listed a few more: http://scholarly.vernacular.io/#inline-elements.
> Of those, the most notable are the ones that enable internationalisation
> (ruby and friends), inline math or code, and simply making it possible to
> hang semantics off an added span.
>
>
> --
> • Robin Berjon - http://berjon.com/ - @robinberjon • http://science.ai/ -
> intelligent science publishing •

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
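
For illustration only, here is a minimal Python sketch of the heading-depth convention described in Robin's quoted message (h6 with aria-level set to the real depth once nesting goes beyond h6). The function name heading_markup is hypothetical and is not part of SH or of any existing tooling; it only shows one way the mapping from nesting depth to markup could look.

```python
from html import escape


def heading_markup(depth: int, text: str) -> str:
    """Return heading markup for a section nested `depth` levels deep.

    Hypothetical helper, not taken from any SH implementation.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    safe = escape(text)  # escape &, <, > and quotes in the heading text
    if depth <= 6:
        # Depths 1-6 map directly onto h1-h6.
        return f"<h{depth}>{safe}</h{depth}>"
    # Beyond h6: reuse h6 and carry the true depth in aria-level.
    return f'<h6 aria-level="{depth}">{safe}</h6>'


if __name__ == "__main__":
    print(heading_markup(2, "Methods"))
    print(heading_markup(8, "Edge cases"))
```

Depths 1 to 6 are left as plain h1-h6 here; whether aria-level should also be set on those is a choice the thread does not settle.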
Attachments
- image/jpeg attachment: ATT53275_1.jpg
Received on Wednesday, 2 December 2015 09:39:57 UTC