- From: Pedersen, John - Hoboken <jpederse@wiley.com>
- Date: Wed, 2 Dec 2015 02:32:36 +0000
- To: Robin Berjon <robin@berjon.com>, Johannes Wilm <johanneswilm@vivliostyle.com>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>
- Message-ID: <e11f836e4ea742f6ae8ddfec54abbd10@CAR-WNMBP-009.wiley.com>
One thought here is that the HTML elements/attributes for scholarly content should come from how best to capture in HTML5 the concepts/information/structure that scholarly/academic articles contain. That is, rather than jumping to the HTML5, first enumerate the concepts in some language-agnostic way and then see what HTML5 best fits.

The suggestions so far, both Johannes' below and the entire set of SH and friends candidate languages, likely already reflect the benefit of having gone through this exercise, but since there are apparently several different conclusions, maybe it would be worth going through the analysis explicitly again? There's no shortage of material to draw on, given that there have been years (decades really) of defining the concepts important to research/scholarly articles. These are embodied in the many DTDs and other schemas, both public and proprietary, that publishers and consumers of such content have defined. I'm thinking here of everything starting from Majour headers through commercial publishers' DTDs to PubMed and JATS. If the efforts that have been listed already have done this analysis, can that be shared here?

Our own "WileyML" [1] intends to capture all of the concepts that Wiley has found necessary so far for its academic/journal content (but we are in flux, anticipating the future). These now extend beyond academic/scholarly articles, but there’s a pared-down list below that I’ve tried to restrict to concepts relevant to scholarly/academic articles.

As usual the devil is in the details, with a prime example being even something as fundamental as paragraphs. The reality is that HTML's <p>, even in HTML5, does not fully capture the semantic notion of "paragraph", since a paragraph can, for example, contain displayed equations (it's relatively common for an equation to be followed by “where x is…”, clearly part of the same paragraph, although that’s not the only case). However, the proper HTML for displayed objects such as equations involves a <div>, which cannot be within <p>. And of course it's not just a matter of specifying which elements/attributes/values may be needed, but also structuring and additional rules that may be appropriate, but the list could be a good start.

Is it worth us filling this out to agree on all the concepts we want to capture for scholarly/academic articles and then specifying the best HTML5 construct for each? (No doubt the answer for many is <span> with some attribute(s).) We could also add restrictions/structuring. RELAX NG and Schematron anyone? :)

John Pedersen
Director, Content Architecture

[1] http://vendors.wiley.com/schemas/wileyml3g/

Scholarly/Academic Concepts, not including OASIS tables and MathML

Metadata
HTML /structure/constraints

journal level
- DOI for journal
- issn (print)
- issn (electronic)
- id (journal)
- title (of journal)
- abbreviated title (of journal)
- subject (of journal)

issue level
- position in volume
- DOI for issue
- title (issue/supplement)
- copyright owner (issue)
- copyright line (issue)
- volume number
- issue number
- supplement number
- editor (for special issue)
- date issue started
- date issue completed
- cover date
- cover date (display form)

article level
- article type
- article status
- position in issue
- e-locator
- page total
- word count
- access type (open?)
- ToC heading for article 1st level
- ToC heading for article 2nd level
- ToC heading for article 3rd level
- MedLine PubType
- MeSH checkword
- MeSH descriptor
- MeSH descriptor major topic?
- MeSH descriptor tree number
- MeSH descriptor unique ID
- MeSH qualifier
- MeSH qualifier major topic
- MeSH qualifier tree number
- MeSH qualifier unique ID
- link to typeset version
- link to typeset version first page
- link to plain text version
- link to author manuscript version
- embargo end date for author manuscript
- title - ToC form
- title - short (running)
- title - short authors (running)
- erratum target DOI
- retraction target DOI
- subject (article level)
- subject relevance
- editorial office ID
- file ID
- society ID
- supplier ID
- title (main article title)
- subtitle (article)
- article category title
- pageHeading title
- first page
- last page
- article copyright
- doi (article)
- online pub date
- creator
- creator role
- affiliation link
- current affiliation link
- ORCID
- honorifics
- given names
- name prefix (van der)
- family name
- name suffix (Jr.)
- degrees
- titles after names
- preferred display name
- alternative name
- job title
- biographical info
- biographical photo
- email (for creator)
- website/url
- phone
- fax
- manuscript received date
- manuscript revised date
- manuscript accepted date
- funding agency
- funding grant number
- funder DOI
- Fundref name
- dedication
- license (legalStatement)
- supporting information
- corresponding author info
- affiliation
- country code
- orgDiv
- orgName
- address
- street
- city
- postcode
- country part
- country
- header footnotes
- abstract
- abstract type
- abstract language
- abstract title
- keyword
- keyword classification

Body Content
- accession ID (e.g. GenBank ID)
- appendix of an article
- bold text
- block that can float (box, quotation, graphic, pull quote, sidebar, text)
- block that is fixed (box, dialogue, graphic, poetry, quotation, signature block, text)
- caption of a figure
- chemical structure (image and possible description and number)
- computer code (block of lines)
- data for a media resource, such as hex coding or TeX
- definition list (abbreviation list)
- displayedItem (equation, reaction)
- email address
- fixed case text
- feature that can float
- feature fixed in place
- fixed italic text
- field in a record
- figure
- figure part
- fixed roman text (text that must stay in roman)
- italic text
- information asset, such as a chemical name or gene
- inline graphic
- label (for an irregularly numbered object)
- letter (such as a letter to the editor)
- line (e.g. of computer code or poetry)
- lineated text (group of lines, possibly numbered)
- link to another object
- list (various styles)
- list item
- list item pair wrapper
- paired list (such as for definitions)
- paired list column header
- math statement attribution or other detail
- math statement (theorem, lemma, etc.)
- mediaResource (binary resource, possibly with MIME type etc.)
- note (footnote, "marginal", or assigned to a whole object)
- end notes
- paragraph (semantic)
- laboratory protocol
- protocol materials
- protocol procedure/recipe
- protocol section
- protocol step
- record similar to a database record (with fields)
- region of an image
- salutation in a letter
- small caps
- section
- source for a figure, table, etc.
- span (for CSS styling, assigning an ID, etc.)
- subscript
- sub-article (such as an historical article for commentary)
- superscript
- tabular content that can float
- tabular content fixed in place
- term
- term definition
- title of a section, figure, table, list item, etc.
- url

References/Bibliographies (some of the other elements above can also be used in citations)
- article title in a citation
- author in a citation
- bibliographic item (may be several citations)
- bibliography
- bibliography section
- book series title in a citation
- book title in a citation
- chapter title in a citation
- citation (may also occur inline in main body text)
- corporate or collaborative group name in a citation
- defendant in a legal citation
- journal title in a citation
- title in a citation other than book, journal, article, e.g. dissertation or online resource
- plaintiff in a legal citation
- publisher location in a citation
- publisher name in a citation
- publication year in a citation
- statute title in a legal citation
- volume number in a citation

-----Original Message-----
From: Robin Berjon [mailto:robin@berjon.com]
Sent: Monday, November 30, 2015 11:25 PM
To: Johannes Wilm; public-scholarlyhtml@w3.org
Subject: Re: elements for basic academic articles

Hi Johannes,

thanks for sharing that list, it is useful. I'm just adding some parts that we've seen needed below (not necessarily exhaustive). The list we have comes from encoding actual articles into SH.

On 25/11/2015 11:51 , Johannes Wilm wrote:
> **Block level elements for textual contents**
>
> - P
> - H1-H3
> - Blockquote
> - Code
> - UL/OL

Point of terminology: I got tired of saying the many variations on "block content", "blocks such as paragraphs and tables", or "blocks but of text not, like, sections and stuff". Instead I minted "hunks", which means exactly: the blockish things inside sections, that aren't the title.

We've found a need for pretty much arbitrary header depth, not just beyond h3 but in some cases beyond h6. For that we use h6 with aria-level set to the real depth.

Things like code (and images, tables, block equations) we handle all as figures (even if without a figcaption, which is fine). Beyond consistently making them captionable, this also provides nice common hooks for styling (and as a bonus it provides a container inside of which to set up horizontal scrolling on small screens, which all of these types can need).

> **Inline text elements**
>
> - Links (standard HTML links)
> - Footnotes (Have to be displayed off to the side or below the text,
> and need to be able to contain all the things that body elements can
> contain)

I've listed a few more: http://scholarly.vernacular.io/#inline-elements. Of those, the most notable are the ones that enable internationalisation (ruby and friends), inline math or code, and simply making it possible to hang semantics off an added span.

--
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ - intelligent science publishing
•
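As a rough illustration of the two conventions Robin describes above (h6 with aria-level for headings nested deeper than six levels, and wrapping code, images, tables and block equations in a figure, with or without a figcaption), here is a minimal Python sketch. The helper names heading and figure are hypothetical, invented for this example; they are not part of Scholarly HTML or of any tooling mentioned in this thread, and the markup SH actually emits may differ in detail.

```python
from html import escape
from typing import Optional

MAX_HTML_HEADING = 6  # HTML only defines h1 through h6


def heading(depth: int, text: str) -> str:
    """Return heading markup for a section nested `depth` levels deep (1-based).

    Hypothetical helper: depths 1-6 map to h1-h6; anything deeper stays an
    h6 and carries the real depth in aria-level, as described in the thread.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if depth <= MAX_HTML_HEADING:
        return f"<h{depth}>{escape(text)}</h{depth}>"
    return f'<h6 aria-level="{depth}">{escape(text)}</h6>'


def figure(hunk_html: str, caption: Optional[str] = None) -> str:
    """Wrap an already-serialised hunk (code, image, table, block equation)
    in a <figure>. The figcaption is optional, so uncaptioned hunks still
    get the same container for styling and horizontal scrolling.
    """
    parts = ["<figure>", hunk_html]
    if caption is not None:
        parts.append(f"<figcaption>{escape(caption)}</figcaption>")
    parts.append("</figure>")
    return "".join(parts)
```

With these assumptions, heading(8, "Deep subsection") produces an h6 element carrying aria-level="8", and figure("<table></table>") produces a captionless figure around the table.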
Attachments
- image/jpeg attachment: ATT53275_1.jpg
Received on Wednesday, 2 December 2015 02:33:09 UTC