- From: Peter Murray-Rust <pm286@cam.ac.uk>
- Date: Tue, 24 Mar 2020 12:54:16 +0000
- To: Sarven Capadisli <info@csarven.ca>
- Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Received on Tuesday, 24 March 2020 12:54:42 UTC
I've been working on reading JATS-XML and have built a semantic engine (in Java). I now convert horror-formats like PDF and publisherHTML (much of which is non-scholarly: marketing, recommendations, etc.) into a stripped version with JATS tags for metadata.

HTML <meta> tags have no structure and are badly defined (there's no home page for Highwire tags, and DC is inadequate), so I'm happy with JATS front/body/back/floats-group. The body is effectively HTML; the main problem is that it's often unstructured, but it can be hacked.

So, given the inactivity on this list, I think that subsets of JATS fragments are satisfactory. I helped create the term "scholarlyHTML" in the Panton Arms (IIRC) but I'm not wedded to its future.

All my code is public in the AMI system: https://github.com/petermr/ami3

P.

--
"I always retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same."

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069