- From: Alf Eaton <eaton.alf@gmail.com>
- Date: Fri, 18 Mar 2016 15:09:03 +0000
- To: Robin Berjon <robin@berjon.com>
- Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
On 18 March 2016 at 14:42, Robin Berjon <robin@berjon.com> wrote:
>
> A couple of weeks ago I released "dejats"
> (https://github.com/scienceai/dejats), a JS tool that converts JATS to HTML.
>
> It is *not* a converter to Scholarly HTML, but it is meant to enable one.
>
> The motivation behind dejats (and the coming dedocx) is that existing
> conversion tooling and pipelines for intricate formats such as JATS (or
> docx, LaTeX, etc.) tend to be very inflexible. They make assumptions
> about what should be extracted and drop information on the floor.
> Changing their behaviour typically requires reaching into the code
> directly, or in the best cases making use of an API as intricate as the
> format.
>
> The theory to replace that is to have a first step that carries out a
> conversion from the original format into HTML in the *dumbest and
> stupidest* way possible (something which I believe I've done quite
> excellently, if I do say so myself). Once you've produced a very dumb
> HTML DOM from the source, you pass it successively to a sequence of
> small and very simple tools that each get the same DOM in turn and that
> each modify it in a straightforward and well-contained manner:
> essentially the Unix philosophy of pipes of small tools applied to an
> HTML DOM. One tool might make the title markup right, another will
> extract the journal metadata.
>
> When you want to carry out a conversion, you pipe together the steps you
> need. It's easy to share code and reuse the work of others. This works
> quite well for formats that have a high degree of variability, like JATS.
>
> I haven't yet released tools that work with dejats but internally I
> already have four: managing the title, handling journal metadata,
> handling article metadata, and transforming the sections of an article
> in a (hopefully) sane way. It won't be today, but I'd be happy to give
> them the clean up they need to start releasing them next week.

I like this idea a lot.
At PeerJ we use a single transformation [1] to convert JATS to HTML. It starts similarly, from a default div or span whose class is the original element name, but we have gradually added special cases where a more appropriate HTML element exists, followed by some later transformation on the parsed DOM document.

[1] https://github.com/PeerJ/jats-conversion/blob/master/src/data/xsl/jats-to-html.xsl
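
For anyone who wants to see the shape of the idea, here is a rough sketch of the two stages discussed in this thread: a deliberately dumb conversion that maps every source element to a div or span whose class is the original element name, followed by a chain of small steps that each receive the same tree and fix one thing. It is written in Python for brevity and is not taken from dejats or from the PeerJ stylesheet; the `INLINE` set, the function names, and the sample input are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of source elements to treat as inline (assumption);
# everything else becomes a div.
INLINE = {"italic", "bold", "sub", "sup", "xref"}

def dumb_convert(node):
    """Map a source element to a div/span with class = original element name."""
    out = ET.Element("span" if node.tag in INLINE else "div", {"class": node.tag})
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(dumb_convert(child))
    return out

def fix_title(root):
    """Small step: turn div.article-title into an h1."""
    for el in root.iter("div"):
        if el.get("class") == "article-title":
            el.tag = "h1"
            del el.attrib["class"]
    return root

def fix_italic(root):
    """Small step: turn span.italic into an em."""
    for el in root.iter("span"):
        if el.get("class") == "italic":
            el.tag = "em"
            del el.attrib["class"]
    return root

def run_pipeline(root, steps):
    """Pass the same tree through each step in turn."""
    for step in steps:
        root = step(root)
    return root

# Made-up sample input, not a complete or valid JATS document.
jats = ("<article><article-title>On <italic>Drosophila</italic> wings"
        "</article-title><p>Text.</p></article>")

html = run_pipeline(dumb_convert(ET.fromstring(jats)), [fix_title, fix_italic])
result = ET.tostring(html, encoding="unicode")
print(result)
```

Elements that no step knows about (the `p` here) simply stay as a div with the original name in the class, so nothing is dropped, and a later step can still pick them up.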
Received on Friday, 18 March 2016 15:11:21 UTC