- From: Robin Berjon <robin@berjon.com>
- Date: Mon, 21 Mar 2016 10:15:39 -0400
- To: Gareth Oakes <goakes@gpsl.co>
- Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Hi Gareth,

On 20/03/2016 18:43 , Gareth Oakes wrote:
>> A couple of weeks ago I released "dejats"
>> (https://github.com/scienceai/dejats), a JS tool that converts JATS
>> to HTML.
>
> Looks like a sensible tool. Sorry, this might be getting off topic,
> but for this application I’m interested in the technology choice of
> Javascript over XSLT, if you are able to elaborate.
>
> (We find XSLT quite productive for transformations involving XML
> inputs.)

I have no problem with XSLT, I wrote quite a lot of it in a previous
life. But I would like our tooling to work both in Node and the
browser. XSLT support in the browser can be tricky (at best) and may
get yanked out at some point. When you do get support it's v1-only
anyway, which makes reusing existing XSLT harder since much of what's
out there is at least v2.

XSLT support in Node is, if anything, worse. There are options, but
all of those I've tried either segfault easily or involve a fair bit
of manual tweaking to set up (which doesn't play well with
deployment).

It ended up being simpler to just mimic the subset of the XSLT
template-matching algorithm that I needed. The heart of XSLT is
pretty simple; it's all the additional niceties that are hard to
implement. The core processing model for v1 fits into a single
paragraph :)
(cf. https://www.w3.org/TR/xslt/#section-Processing-Model)
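To give a flavour of what I mean, here is a toy version of that
matching loop in plain JS (a sketch of the idea only, with made-up
rules and node shapes, not the actual dejats code):

    // Rules pair a match predicate with a template; apply() walks
    // the tree, firing the first matching rule per node and
    // recursing through unmatched elements, much like
    // xsl:apply-templates with the built-in default rules.
    const rules = [
      { match: n => n.name === 'article-title',
        template: (n, apply) => `<h1>${apply(n.children)}</h1>` },
      { match: n => n.name === 'p',
        template: (n, apply) => `<p>${apply(n.children)}</p>` },
    ];

    function apply(nodes) {
      return nodes.map(node => {
        if (typeof node === 'string') return node;  // text node: copy
        const rule = rules.find(r => r.match(node));
        if (rule) return rule.template(node, apply);
        return apply(node.children || []);          // default: recurse
      }).join('');
    }

    // JATS-ish input, modelled as a plain object tree for brevity.
    const doc = { name: 'front', children: [
      { name: 'article-title', children: ['Scholarly HTML'] },
      { name: 'p', children: ['Hello world.'] },
    ]};
    console.log(apply([doc]));
    // => <h1>Scholarly HTML</h1><p>Hello world.</p>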
I know that Saxonica recently announced a new XSLT to JS compiler.
I'll certainly be looking at it.

>> There can be a lot of variability in JATS. There's a reason for
>> that: it's meant to be a target format, and as such has to adapt to
>> a fair amount of variability in input. This is great to get things
>> into, but it can make it hard to transform out of. In a way, the
>> essential difference between JATS and SH is that SH is also a
>> target format but is meant to be the *final* step format (such that
>> transformation out of it ought not be necessary) and to have its
>> metadata extractable through tooling that is largely insensitive to
>> structure (RDFa).
>
> I still think there will be variability in the amount of richness
> that SH articles will be able to provide. Publishers may or may not
> have content with a complete or consistent set of semantic
> information. Silly things like whether addresses are marked up
> properly, surnames/given-names correctly identified,
> <mixed-citation> vs <element-citation> use, whether author-supplied
> references are checked & corrected, citation styles, etc.

Absolutely, and those silly things add up quickly. I think that the
way to approach it is this:

  1) If you have the structured data, then you must encode it like
     this. (No need for variability when you do have the
     information.)
  2) If you only have the text, well, give us the text.

One of the neat things with RDFa is that at the processing level we
can have interoperability without having to care too much about
structure. The body of the article needs to be relatively regular in
terms of sections+hunks, but the rest can be pretty creative and you
can still get a nice JSON-LD tree out of it.

> Obviously you can force standardisation and a minimal level of
> compliance, but that works against acceptance by potential users of
> SH. Could/should SH provide one standard that all publishers meet?
> Is multi-level compliance like JATS green/blue/orange a
> consideration? Or an extra level of conformance like JATS4R?

My experience with standards is that multiple levels of conformance
create more problems than they solve. (JATS4R is somewhat different
in that regard, in that it is trying to solve that problem rather
than add to it, but that's the general idea at least.)

My current thinking is that when you process an SH document, you
actually get two trees. One is the article tree, which is basically
little more than title/hunks/sections, with sections containing that
structure recursively. The other is a metadata tree, which is
basically the JSON-LD tree rooted at the article resource (in
JSON-LD terms, the graph of the article is framed into the article
resource). Both trees have identifiers that make it possible (even
relatively easy) to merge them back together. What we do is store
both separately (largely so that the metadata can be edited and
enriched on its own) and then we have a React component that just
merges them back together.
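To make that concrete, the merge can be as simple as this toy sketch
(hypothetical tree shapes and identifiers, not our actual React
component):

    // Index the framed JSON-LD tree by @id, then walk the article
    // tree and graft each node's metadata back onto it.
    function indexById(node, map = new Map()) {
      if (node && typeof node === 'object') {
        if (node['@id']) map.set(node['@id'], node);
        for (const value of Object.values(node)) {
          for (const child of Array.isArray(value) ? value : [value]) {
            indexById(child, map);
          }
        }
      }
      return map;
    }

    function merge(articleNode, metaIndex) {
      return {
        ...articleNode,
        meta: metaIndex.get(articleNode.id) || null,
        sections: (articleNode.sections || []).map(s => merge(s, metaIndex)),
      };
    }

    // Made-up example of the two trees sharing identifiers.
    const metadata = { '@id': '#article', name: 'My Paper',
                       author: { '@id': '#a1', name: 'R. Berjon' } };
    const article = { id: '#article',
                      sections: [{ id: '#sec1', hunks: ['Some text.'] }] };
    console.log(JSON.stringify(merge(article, indexById(metadata)), null, 2));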
Obviously, if instead of a tree you want an RDF graph, nothing
prevents you from getting one. Personally I've found it easier to
work with two trees (even though they are not isomorphic), and in
fact I never actually work at the RDF level, but YMMV!

The core idea is that RDFa allows for a lot of variability without
impacting processing, and makes it possible to work with whatever
tool chain you like.

--
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ — intelligent science publishing

Received on Monday, 21 March 2016 14:16:04 UTC