Re: Early draft is up from Gareth Oakes on 2016-03-18 (public-scholarlyhtml@w3.org from March 2016)

From: Gareth Oakes <goakes@gpsl.co>
Date: Fri, 18 Mar 2016 01:22:43 +0000
To: Peter Murray-Rust <pm286@cam.ac.uk>, Robin Berjon <robin@berjon.com>
CC: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Message-ID: <40F0BC87-33BE-488C-848F-280F045E5878@gpsl.co>
Hi there,

Yes, it’s great to see a stake in the ground for this. It will take a little time to absorb.

Firstly, I agree that it would be very helpful to build a validation rule set for this, as an expression of the boundaries of ScholarlyHTML. I know it’s a bit early to have it all nailed down at this stage, and it would certainly take concerted effort.

Just some food for thought regarding the transform, assuming it is going to be made available for others to use… There is definitely a wide degree of variability in how JATS is used “in the field”. It is not just the version differences between NLM, 1.0 and 1.1: publishers often have tagging convention documents that describe their specific usage of JATS. I guess a generic JATS transform would need to be designed in such a way as to allow publisher-, journal- or even article-specific rules to apply?
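
To make the layering idea concrete, here is a rough sketch in Python rather than XSLT. It is only one possible way to do it, and every rule name and value in it is hypothetical, made up for illustration. More specific layers (article, then journal, then publisher) shadow the generic defaults:

```python
from collections import ChainMap

# Generic defaults for the transform. All keys and values are hypothetical.
GENERIC_RULES = {
    "tag_set": "JATS 1.1",
    "abstract_heading": "Abstract",
    "ref_list_style": "numbered",
}


def resolve_rules(publisher=None, journal=None, article=None):
    """Return one lookup in which more specific layers shadow generic ones."""
    # ChainMap searches its maps left to right, so the most specific
    # layer (article) goes first and the generic defaults go last.
    layers = [layer for layer in (article, journal, publisher) if layer]
    return ChainMap(*layers, GENERIC_RULES)


# A hypothetical publisher on the older NLM tag set, with one journal
# that labels its abstracts differently.
rules = resolve_rules(
    publisher={"tag_set": "NLM", "ref_list_style": "author-date"},
    journal={"abstract_heading": "Summary"},
)
```

With this, rules["abstract_heading"] comes from the journal layer, rules["tag_set"] from the publisher layer, and anything not overridden falls through to the generic defaults.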

(Interesting tangential thought – publishers may also need some variability when expressing their content semantics in the ScholarlyHTML?)

Back on the transform topic, here is an example from a BITS transform we did using similar concepts:
                     a2b                     (main stylesheet)
                      |
                     root                    (core functionality)
    __________________|__________________
    |           |           |           |
createCovers chapter   frontmatter     toc   (book-part types)
    |___________|___________|___________|
         _____________|______________
         |       |         |        |
       text    lists    tables   figures     (basic formatting)
         |_______|_________|________|
                      :
                _ _ _ : _ _ _
                :           :
               ABC         XYZ               (stylesheet overrides)

We implemented the ABC..XYZ overrides by importing a book-specific set of XSL file(s) that provided either named template “methods” or, for more difficult cases, direct <xsl:template> overrides. The design pattern was to implement a generic transformation but call a named template any time specific functionality was required. The named templates don’t fundamentally change the transformation, but may produce minor differences in ordering, styling, generated text, etc.
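
As a loose analogy for that pattern (not our actual stylesheets), here is a small Python sketch. The hook methods stand in for named templates and a subclass stands in for an imported book-specific stylesheet. The class names, element names and sample input are all hypothetical, and subclassing only approximates XSLT import precedence:

```python
import xml.etree.ElementTree as ET


class GenericTransform:
    """Generic transform; the hook methods stand in for named templates."""

    def chapter_label(self, number):
        # Hook: generated text that a specific book may want to change.
        return f"Chapter {number}"

    def heading_class(self):
        # Hook: styling that a specific book may want to change.
        return "chapter-title"

    def transform(self, book):
        # Shared core logic; it calls a hook only where books may differ.
        body = ET.Element("body")
        for number, chapter in enumerate(book.findall("chapter"), start=1):
            section = ET.SubElement(body, "section")
            heading = ET.SubElement(section, "h1", {"class": self.heading_class()})
            heading.text = f"{self.chapter_label(number)}: {chapter.findtext('title')}"
            for para in chapter.findall("p"):
                ET.SubElement(section, "p").text = para.text
        return ET.tostring(body, encoding="unicode")


class BookABC(GenericTransform):
    """Hypothetical book-specific layer (the ABC box in the diagram)."""

    def chapter_label(self, number):
        # Overrides one hook; the rest of the transform is inherited.
        return f"Part {number}"


SOURCE = "<book><chapter><title>Intro</title><p>Hello.</p></chapter></book>"
book = ET.fromstring(SOURCE)
generic_html = GenericTransform().transform(book)
abc_html = BookABC().transform(book)
```

The two outputs differ only in the generated heading text, which is the point: the override changes a detail without touching the core transformation.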

We have done something similar for an ISOSTS transform too, and it seems to work pretty well.

// Gareth Oakes
// Chief Architect, GPSL
// www.gpsl.co

From: <peter.murray.rust@googlemail.com> on behalf of Peter Murray-Rust <pm286@cam.ac.uk>
Date: Friday, 18 March 2016 at 10:57
To: Robin Berjon <robin@berjon.com>
Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>, "all@contentmine.org" <all@contentmine.org>
Subject: Re: Early draft is up
Resent-From: <public-scholarlyhtml@w3.org>
Resent-Date: Friday, 18 March 2016 at 10:58

Excellent!

For the record, I am currently working on converting JATS-XML into ScholarlyHTML. This is the primary resource that we use for mining science. At present it's just XHTML, which is well formed and has some degree of normalization, but there is some variation in the publishers' markup.

Do we plan to have a validator for ScholarlyHTML?

Anyway I'll let you know how we get on and feed early drafts of converted SHTML - which will almost certainly have serious errors...

There are over 1 million documents to practice on!

P.


On Thu, Mar 17, 2016 at 9:04 PM, Robin Berjon <robin@berjon.com> wrote:
Hi all,

after a period of dormancy (it turns out that actually implementing
stuff is work), Tzviya and I sat down this week to put together an early
draft of the spec. You can see it up at:

    http://w3c.github.io/scholarly-html/


When I say early I do mean it. A lot of the concepts are likely there,
but many things are still missing, from the more trivial (nicer CSS) to
the more complex. The spec might need some restructuring in places;
we're still thinking about that.

But hopefully there is enough meat that we can use this as a starting
point for discussion. The GitHub issues are open, there's this list, we
take PRs, etc.

We hope you'll enjoy it!

--
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ — intelligent science publishing

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Received on Friday, 18 March 2016 10:25:03 UTC