Re: Early draft is up from Peter Murray-Rust on 2016-03-18 (public-scholarlyhtml@w3.org from March 2016)

From: Peter Murray-Rust <pm286@cam.ac.uk>
Date: Fri, 18 Mar 2016 13:09:53 +0000
To: Gareth Oakes <goakes@gpsl.co>
Cc: Robin Berjon <robin@berjon.com>, W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Message-ID: <CAD2k14NJcAy-vE68x9d-=ZunweC2hDK2VK4jgC5v3QVaEbVs8A@mail.gmail.com>
Thanks Gareth - agree with all of that.

The NLM has put out stylesheets for this (jats-html.xsl), with the banner
below.

The reason I think it will be useful is that it will probably test:
* encodings
* high codepoints (especially math and non-ISO-Latin languages)
* variation in how concepts are encoded (e.g. what is an author? a
publisher? a journal? etc.)
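
A minimal sketch of how the high-codepoint bullet could be checked on a
converted file, using only the Python standard library. The function name
and the sample document are made up for illustration; it looks at text
nodes only, not attribute values.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def high_codepoints(xml_bytes: bytes) -> Counter:
    """Count characters above U+00FF (outside ISO Latin-1) in all text nodes."""
    # Parsing from bytes lets the parser honour the encoding declaration.
    root = ET.fromstring(xml_bytes)
    counts = Counter()
    for text in root.itertext():  # element text and tails, not attributes
        counts.update(ch for ch in text if ord(ch) > 0xFF)
    return counts


# Hypothetical sample: Latin-1 e-acute, Greek alpha, a math italic x, a CJK character.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<article><p>caf\u00e9 \u03b1-helix \U0001D465 \u4e2d</p></article>"
).encode("utf-8")

if __name__ == "__main__":
    for ch, n in sorted(high_codepoints(sample).items()):
        print(f"U+{ord(ch):04X} x{n}")
```

Running the same count over the source JATS and over the converted output
and comparing the two would be one cheap way to spot characters lost or
mangled in the transform.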

<?xml version="1.0"?>
<!-- ============================================================= -->
<!--  MODULE:    HTML Preview of NISO JATS Publishing 1.0 XML      -->
<!--  DATE:      May-June 2012                                     -->
<!--                                                               -->
<!-- ============================================================= -->

<!-- ============================================================= -->
<!--  SYSTEM:    NCBI Archiving and Interchange Journal Articles   -->
<!--                                                               -->
<!--  PURPOSE:   Provide an HTML preview of a journal article,     -->
<!--             in a form suitable for reading.                   -->
<!--                                                               -->
<!--  PROCESSOR DEPENDENCIES:                                      -->
<!--             None: standard XSLT 1.0                           -->
<!--             Tested using Saxon 6.5, Tranformiix (Firefox),    -->
<!--               Saxon 9.4.0.3                                   -->
<!--                                                               -->
<!--  COMPONENTS REQUIRED:                                         -->
<!--             1) This stylesheet                                -->
<!--             2) CSS styles defined in jats-preview.css         -->
<!--                (to be placed with the results)                -->
<!--                                                               -->
<!--  INPUT:     An XML document valid to (any of) the             -->
<!--             NISO JATS 1.0, NLM/NCBI Journal Publishing 3.0,   -->
<!--             or NLM/NCBI Journal Publishing 2.3 DTDs.          -->
<!--             (May also work with older variants,               -->
<!--             and note further assumptions and limitations      -->
<!--             below.)                                           -->
<!--                                                               -->
<!--  OUTPUT:    HTML (XHTML if a postprocessor is used)           -->
<!--                                                               -->
<!--  CREATED FOR:                                                 -->
<!--             Digital Archive of Journal Articles               -->
<!--             National Center for Biotechnology Information (NCBI) -->
<!--             National Library of Medicine (NLM)                -->
<!--                                                               -->
<!--  CREATED BY:                                                  -->
<!--             Wendell Piez (based on HTML design by             -->
<!--             Kate Hamilton and Debbie Lapeyre, 2004),          -->
<!--             Mulberry Technologies, Inc.                       -->
<!--                                                               -->
<!-- ============================================================= -->

<!-- ============================================================= -->
<!--
  This work is in the public domain and may be reproduced, published or
  otherwise used without the permission of the National Library of Medicine
  (NLM).



On Fri, Mar 18, 2016 at 1:22 AM, Gareth Oakes <goakes@gpsl.co> wrote:

> Hi there,
>
> Yes it’s great to see a stake in the ground for this. It will take a
> little time to absorb.
>
> Firstly I agree that it would be very helpful to build a validation rule
> set for this, as an expression of the boundaries of ScholarlyHTML. I know
> it’s a bit early to have it all nailed down at this stage, and would
> certainly take concerted effort.
>
> Just some food for thought regarding the transform, and assuming it is
> going to be made available for others to use… There is definitely a wide
> degree of variability in how JATS is used “in the field”. Not just the
> version differences between NLM, 1.0, 1.1; publishers often have tagging
> convention documents to describe their specific usage of JATS. I guess a
> generic JATS transform would need to be designed in such a way as to allow
> publisher-, journal- or even article-specific rules to apply?
>
> (Interesting tangential thought – publishers may also need some
> variability when expressing their content semantics in the ScholarlyHTML?)
>
> Back on the transform topic, here is an example from a BITS transform we
> did using similar concepts:
>                      a2b                     (main stylesheet)
>                       |
>                      root                    (core functionality)
>     __________________|__________________
>     |           |           |           |
> createCovers chapter   frontmatter     toc   (book-part types)
>     |___________|___________|___________|
>          _____________|______________
>          |       |         |        |
>        text    lists    tables   figures     (basic formatting)
>          |_______|_________|________|
>                       :
>                 _ _ _ : _ _ _
>                 :           :
>                ABC         XYZ               (stylesheet overrides)
>
> We implemented the ABC..XYZ overrides by importing a book-specific set of
> XSL file(s) that provided either named template “methods” or, for more
> difficult cases, direct <xsl:template> overrides. The design pattern was to
> implement a generic transformation but call a named template any time
> specific functionality was required. The named templates don’t
> fundamentally change the transformation, but may produce minor differences
> to ordering, styling, generated text, etc.
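
A rough illustration of the layering described above, sketched in Python
rather than XSLT, so it is only an analogy for xsl:import plus named
templates and not the stylesheets Gareth refers to. All names here
(DEFAULT_HOOKS, ABC_OVERRIDES, transform, the element names) are
hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable, Dict, Optional

Hook = Callable[[ET.Element], str]


def text_of(el: ET.Element) -> str:
    """Flattened, HTML-escaped text content of an element."""
    return escape("".join(el.itertext()).strip())


# Generic behaviour: plays the role of the named templates in the core layer.
DEFAULT_HOOKS: Dict[str, Hook] = {
    "title": lambda el: f"<h1>{text_of(el)}</h1>",
    "para": lambda el: f"<p>{text_of(el)}</p>",
}

# Book-specific layer: replaces only the hooks it cares about.
ABC_OVERRIDES: Dict[str, Hook] = {
    "title": lambda el: f'<h1 class="abc">Chapter: {text_of(el)}</h1>',
}


def transform(root: ET.Element, overrides: Optional[Dict[str, Hook]] = None) -> str:
    """Generic traversal; calls a named hook wherever output may vary."""
    hooks = {**DEFAULT_HOOKS, **(overrides or {})}  # overrides take precedence
    out = []
    for el in root.iter():  # document order
        if el.tag == "title":
            out.append(hooks["title"](el))
        elif el.tag == "p":
            out.append(hooks["para"](el))
    return "\n".join(out)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<book-part><title>Intro</title><p>Hello &amp; welcome</p></book-part>"
    )
    print(transform(doc))                 # generic output
    print(transform(doc, ABC_OVERRIDES))  # same traversal, ABC-specific title
```

The traversal itself never changes; only the small hook functions differ
per book, which matches the "minor differences to ordering, styling,
generated text" described above.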
>
> We have done something similar for an ISOSTS transform too, and it seems
> to work pretty well.
>
> // Gareth Oakes
> // Chief Architect, GPSL
> // www.gpsl.co
>
> From: <peter.murray.rust@googlemail.com> on behalf of Peter Murray-Rust <pm286@cam.ac.uk>
> Date: Friday, 18 March 2016 at 10:57
> To: Robin Berjon <robin@berjon.com>
> Cc: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>, "all@contentmine.org" <all@contentmine.org>
> Subject: Re: Early draft is up
> Resent-From: <public-scholarlyhtml@w3.org>
> Resent-Date: Friday, 18 March 2016 at 10:58
>
> Excellent!
>
> For the record I am currently working on converting JATS-XML into
> ScholarlyHTML. This is the primary resource that we use for mining science.
> At present it's just XHTML which is well formed and with some degree of
> normalization, but there is some variation in the publishers' markup.
>
> Do we plan to have a validator for ScholarlyHTML?
>
> Anyway I'll let you know how we get on and feed early drafts of converted
> SHTML - which will almost certainly have serious errors...
>
> There are over 1 million documents to practice on!
>
> P.
>
>
> On Thu, Mar 17, 2016 at 9:04 PM, Robin Berjon <robin@berjon.com> wrote:
>
>> Hi all,
>>
>> after a period of dormancy (it turns out that actually implementing
>> stuff is work), Tzviya and I sat down this week to put together an early
>> draft of the spec. You can see it up at:
>>
>>     http://w3c.github.io/scholarly-html/
>>
>> When I say early I do mean it. A lot of the concepts are likely there,
>> but many things are still missing, from the more trivial (nicer CSS) to
>> the more complex. The spec might need some restructuring in places; we're
>> still thinking about that.
>>
>> But hopefully there is enough meat that we can use this as a starting
>> point for discussion. The GitHub issues are open, there's this list, we
>> take PRs, etc.
>>
>> We hope you'll enjoy it!
>>
>> --
>> • Robin Berjon - http://berjon.com/ - @robinberjon
>> • http://science.ai/ — intelligent science publishing
>>
>>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Received on Friday, 18 March 2016 13:10:21 UTC