Re: JATS (was: Early draft is up)

I think this is a very useful analysis both strategically and technically.

What I am doing certainly stresses the JATS model. The intention is to
consume varied JATS from EuropePMC - over a million documents - and turn
them into computable documents. SH will be critical in narrowing the semantics.

So far I have found ca 215 element tags in probably about a thousand
documents. I'm actually working in Java but the code is simple enough to be
easily ported I think. The SH is used as the primary substrate, not least
because it can be displayed and annotated (we are working very closely with
Hypothes.is - and through them - the W3C annotation spec). I expect that
this will make searches rather fuzzy because authors' semantics are. (We
have "Materials", "Materials and Methods", "methodology", "experimental"
etc.). At this stage I am concentrating on precision rather than recall -
we may miss some sections because their labels are unclear. (And I doubt
that we want to come up with a standard mapping of section headings - it
wouldn't be used anyway).
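
To illustrate the precision-over-recall point, here is a minimal sketch (not
the actual code). It assumes a small hand-made whitelist of heading variants;
the class name SectionLabels, the whitelist and the "methods" label are all
hypothetical. A heading is accepted only if its normalised form matches the
whitelist exactly; anything else is left unclassified rather than guessed.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class SectionLabels {

    // Hypothetical whitelist: normalised heading -> coarse section label.
    // The entries are the variants mentioned above, not a proposed standard.
    private static final Map<String, String> KNOWN = Map.of(
            "materials", "methods",
            "materials and methods", "methods",
            "methodology", "methods",
            "experimental", "methods");

    /** Lower-cases, spells out '&', drops digits and punctuation, collapses spaces. */
    public static String normalise(String heading) {
        return heading.toLowerCase(Locale.ROOT)
                .replace("&", " and ")
                .replaceAll("[^a-z ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Returns a label only on an exact whitelist hit; otherwise empty (favours precision). */
    public static Optional<String> classify(String heading) {
        return Optional.ofNullable(KNOWN.get(normalise(heading)));
    }
}
```

With this, "2. Materials & Methods" is recognised, while a heading such as
"Experimental procedures and results" is deliberately missed.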

One early output should be a list of which JATS tags are actually most
commonly used and what linguistic labels are given to them.
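
A rough sketch of how such a tag tally could be produced, using the standard
StAX API that ships with Java. This is an illustration under assumptions, not
the code I am running: the class name TagCounter is hypothetical, documents
are passed in as strings, and only element names are counted (collecting the
labels would be a further step).

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TagCounter {

    /** Adds the element-name counts of one XML document to the running totals. */
    public static void countTags(String xml, Map<String, Integer> totals) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: DTD processing is switched off so the parser does not
        // try to resolve external DTDs.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Count every opening tag by its local (unprefixed) name.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        totals.merge(reader.getLocalName(), 1, Integer::sum);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Report malformed input as an unchecked exception.
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```

Calling countTags once per document with the same map accumulates totals
across the corpus; sorting the map entries by value then gives the list of
most commonly used tags.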



On Mon, Mar 21, 2016 at 2:15 PM, Robin Berjon <robin@berjon.com> wrote:

>
> [analysis snipped]


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Received on Monday, 21 March 2016 17:41:56 UTC