- From: Gareth Oakes <goakes@gpsl.co>
- Date: Mon, 21 Mar 2016 22:34:21 +0000
- To: Peter Murray-Rust <pm286@cam.ac.uk>
- CC: W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
- Message-ID: <65E179D3-21A3-4E4B-965A-BAB5CDFF7222@gpsl.co>
Hi Peter,

> What I am doing certainly stresses the JATS model. The intention is to
> consume varied JATS from EuropePMC - over a million - and turn them into
> computable documents. SH will be critical in narrowing the semantics.

Very cool. I also think SH has a role to play in expanding the
machine-accessible knowledge base beyond full-text articles and into
supplementary materials, research results, etc.

> I expect that this will make searches rather fuzzy because authors'
> semantics are. (We have "Materials", "Materials and Methods",
> "methodology", "experimental", etc.)

Yes, you have to draw the line at the level of content "intelligence" you
wish to serve up. For example, you can deliver content that is identified
down to the section level, but if your retrieval API is developed cleverly
enough, it could drive a machine learning system trained to recognise and
return results relevant to a particular user or their query.

I guess what I'm trying to say is that SH clearly can't be used to model a
complete, cohesive, semantic database of scholarly content. However, the
promise is that we will be able to get much closer than we are today.

(Side note: are there analogies between SH and the goals of standards like
DITA/S1000D, which aim to deliver the notion of an interoperable "content
supply chain"?)

> One early output should be a list of which JATS tags are actually most
> commonly used and what linguistic labels are given to them.

Possibly venturing into NLP and machine learning territory? (A quick
sketch of that kind of survey is below.)

// Gareth Oakes
// Chief Architect, GPSL
// www.gpsl.co
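A minimal sketch of the tag survey discussed above, assuming a local
directory jats/ of JATS XML files (the directory name and layout are
illustrative, not an established EuropePMC pipeline). It counts element
frequencies and the linguistic labels (section titles) given to <sec>
blocks:

    import collections
    import glob
    import xml.etree.ElementTree as ET

    tag_counts = collections.Counter()   # frequency of each JATS element
    sec_titles = collections.Counter()   # linguistic labels on <sec> blocks

    # Assumed layout: a local folder of JATS XML files, e.g. fetched
    # from EuropePMC ahead of time.
    for path in glob.glob("jats/*.xml"):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip malformed files rather than abort the survey
        for elem in root.iter():
            tag_counts[elem.tag] += 1
        # Normalise section titles to lower case so "Materials and Methods"
        # and "MATERIALS AND METHODS" are counted together.
        for sec in root.iter("sec"):
            title = sec.find("title")
            if title is not None and title.text:
                sec_titles[title.text.strip().lower()] += 1

    print("Most common JATS tags:")
    for tag, n in tag_counts.most_common(20):
        print(" ", tag, n)

    print()
    print("Most common section titles:")
    for title, n in sec_titles.most_common(20):
        print(" ", repr(title), n)

The normalised title counts would give an immediate picture of how fuzzy
the author-supplied labels are, and the same counters could seed a mapping
table (or training data) from author labels to SH section semantics.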
Received on Monday, 21 March 2016 22:34:52 UTC