
Re: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Wed, 24 May 2017 08:33:16 +0000
To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
CC: "Rafael C. Jimenez" <rafael.jimenez@elixir-europe.org>
Message-ID: <DB5PR06MB1734F3D7D65CABB78BB4ED03B6FE0@DB5PR06MB1734.eurprd06.prod.outlook.com>
Hi Justin

Thanks for doing this and pushing forward everyone's understanding.

I know this is really short notice but could you present the highlights of your work this afternoon?

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science
Heriot-Watt University, Edinburgh


From: Justin Clark-Casey <jc955@cam.ac.uk>
Sent: Tuesday, May 16, 2017 3:54:15 PM
To: public-bioschemas@w3.org
Subject: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine

Hi all.  In advance of the Bioschemas meeting next week, I've hacked up a very rough implementation of schema.org markup in InterMine [1].  Specifically, this
is in an installation of InterMine called Synbiomine [2], a data warehouse for synthetic biology that I've been working on.  This compiles information from many
sources (EBI, NCBI, etc.) into integrated biological object reports (genes, proteins, parts, etc.).

In lieu of 'proper' Bioschemas structures, I've put in DataCatalog and Dataset.  In fact, I'm abusing Dataset to represent integrated objects (e.g. protein
Q816S6_BACCR) but I wanted to experiment with linking structures (in this case DataCatalog and Dataset).  The front page embeds the DataCatalog and individual
report pages (e.g. [3]) embed Dataset.  You can see the Google Structured Data Testing Tool (GSDTT) analysis of the front page at [4] and of a particular report
page at [5].

My top 5 immediate observations:

* Embedding JSON-LD itself is not hard.  More challenging is interpreting which schema.org properties to use and how to use them (e.g. CreativeWork.about or

* Being able to link DataCatalog and Dataset (via dataset and includedInDataCatalog attributes) feels like a big win to embed standardized structure in a
website.  In my case, however, I have 2m+ 'datasets' and this may cause issues embedding in a single DataCatalog structure (in my implementation I've
artificially limited this to 500).  This may be due to my abuse of Dataset but the same problem could crop up in other contexts.

* Also in linking DataCatalog and Dataset, I am just embedding the Dataset url in the DataCatalog, for instance, and assuming software will navigate to the
Dataset and extract more information from that page.

* The GSDTT is essential for checking the markup and having some implementation for Bioschemas specifications will be very useful.

* The GSDTT for some reason does not show multiple entries for the same property (e.g. shows only one citation in [5] even though there are many).  I presume
this is just a GSDTT limitation.
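
The linking described in the bullets above could be sketched roughly as follows.  This is only an illustration, not the actual InterMine/Synbiomine code: the function names (build_data_catalog, build_dataset, to_script_tag) and the MAX_DATASETS constant are hypothetical, and the only schema.org terms used are DataCatalog.dataset, Dataset.includedInDataCatalog and citation.  The catalog refers to each Dataset by URL only and caps the list at 500 entries, as in the prototype.

```python
import json

# Cap taken from the message (500); whether this is a sensible limit is an open question.
MAX_DATASETS = 500


def build_data_catalog(name, url, dataset_urls, limit=MAX_DATASETS):
    """Return a schema.org DataCatalog that points at Datasets by URL only."""
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        # Only the URL is embedded; consumers are assumed to follow it
        # to the report page for the full Dataset description.
        "dataset": [{"@type": "Dataset", "url": u} for u in dataset_urls[:limit]],
    }


def build_dataset(name, url, catalog_url, citations=()):
    """Return a schema.org Dataset that links back to its DataCatalog."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "includedInDataCatalog": {"@type": "DataCatalog", "url": catalog_url},
    }
    if citations:
        # A JSON array carries multiple values for the same property.
        doc["citation"] = list(citations)
    return doc


def to_script_tag(doc):
    """Serialise a JSON-LD document into an embeddable <script> element."""
    # Escape "</" so a string value cannot close the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    catalog_url = "http://beta.synbiomine.org/synbiomine/begin.do"
    report_url = "http://beta.synbiomine.org/synbiomine/report.do?id=112968868"
    print(to_script_tag(build_data_catalog("Synbiomine", catalog_url, [report_url])))
    print(to_script_tag(build_dataset("Q816S6_BACCR", report_url, catalog_url)))
```

The front page would carry the output of build_data_catalog and each report page the output of build_dataset.  With 2m+ objects, the URL-only entries keep the catalog block small per item, but the cap still means most Datasets are not reachable from the catalog.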

Overall, imo, it feels really nice to embed structured bio information directly in the website and this could be really valuable if all the markup is
consistent.  Tooling here like GSDTT may be a big help.
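
As a rough idea of what lightweight local tooling alongside GSDTT might look like, the sketch below pulls JSON-LD blocks out of a page and counts how many values a property has, which would help confirm that multiple citations are really present in the markup.  It uses only the Python standard library; JsonLdExtractor, extract_jsonld and count_values are hypothetical names, and the sample HTML is invented for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def count_values(doc, prop):
    """Count the values of a property, treating a non-list value as one entry."""
    value = doc.get(prop)
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1


if __name__ == "__main__":
    sample = (
        '<html><head><script type="application/ld+json">'
        '{"@type": "Dataset", "citation": ["first", "second"]}'
        "</script></head><body></body></html>"
    )
    for doc in extract_jsonld(sample):
        print(doc.get("@type"), count_values(doc, "citation"))
```

This checks only that the JSON parses and that the values are present; it does not validate the markup against schema.org or any Bioschemas specification.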

[1] http://intermine.org/
[2] http://beta.synbiomine.org/synbiomine/begin.do
[3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
[4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
[5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868


Justin Clark-Casey, Synbiomine/InterMine Developer


Received on Wednesday, 24 May 2017 08:37:12 UTC
