
Re: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine

From: Rafael C. Jimenez <rafael.jimenez@elixir-europe.org>
Date: Wed, 24 May 2017 10:15:22 +0100
Message-ID: <CABH4q1_urYXgfU2FVd8UfDqnXeCFfMnS7Pbn8hqfLrjZ_gwMhw@mail.gmail.com>
To: Justin Clark-Casey <jc955@cam.ac.uk>
Cc: public-bioschemas@w3.org
On 16 May 2017 at 15:54, Justin Clark-Casey <jc955@cam.ac.uk> wrote:

> Hi all.  In advance of the Bioschemas meeting next week, I've hacked up a
> very rough implementation of schema.org markup in InterMine [1].
> Specifically, this is in an installation of InterMine called Synbiomine
> [2], a data warehouse for synthetic biology that I've been working on.
> This compiles information from many sources (EBI, NCBI, etc.) into
> integrated biological object reports (genes, proteins, parts, etc.).
> In lieu of 'proper' Bioschemas structures, I've put in DataCatalog and
> Dataset.  In fact, I'm abusing Dataset to represent integrated objects
> (e.g. protein Q816S6_BACCR) but I wanted to experiment with linking
> structures (in this case DataCatalog and Dataset).  The front page embeds
> the DataCatalog and individual report pages (e.g. [3]) embed Dataset.  You
> can see the Google Structured Data Testing Tool (GSDTT) analysis of the
> front page at [4] and of a particular report page at [5].
> My top 5 immediate observations:
> * Embedding JSON-LD itself is not hard.

Yes, the technical adoption is simple, and that is a good selling point.

> More challenging is interpreting which schema.org properties to use and
> how to use them (e.g. CreativeWork.about or Thing.description)?

Indeed. I think this is the opportunity and the strong point of Bioschemas.
We want to suggest a small, well-defined subset of properties from the
more than 100 available for a complex type like Dataset.
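Recommending a small subset of Dataset properties also makes markup checkable. The following is a minimal Python sketch of such a check; MINIMUM, RECOMMENDED and check_profile are hypothetical names, and the two property sets are invented for illustration and are NOT the actual Bioschemas Dataset profile. It ignores JSON-LD keywords such as "@type" and reports which properties are missing or fall outside the chosen subset.

```python
# Hypothetical property sets for illustration only; they are NOT the actual
# Bioschemas Dataset profile.
MINIMUM = {"name", "description", "url"}
RECOMMENDED = {"identifier", "keywords", "citation", "includedInDataCatalog"}


def check_profile(obj: dict) -> dict:
    """Compare a JSON-LD object's properties against the two sets above."""
    # Keys such as "@context" and "@type" are JSON-LD keywords, not properties.
    props = {key for key in obj if not key.startswith("@")}
    return {
        "missing_minimum": sorted(MINIMUM - props),
        "missing_recommended": sorted(RECOMMENDED - props),
        "outside_profile": sorted(props - MINIMUM - RECOMMENDED),
    }


dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Q816S6_BACCR",
    "url": "http://beta.synbiomine.org/synbiomine/report.do?id=112968868",
    "about": "protein",
}
print(check_profile(dataset))
```

Here "description" is reported as missing and "about" as outside the subset, which is exactly the kind of CreativeWork.about versus Thing.description choice a published profile would settle.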

Looking forward to talking to you later. This is great feedback for today's
workshop. I am curious about how this aligns with the work we have done so
far for datasets and data repositories.


> * Being able to link DataCatalog and Dataset (via dataset and
> includedInDataCatalog attributes) feels like a big win to embed
> standardized structure in a website.  In my case, however, I have 2m+
> 'datasets' and this may cause issues embedding in a single DataCatalog
> structure (in my implementation I've artificially limited this to 500).
> This may be due to my abuse of Dataset but the same problem could crop up
> in other contexts.
> * Also in linking DataCatalog and Dataset, I am just embedding the Dataset
> url in the DataCatalog, for instance, and assuming software will navigate
> to the Dataset and extract more information from that page.
> * The GSDTT is essential for checking the markup and having some
> implementation for Bioschemas specifications will be very useful.
> * The GSDTT for some reason does not show multiple entries for the same
> property (e.g. shows only one citation in [5] even though there are many).
> I presume this is just a GSDTT limitation.
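The linking pattern described in these observations can be sketched in Python. This is a minimal illustration under stated assumptions, not the Synbiomine/InterMine code: build_catalog, build_dataset, to_jsonld_script and MAX_LINKED_DATASETS are hypothetical names, the report IDs are generated stand-ins, and the citation URLs are placeholders. The DataCatalog lists each Dataset by URL only (capped, like the 500-entry limit mentioned above), each Dataset points back through includedInDataCatalog, multiple citations are written as a JSON array, and the result is wrapped in the script element that gets embedded in the page. Pairing the name Q816S6_BACCR with the report URL from [3] is also only for illustration.

```python
import json
from itertools import islice
from typing import Iterable, List

# Illustrative cap on Dataset references inside the single DataCatalog block;
# 500 mirrors the limit mentioned above and is not a schema.org rule.
MAX_LINKED_DATASETS = 500


def build_catalog(name: str, url: str, dataset_urls: Iterable[str],
                  limit: int = MAX_LINKED_DATASETS) -> dict:
    """DataCatalog whose `dataset` property points at each Dataset by URL only."""
    # islice stops after `limit` items, so a huge generator is never fully consumed.
    refs = [{"@type": "Dataset", "url": u} for u in islice(dataset_urls, limit)]
    return {"@context": "http://schema.org", "@type": "DataCatalog",
            "name": name, "url": url, "dataset": refs}


def build_dataset(name: str, url: str, catalog_url: str,
                  citations: List[str]) -> dict:
    """Dataset that links back to its catalog via `includedInDataCatalog`."""
    return {"@context": "http://schema.org", "@type": "Dataset",
            "name": name, "url": url,
            "includedInDataCatalog": {"@type": "DataCatalog", "url": catalog_url},
            # Several values of one property are written as a JSON array.
            "citation": list(citations)}


def to_jsonld_script(obj: dict) -> str:
    """Wrap a JSON-LD object in the script element embedded in the HTML page."""
    # Escape "</" so a value cannot close the element early; "<\/" decodes to "</".
    text = json.dumps(obj, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{text}\n</script>'


CATALOG_URL = "http://beta.synbiomine.org/synbiomine/begin.do"
# Hypothetical report IDs, generated lazily to stand in for 2M+ report pages.
report_urls = (f"http://beta.synbiomine.org/synbiomine/report.do?id={i}"
               for i in range(2_000_000))

catalog = build_catalog("Synbiomine", CATALOG_URL, report_urls)
dataset = build_dataset(
    "Q816S6_BACCR",
    "http://beta.synbiomine.org/synbiomine/report.do?id=112968868",
    CATALOG_URL,
    ["https://example.org/paper-1", "https://example.org/paper-2"],  # placeholders
)
print(len(catalog["dataset"]))  # 500
print(to_jsonld_script(dataset))
```

Referencing Datasets by URL keeps the catalog block small and leaves a crawler to follow each link for details; the cap is the crude part, and paging the catalog across several pages would be one alternative worth discussing.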
> Overall, imo, it feels really nice to embed structured bio information
> directly in the website and this could be really valuable if all the markup
> is consistent.  Tooling here like GSDTT may be a big help.
> [1] http://intermine.org/
> [2] http://beta.synbiomine.org/synbiomine/begin.do
> [3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
> [4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
> [5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868
> Regards,
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc


*Rafael C Jimenez*
ELIXIR Chief Technical Officer

ELIXIR Hub, South Building
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
Tel: +44 (0) 1223 49 2574
E-Mail: rafael.jimenez@elixir-europe.org
Received on Wednesday, 24 May 2017 09:16:18 UTC