- From: J. Clark-Casey <jc955@cam.ac.uk>
- Date: Wed, 17 May 2017 07:16:40 +0100
- To: public-bioschemas@w3.org
To acknowledge, this bit of prototyping work was done under Wellcome Trust support (for InterMine) and EPSRC/Flowers Consortium support (for Synbiomine).

Just as a quick additional note, many of the citation URLs in the Dataset structure are currently null only because it was difficult for me to get at all the info in the quick prototype. However, this does seem a very good way of embedding standardized structured attribution (and perhaps later citation) information.

-------- Original Message --------
Subject: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine
Date: 2017-05-16 15:54
From: Justin Clark-Casey <jc955@cam.ac.uk>
To: public-bioschemas@w3.org

Hi all. In advance of the Bioschemas meeting next week, I've hacked up a very rough implementation of schema.org markup in InterMine [1]. Specifically, this is in an installation of InterMine called Synbiomine [2], a data warehouse for synthetic biology that I've been working on. This compiles information from many sources (EBI, NCBI, etc.) into integrated biological object reports (genes, proteins, parts, etc.).

In lieu of 'proper' Bioschemas structures, I've put in DataCatalog and Dataset. In fact, I'm abusing Dataset to represent integrated objects (e.g. protein Q816S6_BACCR), but I wanted to experiment with linking structures (in this case DataCatalog and Dataset). The front page embeds the DataCatalog and individual report pages (e.g. [3]) embed Dataset.

You can see the Google Structured Data Testing Tool (GSDTT) analysis of the front page at [4] and of a particular report page at [5].

My top 5 immediate observations:

* Embedding JSON-LD itself is not hard. More challenging is working out which schema.org properties to use and how to use them (e.g. CreativeWork.about or Thing.description?).

* Being able to link DataCatalog and Dataset (via the dataset and includedInDataCatalog properties) feels like a big win for embedding standardized structure in a website. In my case, however, I have 2m+ 'datasets' and this may cause issues when embedding them in a single DataCatalog structure (in my implementation I've artificially limited this to 500). This may be due to my abuse of Dataset, but the same problem could crop up in other contexts.

* Also, in linking DataCatalog and Dataset, I am just embedding the Dataset url in the DataCatalog, for instance, and assuming that software will navigate to the Dataset page and extract more information from there.

* The GSDTT is essential for checking the markup, and having some implementation for Bioschemas specifications will be very useful.

* The GSDTT for some reason does not show multiple entries for the same property (e.g. it shows only one citation in [5] even though there are many). I presume this is just a GSDTT limitation.

Overall, imo, it feels really nice to embed structured bio information directly in the website, and this could be really valuable if all the markup is consistent. Tooling here like GSDTT may be a big help.

[1] http://intermine.org/
[2] http://beta.synbiomine.org/synbiomine/begin.do
[3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
[4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
[5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868

Regards,

--
Justin Clark-Casey, Synbiomine/InterMine Developer
http://synbiomine.org
http://twitter.com/justincc
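
For readers who want to try the DataCatalog/Dataset linking described in the message above, here is a minimal Python sketch. It is not the InterMine/Synbiomine code: the function names, the example.org URLs and the exact shape of the citation entries are assumptions made for illustration. Only the schema.org types and properties (DataCatalog, Dataset, dataset, includedInDataCatalog, citation) and the cap of 500 linked datasets come from the message.

```python
import json

# Cap on Dataset links per DataCatalog; 500 is the limit mentioned in the post.
MAX_DATASETS = 500


def build_catalog(url, name, dataset_urls, limit=MAX_DATASETS):
    """Front-page DataCatalog that links to Dataset pages by URL only."""
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        # Only the URL is embedded; a consumer is assumed to follow it
        # to the report page to get the full Dataset description.
        "dataset": [{"@type": "Dataset", "url": u} for u in dataset_urls[:limit]],
    }


def build_dataset(url, name, catalog_url, citation_urls=()):
    """Report-page Dataset that points back at its DataCatalog."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "includedInDataCatalog": {"@type": "DataCatalog", "url": catalog_url},
        # A citation URL may be None (serialized as null) when it is unknown.
        "citation": [{"@type": "CreativeWork", "url": c} for c in citation_urls],
    }


def to_script_tag(doc):
    """Wrap a JSON-LD document in the script element used for HTML embedding."""
    # Escape "</" so a string value cannot close the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

A hypothetical usage: `to_script_tag(build_catalog("http://example.org/begin.do", "Example mine", urls))` produces the front-page block, and `to_script_tag(build_dataset(...))` the one for an individual report page. Truncating the list keeps the front page small but, as noted in the message, leaves most of the 2m+ objects unreachable from the catalog markup.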
Received on Wednesday, 17 May 2017 06:17:11 UTC