- From: Rafael C. Jimenez <rafael.jimenez@elixir-europe.org>
- Date: Wed, 24 May 2017 10:15:22 +0100
- To: Justin Clark-Casey <jc955@cam.ac.uk>
- Cc: public-bioschemas@w3.org
- Message-ID: <CABH4q1_urYXgfU2FVd8UfDqnXeCFfMnS7Pbn8hqfLrjZ_gwMhw@mail.gmail.com>
On 16 May 2017 at 15:54, Justin Clark-Casey <jc955@cam.ac.uk> wrote:

> Hi all. In advance of the Bioschemas meeting next week, I've hacked up a
> very rough implementation of schema.org markup in InterMine [1].
> Specifically, this is in an installation of InterMine called Synbiomine
> [2], a data warehouse for synthetic biology that I've been working on.
> This compiles information from many sources (EBI, NCBI, etc.) into
> integrated biological object reports (genes, proteins, parts, etc.).
>
> In lieu of 'proper' Bioschemas structures, I've put in DataCatalog and
> Dataset. In fact, I'm abusing Dataset to represent integrated objects
> (e.g. protein Q816S6_BACCR) but I wanted to experiment with linking
> structures (in this case DataCatalog and Dataset). The front page embeds
> the DataCatalog and individual report pages (e.g. [3]) embed Dataset. You
> can see the Google Structured Data Testing Tool (GSDTT) analysis of the
> front page at [4] and a particular report page at [5].
>
> My top 5 immediate observations:
>
> * Embedding JSON-LD itself is not hard.

Yes, the technical adoption is simple and a good selling point.

> More challenging is interpreting which schema.org properties to use and
> how to use them (e.g. CreativeWork.about or Thing.description)?

Indeed. I think this is the opportunity and the strong point of Bioschemas.
We want to suggest a small, well-defined subset of properties from the
more than 100 available for a complex type like dataset.

Look forward to talking to you later. This is great feedback for today's
workshop. I am curious about how this aligns with the work we have done so
far for datasets and data repositories.

Regards,

Rafa

> * Being able to link DataCatalog and Dataset (via dataset and
> includedInDataCatalog attributes) feels like a big win to embed
> standardized structure in a website. In my case, however, I have 2m+
> 'datasets' and this may cause issues embedding in a single DataCatalog
> structure (in my implementation I've artificially limited this to 500).
> This may be due to my abuse of Dataset but the same problem could crop up
> in other contexts.
>
> * Also in linking DataCatalog and Dataset, I am just embedding the Dataset
> url in the DataCatalog, for instance, and assuming software will navigate
> to the Dataset and extract more information from that page.
>
> * The GSDTT is essential for checking the markup and having some
> implementation for Bioschemas specifications will be very useful.
>
> * The GSDTT for some reason does not show multiple entries for the same
> property (e.g. shows only one citation in [5] even though there are many).
> I presume this is just a GSDTT limitation.
>
> Overall, imo, it feels really nice to embed structured bio information
> directly in the website and this could be really valuable if all the markup
> is consistent. Tooling here like GSDTT may be a big help.
>
> [1] http://intermine.org/
> [2] http://beta.synbiomine.org/synbiomine/begin.do
> [3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
> [4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
> [5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868
>
> Regards,
>
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc

--
*Rafael C Jimenez*
ELIXIR Chief Technical Officer
www.elixir-europe.org
ELIXIR Hub, South Building
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
Tel: +44 (0) 1223 49 2574
E-Mail: rafael.jimenez@elixir-europe.org
Received on Wednesday, 24 May 2017 09:16:18 UTC
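
For readers following the thread, the DataCatalog/Dataset linking described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual InterMine or Synbiomine code: the function names (`dataset_jsonld`, `catalog_jsonld`, `as_script_tag`) and the choice of `name` and `url` as the emitted properties are hypothetical. The two URLs and the cap of 500 embedded datasets are taken from the message. The catalog carries only each Dataset's url, on the assumption (also made in the message) that a consumer will follow the link for details.

```python
import json

# URLs taken from the thread; everything else here is illustrative.
CATALOG_URL = "http://beta.synbiomine.org/synbiomine/begin.do"
REPORT_URL = "http://beta.synbiomine.org/synbiomine/report.do?id={id}"
MAX_EMBEDDED_DATASETS = 500  # the artificial cap mentioned in the message


def dataset_jsonld(report_id, name):
    """Build Dataset markup for one report page (hypothetical helper)."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": REPORT_URL.format(id=report_id),
        # Link back to the catalog by URL only.
        "includedInDataCatalog": {"@type": "DataCatalog", "url": CATALOG_URL},
    }


def catalog_jsonld(report_ids, limit=MAX_EMBEDDED_DATASETS):
    """Build DataCatalog markup for the front page (hypothetical helper).

    report_ids must be a list; only the first `limit` entries are embedded,
    and each one is represented by its URL alone.
    """
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": "Synbiomine",
        "url": CATALOG_URL,
        "dataset": [
            {"@type": "Dataset", "url": REPORT_URL.format(id=rid)}
            for rid in report_ids[:limit]
        ],
    }


def as_script_tag(doc):
    """Serialize a JSON-LD document into an embeddable script element."""
    body = json.dumps(doc, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Pairing this id with this name is for illustration only.
    print(as_script_tag(dataset_jsonld(112968868, "Q816S6_BACCR")))
```

Capping the `dataset` list keeps the front-page markup small, at the cost of the catalog no longer enumerating every report; whether consumers tolerate that is the open question raised in the message.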