Re: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine from Leyla Garcia on 2017-05-24 (public-bioschemas@w3.org from May 2017)

From: Leyla Garcia <ljgarcia@ebi.ac.uk>
Date: Wed, 24 May 2017 10:42:00 +0100
To: public-bioschemas@w3.org
Message-ID: <4f0e53fb-2fc9-4f52-4dd3-5d98241de6ae@ebi.ac.uk>

Hi all,

Thanks for sharing!

In case you are wondering and have some time before the meeting, our
presentation, P6 Proteins, is already available at
https://drive.google.com/drive/folders/0Bw_p-HKWUjHoSnJReF9CV1pLVVk.

Do not forget to upload yours!

Cheers,

On 24/05/2017 10:15, Rafael C. Jimenez wrote:
>
>
> On 16 May 2017 at 15:54, Justin Clark-Casey <jc955@cam.ac.uk> wrote:
>
>     Hi all. In advance of the Bioschemas meeting next week, I've
>     hacked up a very rough implementation of schema.org
>     markup in InterMine [1].  Specifically, this
>     is in an installation of InterMine called Synbiomine [2], a data
>     warehouse for synthetic biology that I've been working on.  This
>     compiles information from many sources (EBI, NCBI, etc.) into
>     integrated biological object reports (genes, proteins, parts, etc.).
>
>     In lieu of 'proper' Bioschemas structures, I've put in
>     DataCatalog and Dataset.  In fact, I'm abusing Dataset to
>     represent integrated objects (e.g. protein Q816S6_BACCR) but I
>     wanted to experiment with linking structures (in this case
>     DataCatalog and Dataset).  The front page embeds the DataCatalog
>     and individual report pages (e.g. [3]) embed Dataset.  You can see
>     the Google Structured Data Testing Tool (GSDTT) analysis of the
>     front page at [4] and a particular report page at [5].
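A minimal Python sketch of the linking described above, assuming plain schema.org JSON-LD: a Dataset for a report page points back to its DataCatalog through includedInDataCatalog, and each object is wrapped in the <script type="application/ld+json"> element used for embedding JSON-LD in HTML. The helper name to_script_tag and the property values are illustrative assumptions, not the actual Synbiomine markup; only the types, the linking property and the URLs come from this thread.

```python
import json

CATALOG_URL = "http://beta.synbiomine.org/synbiomine/begin.do"


def to_script_tag(obj: dict) -> str:
    """Hypothetical helper: wrap a JSON-LD object in the script element
    that structured-data consumers read from an HTML page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(obj, indent=2)
            + "\n</script>")


# DataCatalog for the front page (illustrative values).
catalog = {
    "@context": "http://schema.org",
    "@type": "DataCatalog",
    "name": "Synbiomine",
    "url": CATALOG_URL,
}

# Dataset for one report page, linked back to the catalog by URL.
dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Q816S6_BACCR",
    "url": "http://beta.synbiomine.org/synbiomine/report.do?id=112968868",
    "includedInDataCatalog": {"@type": "DataCatalog", "url": CATALOG_URL},
}

print(to_script_tag(catalog))
print(to_script_tag(dataset))
```

Referencing the catalog by URL rather than repeating its full description keeps each report page small; a consumer can follow the link to the front page for the rest.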
>
>     My top 5 immediate observations:
>
>     * Embedding JSON-LD itself is not hard. 
>
>
> Yes, the technical adoption is simple, and that is a good selling point.
>
>     More challenging is interpreting which schema.org properties to
>     use and how to use them (e.g. should it be CreativeWork.about or
>     Thing.description?).
>
>
> Indeed. I think this is the opportunity and the strong point of
> Bioschemas. We want to suggest a small, well-defined subset of
> properties from the more than 100 available for a complex type like
> Dataset.
>
> I look forward to talking to you later. This is great feedback for
> today's workshop. I am curious about how this aligns with the work we
> have done so far for datasets and data repositories.
>
> Regards,
> Rafa
>
>
>     * Being able to link DataCatalog and Dataset (via the dataset and
>     includedInDataCatalog properties) feels like a big win for
>     embedding standardized structure in a website.  In my case,
>     however, I have more than 2 million 'datasets', and this may cause
>     issues when embedding them in a single DataCatalog structure (in my
>     implementation I've artificially limited this to 500). This may be
>     due to my abuse of Dataset, but the same problem could crop up in
>     other contexts.
>
>     * Also, in linking DataCatalog and Dataset, I am just embedding the
>     Dataset URL in the DataCatalog and assuming that software will
>     navigate to the Dataset page and extract more information from
>     there.
>
>     * The GSDTT is essential for checking the markup, and having a
>     similar checking tool for Bioschemas specifications would be very
>     useful.
>
>     * The GSDTT for some reason does not show multiple entries for the
>     same property (e.g. shows only one citation in [5] even though
>     there are many).  I presume this is just a GSDTT limitation.
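The URL-only linking and the 500-entry cap described in the points above could look roughly like the following Python sketch. build_catalog and MAX_EMBEDDED_DATASETS are hypothetical names, and the report ids are generated for illustration only; the real pages at [2] and [3] may be structured differently.

```python
import json

# Mirrors the artificial 500-entry limit mentioned above; a parameter,
# not a recommended value.
MAX_EMBEDDED_DATASETS = 500


def build_catalog(name, url, dataset_urls, limit=MAX_EMBEDDED_DATASETS):
    """Hypothetical helper: a DataCatalog whose `dataset` property holds
    only URL references, leaving consumers to fetch each Dataset page."""
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        # JSON-LD expresses several values for one property as an array.
        "dataset": [{"@type": "Dataset", "url": u}
                    for u in dataset_urls[:limit]],
    }


# Made-up report ids; only the URL pattern follows the links in this thread.
urls = [f"http://beta.synbiomine.org/synbiomine/report.do?id={i}"
        for i in range(2000)]
catalog = build_catalog("Synbiomine",
                        "http://beta.synbiomine.org/synbiomine/begin.do",
                        urls)
print(len(catalog["dataset"]))  # 500
print(json.dumps(catalog["dataset"][0]))
```

Each entry carries only a url, so a consumer has to follow it to the report page for the full Dataset. The same array form is how several citation values on one Dataset would be written, which is the multi-valued case the GSDTT reportedly shows only once.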
>
>     Overall, IMO, it feels really nice to embed structured bio
>     information directly in the website, and this could be really
>     valuable if all the markup is consistent.  Tooling such as the
>     GSDTT may be a big help here.
>
>     [1] http://intermine.org/
>     [2] http://beta.synbiomine.org/synbiomine/begin.do
>     [3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
>     [4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
>     [5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868
>
>     Regards,
>
>     --
>     Justin Clark-Casey, Synbiomine/InterMine Developer
>     http://synbiomine.org
>     http://twitter.com/justincc
>
>
> -- 
>
> *Rafael C Jimenez*
> ELIXIR Chief Technical Officer
> www.elixir-europe.org
>
> ELIXIR Hub, South Building
> Wellcome Genome Campus
> Hinxton, Cambridge, CB10 1SD, UK
> Tel: +44 (0) 1223 49 2574
> E-Mail: rafael.jimenez@elixir-europe.org
>
>
Received on Wednesday, 24 May 2017 09:43:41 UTC