Re: Very rough prototype implementation of DataCatalog/Dataset schema.org markup in InterMine

Sure, I'd be happy to.

On 24 May 2017 at 09:33, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

> Hi Justin
>
> Thanks for doing this and pushing forward everyone's understanding.
>
> I know this is really short notice but could you present the highlights of
> your work this afternoon?
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science
> Heriot-Watt University, Edinburgh
>
> www.macs.hw.ac.uk/~ajg33 <http://www.macs.hw.ac.uk/~ajg33>
>
> ------------------------------
> *From:* Justin Clark-Casey <jc955@cam.ac.uk>
> *Sent:* Tuesday, May 16, 2017 3:54:15 PM
> *To:* public-bioschemas@w3.org
> *Subject:* Very rough prototype implementation of DataCatalog/Dataset
> schema.org markup in InterMine
>
> Hi all.  In advance of the Bioschemas meeting next week, I've hacked up a
> very rough implementation of schema.org markup in InterMine [1].
> Specifically, this is in an installation of InterMine called Synbiomine [2],
> a data warehouse for synthetic biology that I've been working on.  This
> compiles information from many sources (EBI, NCBI, etc.) into integrated
> biological object reports (genes, proteins, parts, etc.).
>
> In lieu of 'proper' Bioschemas structures, I've put in DataCatalog and
> Dataset.  In fact, I'm abusing Dataset to represent integrated objects
> (e.g. protein Q816S6_BACCR) but I wanted to experiment with linking
> structures (in this case DataCatalog and Dataset).  The front page embeds
> the DataCatalog and individual report pages (e.g. [3]) embed Dataset.  You
> can see the Google Structured Data Testing Tool (GSDTT) analysis of the
> front page at [4] and of a particular report page at [5].
>
> My top 5 immediate observations:
>
> * Embedding JSON-LD itself is not hard.  More challenging is working out
> which schema.org properties to use and how to use them (e.g.
> CreativeWork.about or Thing.description?).
>
> * Being able to link DataCatalog and Dataset (via dataset and
> includedInDataCatalog attributes) feels like a big win to embed
> standardized structure in a website.  In my case, however, I have 2m+
> 'datasets' and this may cause issues embedding in a single DataCatalog
> structure (in my implementation I've artificially limited this to 500).
> This may be due to my abuse of Dataset but the same problem could crop up
> in other contexts.
>
> * Also in linking DataCatalog and Dataset, I am just embedding the Dataset
> url in the DataCatalog, for instance, and assuming software will navigate
> to the Dataset and extract more information from that page.
>
> * The GSDTT is essential for checking the markup and having some
> implementation for Bioschemas specifications will be very useful.
>
> * The GSDTT for some reason does not show multiple entries for the same
> property (e.g. shows only one citation in [5] even though there are many).
> I presume this is just a GSDTT limitation.
>
> Overall, imo, it feels really nice to embed structured bio information
> directly in the website and this could be really valuable if all the markup
> is consistent.  Tooling here like GSDTT may be a big help.
>
> [1] http://intermine.org/
> [2] http://beta.synbiomine.org/synbiomine/begin.do
> [3] http://beta.synbiomine.org/synbiomine/report.do?id=112968868
> [4] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Fbegin.do
> [5] https://search.google.com/structured-data/testing-tool#url=http%3A%2F%2Fbeta.synbiomine.org%2Fsynbiomine%2Freport.do%3Fid%3D112968868
>
> Regards,
>
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc
>
>
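
For anyone following the thread, the DataCatalog/Dataset linking described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual InterMine/Synbiomine code: the function names (`dataset_jsonld`, `catalog_jsonld`, `as_script_tag`) and the `MAX_DATASETS` constant are hypothetical. Only the schema.org types and the `dataset` / `includedInDataCatalog` properties, the cap of 500, and the URLs come from the message itself.

```python
import json

# Cap on Dataset references embedded in one DataCatalog (500 in the message).
MAX_DATASETS = 500


def dataset_jsonld(name, url, catalog_url):
    """Build schema.org Dataset markup that points back at its catalog."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "includedInDataCatalog": {"@type": "DataCatalog", "url": catalog_url},
    }


def catalog_jsonld(name, url, dataset_urls, limit=MAX_DATASETS):
    """Build schema.org DataCatalog markup listing datasets by URL only.

    Consumers are assumed to follow each URL to read the full Dataset
    markup embedded on that page.
    """
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        "dataset": [{"@type": "Dataset", "url": u} for u in dataset_urls[:limit]],
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in a script element for embedding in HTML."""
    # Escape "</" so a string value cannot close the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    catalog_url = "http://beta.synbiomine.org/synbiomine/begin.do"
    report_url = "http://beta.synbiomine.org/synbiomine/report.do?id=112968868"
    print(as_script_tag(catalog_jsonld("Synbiomine", catalog_url, [report_url])))
    print(as_script_tag(dataset_jsonld("Q816S6_BACCR", report_url, catalog_url)))
```

Listing only URLs keeps the front-page markup small, but it does not solve the 2m+ case; the cap just truncates the list.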

Received on Wednesday, 24 May 2017 08:57:46 UTC