
RE: DataRecord and Dataset Search

From: Shimoyama, Mary <shimoyama@mcw.edu>
Date: Mon, 10 Sep 2018 13:53:18 +0000
To: "Clark, Timothy W." <TWCLARK@mgh.harvard.edu>, ljgarcia <ljgarcia@ebi.ac.uk>
CC: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Dan Brickley <danbri@google.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>, Natasha Noy <noy@google.com>, Vicki Tardif Holland <vtardif@google.com>
Message-ID: <9f7284b17a854f38b58e190f1ece80ed@MCWMB3c.mcwcorp.net>
Has any thought been given to a data record belonging to multiple datasets? For example, there is an annotation for the rat A2m gene indicating that it is associated with cardiomegaly. Does this A2m-cardiomegaly record belong to:
* the dataset of the A2m gene and all of the data related to A2m,
* the dataset of cardiomegaly and all of the genes associated with cardiomegaly,
* the dataset of all the annotations and data taken from PMID:12494268/RGDID:1549856,
* the dataset of all rat genes and their disease annotations, or
* the dataset of the entire RGD corpus of data?
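
One way to picture the multiple-membership question is a single record whose isPartOf holds several datasets at once. The following is only a sketch in Python that builds JSON-LD. The example.org identifiers, the bs: namespace URL, the make_record helper, and the choice of the A2m gene as mainEntity are all hypothetical, and whether DataRecord should carry several isPartOf values at all is exactly the open question.

```python
import json

# Hypothetical dataset identifiers, one per candidate dataset in the question above.
CANDIDATE_DATASETS = [
    "https://example.org/dataset/gene/A2m",
    "https://example.org/dataset/disease/cardiomegaly",
    "https://example.org/dataset/reference/RGDID-1549856",
    "https://example.org/dataset/rat-gene-disease-annotations",
    "https://example.org/dataset/rgd",
]


def make_record(record_id, entity_name, dataset_ids):
    """Build a JSON-LD dict for one record that is part of several datasets."""
    return {
        "@context": {
            "@vocab": "http://schema.org/",
            "bs": "http://bioschemas.org/",  # assumed namespace for the draft types
        },
        "@type": "bs:DataRecord",
        "@id": record_id,
        # Assumption: the gene is treated as the main entity of the record.
        "mainEntity": {"@type": "bs:BioChemEntity", "name": entity_name},
        # One isPartOf entry per dataset the record is claimed to belong to.
        "isPartOf": [{"@type": "Dataset", "@id": d} for d in dataset_ids],
    }


record = make_record(
    "https://example.org/annotation/A2m-cardiomegaly",
    "A2m",
    CANDIDATE_DATASETS,
)
print(json.dumps(record, indent=2))
```

If a record should belong to exactly one dataset, the same helper can be called with a one-element list, which makes the modelling choice explicit at the call site.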

-----Original Message-----
From: Clark, Timothy W. [mailto:TWCLARK@mgh.harvard.edu] 
Sent: Monday, September 10, 2018 8:04 AM
To: ljgarcia <ljgarcia@ebi.ac.uk>
Cc: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>; Dan Brickley <danbri@google.com>; public-bioschemas@w3.org; Natasha Noy <noy@google.com>; Vicki Tardif Holland <vtardif@google.com>; Shimoyama, Mary <shimoyama@mcw.edu>
Subject: Re: DataRecord and Dataset Search


Just adding Mary Shimoyama, PI of RGD, to this discussion.

> On Sep 10, 2018, at 8:35 AM, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>
> Hi Alasdair,
>
> It sounds to me like you have covered it all. Maybe just some more information about how we link sdo:Dataset, bs:DataRecord and bs:BioChemEntity: sdo:Dataset sdo:hasPart bs:DataRecord (DataRecord actually extends Dataset), and in turn bs:DataRecord sdo:isPartOf sdo:Dataset. A bs:DataRecord has sdo:mainEntity bs:BioChemEntity, and a bs:BioChemEntity is sdo:mainEntityOfPage of a bs:DataRecord.
>
> DataRecord includes two additional properties:
> * sdo:additionalProperty because we want everybody to be able to add 
> no-named properties as needed
> * bs:seeAlso so there can be links to related data records in other datasets; this one is very important in the Life Sciences.
>
> Note: I am using sdo for schema.org and bs for bioschemas, although bioschemas types along with their properties should go to schema.org at some point (hopefully soon).
>
> Regards,
>
> On 2018-09-09 19:03, Gray, Alasdair J G wrote:
>> Hi Dan
>> In the life sciences datasets, the individual records tend to get 
>> their own web page, i.e. each concept in the database would have its 
>> own page. The idea for the DataRecord is to be able to declare that the 
>> page about a concept is part of a Dataset.
>> I believe the approach is agnostic to the underlying storage, i.e. 
>> the page could be generated from a relational database which pulls 
>> data about the concept from multiple tables, a triplestore, or some 
>> other form of database. It is more about the granularity of this 
>> being about a single concept, e.g. row in a relational database with 
>> its foreign keys.
>> Leyla, Rafa, Susanna, what do you think? Have I characterised this 
>> correctly, or are there things in Dan’s email that I am missing?
>> Alasdair
>>> On 7 Sep 2018, at 18:12, Dan Brickley <danbri@google.com> wrote:
>>> (+Natasha Noy, +Vicki Tardif Holland) On Fri, 7 Sep 2018 at 15:54, 
>>> Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:
>>>> Hi Dan,
>>>> Great to see the announcement this week about the Google Dataset 
>>>> search. Here is a link to a blog post for anyone who has not seen 
>>>> it yet
>>>> https://www.blog.google/products/search/making-it-easier-discover-datasets/
>>>> Within Bioschemas, we have been building up a profile usage of 
>>>> DataCatalog containing Dataset(s) which themselves contain 
>>>> DataRecords. A DataRecord is something that we would be proposing 
>>>> as an addition to schema.org [1]. The idea is that a DataRecord is 
>>>> contained within a Dataset and would specify the types of entity 
>>>> that the record is about, e.g. Protein.
>>>> http://bioschemas.org/types/DataRecord/specification/
>>>> We would like to understand whether 
>>>> DataRecord is an idea to which the schema.org [1] community would 
>>>> be receptive. An alternative approach would be to use Dataset for 
>>>> both records within a Dataset and the Dataset itself.
>>> It is certainly a direction worth exploring and discussing.
>>> One issue to think through (and I think I raised this at a 
>>> bioschemas f2f last year) is that "Dataset" is a very broad notion.
>>> Some but not all datasets are tabular for example. And tabular (e.g.
>>> csv, sql) structures have non-trivial mappings to "entity"-oriented 
>>> and "record"-oriented representations. Other formats will have 
>>> different (and possibly simpler) ideas about "records". Thinking 
>>> about tabular first, there are complex mapping languages like D2RQ 
>>> or https://www.w3.org/TR/r2rml/ and the RDF graph it generates versus a 
>>> rows-as-records view, how would your draft design deal with multi-table 
>>> datasets?
>>> Nearby in this world are specs like W3C CSVW, Data Cube, ... lots of 
>>> overlaps. It would be great to work through some examples in 
>>> detail...
>>> Dan
>>>> Thanks
>>>> Alasdair
>>>> --
>>>> Alasdair J G Gray
>>>> Associate Professor in Computer Science, School of Mathematical and 
>>>> Computer Sciences Heriot-Watt University, Edinburgh, UK.
>>>> Email: A.J.G.Gray@hw.ac.uk
>>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>> Office: Earl Mountbatten Building 1.39
>>>> Twitter: @gray_alasdair
>> Links:
>> ------
>> [1] http://schema.org/
>



Received on Monday, 10 September 2018 14:18:30 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:08:06 UTC