Re: DataRecord and Dataset Search

Hi Dan

In life sciences datasets, the individual records tend to get their own web page, i.e. each concept in the database has its own page. The idea for the DataRecord is to be able to declare that the page about a concept is part of a Dataset.

I believe the approach is agnostic to the underlying storage, i.e. the page could be generated from a relational database which pulls data about the concept from multiple tables, from a triplestore, or from some other form of database. It is more about the granularity: the record is about a single concept, e.g. a row in a relational database along with its foreign keys.
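As a rough illustration of that granularity, here is a minimal Python sketch. It is not taken from the Bioschemas specification: the two-table schema, the example.org URLs, and the choice of isPartOf and mainEntity to link the record to its Dataset and its concept are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical dataset URL; not a real resource.
DATASET_URL = "https://example.org/dataset/proteins"


def build_demo_db():
    """Create an in-memory database with two tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(
        """
        CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE protein (
            id INTEGER PRIMARY KEY,
            name TEXT,
            organism_id INTEGER REFERENCES organism(id)
        );
        INSERT INTO organism VALUES (1, 'Homo sapiens');
        INSERT INTO protein VALUES (10, 'Example protein', 1);
        """
    )
    return conn


def record_for_protein(conn, protein_id):
    """Return JSON-LD-style markup for the page about one protein, or None."""
    # One concept, assembled from several tables by following the foreign key.
    row = conn.execute(
        """
        SELECT p.id, p.name, o.name AS organism
        FROM protein AS p JOIN organism AS o ON o.id = p.organism_id
        WHERE p.id = ?
        """,
        (protein_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "@context": "https://schema.org",
        "@type": "DataRecord",  # proposed type, assumed here
        "@id": f"{DATASET_URL}/record/{row['id']}",
        # Assumption: the record points at its containing Dataset via isPartOf.
        "isPartOf": {"@type": "Dataset", "@id": DATASET_URL},
        # Assumption: the concept the record is about is given as mainEntity.
        "mainEntity": {
            "@type": "Protein",  # entity type assumed for illustration
            "name": row["name"],
            "description": f"Protein from {row['organism']}",
        },
    }


if __name__ == "__main__":
    print(json.dumps(record_for_protein(build_demo_db(), 10), indent=2))
```

The same function could equally be backed by a triplestore query; only the lookup would change, while the markup still describes one concept and the Dataset it belongs to.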

Leyla, Rafa, Susanna, what do you think? Have I characterised this correctly, or are there things in Dan’s email that I am missing?

Alasdair

On 7 Sep 2018, at 18:12, Dan Brickley <danbri@google.com> wrote:

(+Natasha Noy <noy@google.com>, +Vicki Tardif Holland)

On Fri, 7 Sep 2018 at 15:54, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:
Hi Dan,

Great to see the announcement this week about Google Dataset Search. Here is a link to the blog post for anyone who has not seen it yet:
https://www.blog.google/products/search/making-it-easier-discover-datasets/


Within Bioschemas, we have been building up a usage profile in which a DataCatalog contains Dataset(s), which themselves contain DataRecords. DataRecord is something that we would propose as an addition to schema.org. The idea is that a DataRecord is contained within a Dataset and specifies the type of entity that the record is about, e.g. Protein.
http://bioschemas.org/types/DataRecord/specification/


We would like to understand whether DataRecord is an idea to which the schema.org community would be receptive. An alternative approach would be to use Dataset for both records within a Dataset and the Dataset itself.

It is certainly a direction worth exploring and discussing.

One issue to think through (and I think I raised this at a bioschemas f2f last year) is that "Dataset" is a very broad notion. Some but not all datasets are tabular, for example. And tabular (e.g. csv, sql) structures have non-trivial mappings to "entity"-oriented and "record"-oriented representations. Other formats will have different (and possibly simpler) ideas about "records". Thinking about tabular first: there are complex mapping languages like D2RQ or R2RML (https://www.w3.org/TR/r2rml/), and the RDF graph such a mapping generates is quite different from a rows-as-records view. How would your draft design deal with multi-table datasets?

Nearby in this world are specs like W3C CSVW, Data Cube, ... lots of overlaps. It would be great to work through some examples in detail...

Dan

Thanks

Alasdair

--
Alasdair J G Gray
Associate Professor in Computer Science,
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair

Received on Sunday, 9 September 2018 18:03:39 UTC