Re: DataRecord and Dataset Search

From: Dan Brickley <danbri@google.com>
Date: Fri, 7 Sep 2018 17:12:42 +0100
Message-ID: <CAK-qy=4XiHh5OG4wnCMkCXw=kmZwVah02ZdqsK9A8M2j4sWouQ@mail.gmail.com>
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: public-bioschemas@w3.org, Natasha Noy <noy@google.com>, Vicki Tardif Holland <vtardif@google.com>
(+Natasha Noy <noy@google.com>, +Vicki Tardif Holland)

On Fri, 7 Sep 2018 at 15:54, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

> Hi Dan,
>
> Great to see the announcement this week about the Google Dataset search.
> Here is a link to a blog post for anyone who has not seen it yet
> https://www.blog.google/products/search/making-it-easier-discover-datasets/
>
> Within Bioschemas, we have been building up a profile usage of DataCatalog
> containing Dataset(s) which themselves contain DataRecords. A DataRecord is
> something that we would be proposing as an addition to schema.org. The
> idea is that a DataRecord is contained within a Dataset and would specify
> the types of entity that the record is about, e.g. Protein.
> http://bioschemas.org/types/DataRecord/specification/
>
> We would like to understand whether DataRecord is an idea to which the
> schema.org community would be receptive. An alternative approach would be
> to use Dataset for both records within a Dataset and the Dataset itself.
>

It is certainly a direction worth exploring and discussing.

One issue to think through (and I think I raised this at a bioschemas f2f
last year) is that "Dataset" is a very broad notion. Some but not all
datasets are tabular, for example, and tabular structures (e.g. csv, sql)
have non-trivial mappings to "entity"-oriented and "record"-oriented
representations. Other formats will have different (and possibly simpler)
ideas about "records". Thinking about tabular first: there are complex
mapping languages like D2RQ or https://www.w3.org/TR/r2rml/, so there is
the RDF graph that such a mapping generates versus a rows-as-records view
to consider. How would your draft design deal with multi-table datasets?
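
To make the multi-table question concrete, here is a rough sketch of the
two views side by side. Everything in it is an assumption for illustration:
the two tables, their column names, the "ex:" prefix and both helper
functions are invented, and the triple generation only loosely imitates
what an R2RML-style mapping would produce; it is not the Bioschemas
DataRecord design.

```python
# Hypothetical two-table dataset; all table and column names are invented.
proteins = [
    {"id": "P1", "name": "Hemoglobin subunit alpha", "organism_id": "O1"},
    {"id": "P2", "name": "Insulin", "organism_id": "O1"},
]
organisms = [
    {"id": "O1", "name": "Homo sapiens"},
]


def rows_as_records(tables):
    """Rows-as-records view: every row of every table becomes one record."""
    return [
        {"table": table_name, "fields": dict(row)}
        for table_name, rows in tables.items()
        for row in rows
    ]


def rows_as_triples(table_name, rows, foreign_keys):
    """Entity-oriented view: one subject per row, one triple per column.

    foreign_keys maps a column name to the table it points at, so that
    the column value becomes a link to another entity, not a literal.
    """
    triples = []
    for row in rows:
        subject = f"ex:{table_name}/{row['id']}"
        for column, value in row.items():
            if column == "id":
                continue  # the key is already encoded in the subject
            if column in foreign_keys:
                target = f"ex:{foreign_keys[column]}/{value}"
                triples.append((subject, f"ex:{column}", target))
            else:
                triples.append((subject, f"ex:{column}", value))
    return triples


tables = {"protein": proteins, "organism": organisms}
records = rows_as_records(tables)
triples = (
    rows_as_triples("protein", proteins, {"organism_id": "organism"})
    + rows_as_triples("organism", organisms, {})
)
```

In the records view the link from a protein to its organism is just an
opaque field value, whereas in the triples view it becomes an explicit
edge between two entities. Which of those a "record" is supposed to
describe is the part that gets harder once there is more than one table.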

Nearby in this world are specs like W3C CSVW, Data Cube, ... lots of
overlaps. It would be great to work through some examples in detail...

Dan


> Thanks
>
> Alasdair
>
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
>
Received on Friday, 7 September 2018 16:13:17 UTC