Discussions around descriptions of training datasets from LJ.Garcia on 2023-01-30 (public-bioschemas@w3.org from January 2023)

From: LJ.Garcia <lj.garcia.co@gmail.com>
Date: Mon, 30 Jan 2023 14:55:56 +0100
To: public-bioschemas <public-bioschemas@w3.org>
Message-ID: <CAPZUG=AvkqCW-wWpYiQxqfX2WC4fnNAedaxzOX1TnzJhHbqpfQ@mail.gmail.com>

Dear Bioschemas community,

During the past BioHackathon ELIXIR, we started discussing how to describe
the characteristics of an ML training dataset that are useful to potential
users (e.g., people training ML models and looking for relevant training
datasets).

We have created some issues; please join the discussion:
* Description of the intended ML task
<https://github.com/BioSchemas/specifications/issues/630>
* Description of distribution/splits of the dataset
<https://github.com/BioSchemas/specifications/issues/631>

More broadly, there is a need to extend the coverage of DefinedTerm
<https://github.com/schemaorg/schemaorg/issues/3250>: we often want to use
it, but find that schema.org only supports Text, URL, or some other range,
and not DefinedTerm (a common case in the Life Sciences).

Kind regards,

Received on Monday, 30 January 2023 13:56:20 UTC