
Re: Worry to many Datasets => spam Was [Re: {Disarmed} Re: DataRecord and Dataset Search]

From: Karen Yook <karen@wormbase.org>
Date: Fri, 28 Sep 2018 11:57:23 -0700
Message-ID: <CAF4a_bON3nGK3VjJ5bGtLJoVeuzySrVX7H=4_Hk=mDGpvCRbNg@mail.gmail.com>
To: jerven.bolleman@sib.swiss
Cc: Alasdair J G Gray <A.J.G.Gray@hw.ac.uk>, public-bioschemas@w3.org
Hi Jerven,

Thank you for suggesting the
"subtype to schema:StructuredValue e.g. bioschema:BioChemConcept"

and for raising the potential problem with using only 'Dataset' in the
currently proposed tag set. The problem you describe for UniProt
would also affect Alliance pages. In addition,
'Dataset' is simply not a good description of our pages: they are
living compilations of curation drawn from many
'datasets', which range from large-scale datasets to single-bioentity
studies.

Alasdair, if you need more specific examples of why 'Dataset' would be
less than ideal for us, let me know. For now, I am happy
with what Jerven has proposed. I will discuss this internally with
the Alliance to see whether there are more specific points we need to
address.

Best,
Karen




On Fri, Sep 28, 2018 at 1:37 AM Jerven Bolleman
<jerven.bolleman@sib.swiss> wrote:
>
> Hi Alasdair, All,
>
> Now that Google Dataset Search exists, I have a new worry about overusing
> Dataset.
>
> Take www.uniprot.org as an example. It has a bit more than a billion
> webpages. Marking all of them up as Dataset, where we previously used
> DataRecord, would mean we would have a bit over 3.5 billion Datasets.
> Google has no problem dealing with the volume, but I am worried
> that their antispam/relevance logic would drown out the 7 or so Datasets
> that I would like to see highly ranked in their toolbox search.
>
> Considering that most of this work is SEO-related, I would vote to mark
> up just 1 page on www.uniprot.org with DataCatalog/Dataset and none of
> the other pages.
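
As a rough illustration of the "just 1 page" idea, here is a minimal Python sketch that builds DataCatalog markup for a single landing page and lists a handful of Datasets in it. The helper name `catalog_markup` and the dataset names are hypothetical placeholders, not actual UniProt datasets; only the schema.org types `DataCatalog` and `Dataset` and the `dataset` property are taken from schema.org.

```python
import json


def catalog_markup(site_url, catalog_name, dataset_names):
    """Build schema.org DataCatalog JSON-LD for one landing page.

    Hypothetical helper for illustration: the catalog page lists the few
    Datasets that should be discoverable; record pages get no Dataset markup.
    """
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "@id": site_url,
        "name": catalog_name,
        "dataset": [
            {"@type": "Dataset", "name": name} for name in dataset_names
        ],
    }


# Placeholder dataset names, assumed for the example only.
markup = catalog_markup(
    "https://www.uniprot.org/",
    "UniProt",
    ["Example dataset A", "Example dataset B"],
)
print(json.dumps(markup, indent=2))
```

With this shape, the number of Dataset entries stays at the handful that are meant to rank, regardless of how many record pages the site has.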
>
> A more specific concept would be quite nice. May I suggest using a
> subtype of schema:StructuredValue, e.g. bioschema:BioChemConcept?
> For example, the schema:mainEntity on
> "https://wormbase.org/species/c_elegans/gene/WBGene00012939" would be of
> type schema:StructuredValue.
>
> In (hand-typed) JSON-LD, roughly this:
>
> {
>    "@context" : "http://schema.org",
>    "@id" : "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
>    "@type" : "WebPage",
>    "identifier" : "WBGene00012939",
>    "mainEntity" : {
>        "@type" : "StructuredValue",
>        "name" : "subs-4",
>        "hasPart" : {
>            "@type" : "PropertyValue",
>            "propertyID" : "Sequence",
>            "value" : "Y47D3B.1"
>        }
>    }
> }
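
For sites that generate record pages programmatically, the markup proposed above could be produced along these lines. This is only a sketch under assumptions: the function name `gene_page_markup` and its parameters are hypothetical, the structure simply mirrors the hand-typed example (including its use of `hasPart` on the StructuredValue), and the page type is written with the schema.org spelling `WebPage`.

```python
import json


def gene_page_markup(page_url, gene_id, gene_name, sequence_name):
    """Build page-level JSON-LD whose mainEntity is a StructuredValue.

    Hypothetical helper: mirrors the hand-typed proposal, not an agreed
    Bioschemas profile.
    """
    return {
        "@context": "http://schema.org",
        "@id": page_url,
        "@type": "WebPage",
        "identifier": gene_id,
        "mainEntity": {
            "@type": "StructuredValue",
            "name": gene_name,
            "hasPart": {
                "@type": "PropertyValue",
                "propertyID": "Sequence",
                "value": sequence_name,
            },
        },
    }


page = gene_page_markup(
    "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
    "WBGene00012939",
    "subs-4",
    "Y47D3B.1",
)
print(json.dumps(page, indent=2))
```

A proposed subtype such as bioschema:BioChemConcept would replace the `"StructuredValue"` string once a concrete type and context are agreed.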
>
>
> Regards,
> Jerven
>
>
> On 09/28/2018 09:37 AM, Gray, Alasdair J G wrote:
> > Hi Karen,
> >
> >> On 27 Sep 2018, at 22:38, Karen Yook <karen@wormbase.org> wrote:
> >>
> >> I just need to weigh in here as a voice in the Alliance of Genome
> >> Resources before anything gets finalized with respect to DataRecord or
> >> DataSet. While we are not tied to 'DataRecord' per se, we will need
> >> something other than just 'DataSet' to tag our pages.
> >
> > Can you elaborate on what you mean by, “we will need something other
> > than just ‘DataSet’ to tag our pages”?
> >
> >>
> >> We also believe that specific distinctions via sub-types seem to
> >> be the preferred way of doing things for both schemas.org
> >> and Google. We
> >> will try to come up with a more specific proposal by or at the
> >> Biohackathon in Paris in a couple of weeks.
> >
> > We would like to get these issues resolved before the hackathon so that
> > we can have stable core profiles for use in marking up resources.
> >
> > Thanks
> >
> > Alasdair
> >
> > --
> > Alasdair J G Gray
> > Associate Professor in Computer Science,
> > School of Mathematical and Computer Sciences
> > Heriot-Watt University, Edinburgh, UK.
> >
> > Email: A.J.G.Gray@hw.ac.uk
> > Web: http://www.macs.hw.ac.uk/~ajg33
> > ORCID: http://orcid.org/0000-0002-5711-4872
> > Office: Earl Mountbatten Building 1.39
> > Twitter: @gray_alasdair
> >
> >
>
>
Received on Friday, 28 September 2018 18:57:57 UTC
