Re: Worry too many Datasets => spam Was [Re: {Disarmed} Re: DataRecord and Dataset Search]

Dear Karen,

Currently, Bioschemas recommends BioChemEntity as the range for 
mainEntity. If StructuredValue works better for you, could you please 
open an issue at https://github.com/BioSchemas/specifications/issues ? 
Someone from Bioschemas will pick it up so that the change can be
evaluated and later incorporated into the specifications. Please mention
Alasdair and me in the issue so we get notified. GitHub issues help us
keep track of actions that come out of discussions on the mailing list.
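
For illustration only, here is a rough Python sketch of what the current
recommendation could look like for the WormBase gene page from Jerven's
example quoted below. It is not an official Bioschemas example: the "bs"
prefix and its namespace IRI are placeholders I made up, and the helper
name gene_page_markup is hypothetical.

```python
import json

# Assumption: the "bs" prefix and this namespace IRI are placeholders for
# illustration; they are not taken from the Bioschemas specifications.
BIOSCHEMAS_NS = "https://bioschemas.org/"


def gene_page_markup(page_url, gene_id, gene_name):
    """Build JSON-LD for a gene page whose mainEntity is a BioChemEntity."""
    return {
        "@context": ["http://schema.org", {"bs": BIOSCHEMAS_NS}],
        "@id": page_url,
        "@type": "WebPage",
        "identifier": gene_id,
        "mainEntity": {
            # BioChemEntity instead of StructuredValue as the range
            "@type": "bs:BioChemEntity",
            "identifier": gene_id,
            "name": gene_name,
        },
    }


markup = gene_page_markup(
    "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
    "WBGene00012939",
    "subs-4",
)
print(json.dumps(markup, indent=2))
```

Swapping "bs:BioChemEntity" for "StructuredValue" in the sketch gives the
shape Jerven proposed, which may help when comparing the two options in
the issue.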

Kind regards,


On 28/09/2018 19:57, Karen Yook wrote:
> Hi Jerven,
>
> Thank you Jerven for suggesting the
> "subtype to schema:StructuredValue e.g. bioschema:BioChemConcept"
>
> and for raising the potential problem with the sole use of 'Dataset' in
> the currently proposed tag set. As you point out for UniProt, that is
> one of the problems that would also affect Alliance pages. In addition,
> 'Dataset' is just not a good description of our pages; rather, they are
> living compilations of curation created from many 'datasets', which
> range from large-scale datasets to single-bioentity studies.
>
> Alasdair, if you need more specific examples of how 'Dataset' would be
> less than ideal for us, let me know.  However, for now, I am happy
> with what Jerven has proposed.  I will discuss this internally with
> the Alliance to see if there are more specific things we need to
> address.
>
> Best,
> Karen
>
>
>
>
> On Fri, Sep 28, 2018 at 1:37 AM Jerven Bolleman
> <jerven.bolleman@sib.swiss> wrote:
>> Hi Alasdair, All,
>>
>> Now that Google Dataset Search exists, I have a new worry about
>> overusing Dataset.
>>
>> Take www.uniprot.org as an example. It has a bit more than a billion
>> webpages. Marking them all up with Dataset for what was a DataRecord
>> before would mean we would have a bit over 3.5 billion Datasets.
>> Google has no problem with dealing with the volume, but I am worried
>> that their antispam logic/relevance would drown out the 7 or so Datasets
>> that I would like to see highly ranked in their toolbox search.
>>
>> Considering that most of this work is SEO related, I would vote to mark
>> up just 1 page with DataCatalog/Dataset on www.uniprot.org and not on
>> the other pages.
>>
>> A more specific concept would be quite nice. May I suggest using a
>> subtype to schema:StructuredValue e.g. bioschema:BioChemConcept.
>> For example the schema:mainEntity on
>> "https://wormbase.org/species/c_elegans/gene/WBGene00012939" would be of
>> type schema:StructuredValue.
>>
>> In (hand-typed) JSON-LD, roughly this:
>>
>> {
>>     "@context" : "http://schema.org",
>>     "@id" : "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
>>     "@type" : "WebPage",
>>     "identifier" : "WBGene00012939",
>>     "mainEntity" : {
>>         "@type" : "StructuredValue",
>>         "name" : "subs-4",
>>         "hasPart" : {
>>             "@type" : "PropertyValue",
>>             "propertyID" : "Sequence",
>>             "value" : "Y47D3B.1"
>>         }
>>     }
>> }
>>
>>
>> Regards,
>> Jerven
>>
>>
>> On 09/28/2018 09:37 AM, Gray, Alasdair J G wrote:
>>> Hi Karen,
>>>
>>>> On 27 Sep 2018, at 22:38, Karen Yook <karen@wormbase.org
>>>> <mailto:karen@wormbase.org>> wrote:
>>>>
>>>> I just need to weigh in here as a voice in the Alliance of Genome
>>>> Resources before anything gets finalized with respect to DataRecord
>>>> or DataSet. While we are not tied to 'DataRecord' per se, we will
>>>> need something other than just 'DataSet' to tag our pages.
>>> Can you elaborate on what you mean by, “we will need something other
>>> than just ‘DataSet’ to tag our pages”?
>>>
>>>> We also believe that specific distinctions via sub-types seem to be
>>>> the preferred way to do things by both schemas.org
>>>> <http://schemas.org/> and Google. We will try to come up with a more
>>>> specific proposal by or at the Biohackathon in Paris in a couple of
>>>> weeks.
>>> We would like to get these issues resolved before the hackathon so that
>>> we can have stable core profiles for use in marking up resources.
>>>
>>> Thanks
>>>
>>> Alasdair
>>>
>>> --
>>> Alasdair J G Gray
>>> Associate Professor in Computer Science,
>>> School of Mathematical and Computer Sciences
>>> Heriot-Watt University, Edinburgh, UK.
>>>
>>> Email: A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>
>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>> Office: Earl Mountbatten Building 1.39
>>> Twitter: @gray_alasdair
>>>
>>

Received on Monday, 1 October 2018 08:50:42 UTC