Worry: too many Datasets => spam [Was: Re: {Disarmed} Re: DataRecord and Dataset Search]

Hi Alasdair, All,

Now that Google Dataset Search exists, I have a new worry about overusing
the Dataset type.

Take www.uniprot.org as an example. It has a bit more than a billion 
webpages. Marking them all up as Dataset, where we previously used 
DataRecord, would mean we would have a bit over 3.5 billion Datasets.
Google has no problem dealing with the volume, but I am worried 
that their antispam/relevance logic would drown out the 7 or so Datasets 
that I would like to see highly ranked in their toolbox search.

Considering that most of this work is SEO related, I would vote to mark 
up just 1 page with DataCatalog/Dataset on www.uniprot.org and not on 
the other pages.
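
As a minimal sketch of that single-page approach, the landing page could
carry one DataCatalog node that lists the few Datasets worth ranking. The
helper name catalog_markup and the dataset names are hypothetical
placeholders, not the actual UniProt list; "dataset" is the schema.org
property linking a DataCatalog to its Datasets.

```python
import json


def catalog_markup(site_url, catalog_name, dataset_names):
    """Build one DataCatalog node listing a handful of Datasets.

    Intended for the landing page only; the record pages carry no
    Dataset markup at all in this sketch.
    """
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "@id": site_url,
        "name": catalog_name,
        "url": site_url,
        "dataset": [
            {"@type": "Dataset", "name": name} for name in dataset_names
        ],
    }


# Placeholder names (assumption), not the real list of UniProt datasets.
markup = catalog_markup(
    "https://www.uniprot.org/",
    "UniProt",
    ["Example dataset A", "Example dataset B"],
)
print(json.dumps(markup, indent=2))
```

The printed JSON would go into a single script element of type
application/ld+json on the landing page.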

A more specific concept would be quite nice. May I suggest using a 
subtype of schema:StructuredValue, e.g. bioschema:BioChemConcept.
For example the schema:mainEntity on 
"https://wormbase.org/species/c_elegans/gene/WBGene00012939" would be of 
type schema:StructuredValue.

In (hand-typed) JSON-LD, roughly this:

   {
     "@context" : "http://schema.org",
     "@id" : "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
     "@type" : "WebPage",
     "identifier" : "WBGene00012939",
     "mainEntity" : {
       "@type" : "StructuredValue",
       "name" : "subs-4",
       "hasPart" : {
         "@type" : "PropertyValue",
         "propertyID" : "Sequence",
         "value" : "Y47D3B.1"
       }
     }
   }

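For a site with many record pages, markup of this shape would be
generated rather than typed. A rough sketch, assuming a hypothetical
helper record_markup and a plain dict of property names to values; it
emits hasPart as a list so a record can carry more than one
PropertyValue, which differs slightly from the single object in the
hand-typed example.

```python
import json


def record_markup(page_url, identifier, name, properties):
    """Build WebPage markup whose mainEntity is a StructuredValue.

    `properties` maps a propertyID (e.g. "Sequence") to its value.
    """
    return {
        "@context": "http://schema.org",
        "@id": page_url,
        "@type": "WebPage",
        "identifier": identifier,
        "mainEntity": {
            "@type": "StructuredValue",
            "name": name,
            "hasPart": [
                {"@type": "PropertyValue", "propertyID": pid, "value": val}
                for pid, val in properties.items()
            ],
        },
    }


record = record_markup(
    "https://wormbase.org/species/c_elegans/gene/WBGene00012939",
    "WBGene00012939",
    "subs-4",
    {"Sequence": "Y47D3B.1"},
)
print(json.dumps(record, indent=2))
```

Swapping "StructuredValue" for a more specific subtype, if one is
agreed on, would only change the one "@type" string.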

On 09/28/2018 09:37 AM, Gray, Alasdair J G wrote:
> Hi Karen,
>> On 27 Sep 2018, at 22:38, Karen Yook <karen@wormbase.org 
>> <mailto:karen@wormbase.org>> wrote:
>> I just need to weigh in here as a voice in the Alliance of Genome
>> Resources before anything gets finalized wrt to DataRecord or DataSet.
>> While we are not tied to 'DataRecord' per se, we will need something
>> other than just 'DataSet' to tag our pages.
> Can you elaborate on what you mean by, “we will need something other 
> than just ‘DataSet’ to tag our pages”?
>> We also believe specific distinctions via sub-types perhaps seems to
>> be the preferred way to do things by both schemas.org 
>> <http://schemas.org/> and Google.  We
>> will try to come up with a more specific proposal by or at the
>> Biohackathon in Paris in a couple weeks.
> We would like to get these issues resolved before the hackathon so that 
> we can have stable core profiles for use in marking up with resources.
> Thanks
> Alasdair
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
> Email: A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair

Received on Friday, 28 September 2018 08:36:52 UTC