Re: Robots.txt and Sitemap files from Gray, Alasdair J G on 2020-11-03 (public-bioschemas@w3.org from November 2020)

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Tue, 3 Nov 2020 09:03:58 +0000
To: Dan Brickley <danbri@google.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Message-ID: <67826428-53ED-488A-ACDF-9B5AFCD4DF10@hw.ac.uk>

Hi All

Dan, thanks for the prompt on this. I would also encourage the use of sitemaps, which let us know what pages are available on your site.

I have added a field to the list of live deploys that records each site's sitemap. Although this is not currently shown on the website, it is useful for us to have a list of these. You can find details in the following PR:
https://github.com/BioSchemas/bioschemas.github.io/pull/340
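
As a rough illustration of why a listed sitemap is useful, here is a minimal Python sketch of pulling page URLs out of a sitemap document. The example.org URLs and the sitemap_urls helper are made up for illustration and are not part of the PR above; the sketch assumes the standard sitemaps.org XML namespace and reads from a string rather than fetching anything.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content; a real crawler would download this instead.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
  <url><loc>https://example.org/dataset/2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(EXAMPLE_SITEMAP):
        print(url)
```

Because the same lookup matches the loc elements of a sitemap index, a crawler could apply it once to the index and again to each child sitemap.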


Best regards

Alasdair

--
Alasdair J G Gray
Associate Professor in Computer Science,
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair


Heriot-Watt is a global University, as a result my working hours may not be your working hours. Do not feel pressure to reply to this email outside your working hours.


To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time



From: "danbri@google.com" <danbri@google.com>
Date: Monday, 2 November 2020 at 19:12
To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Subject: Robots.txt and Sitemap files
Resent from: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Resent date: Monday, 2 November 2020 at 19:11



Just a quick note to encourage discussion of robots.txt (https://en.wikipedia.org/wiki/Robots_exclusion_standard) and sitemap (https://en.wikipedia.org/wiki/Sitemaps) files as something that bioschemas implementers should think about. There are a few cases of bioschemas-publishing sites excluding most crawlers via a very restrictive robots.txt file. Similarly, sitemap files can make large and complex sites easier for crawlers (whether simple code or large/commercial) to collect data from efficiently, including URL discovery. Since the hope has always been that bioschemas will encourage innovative uses of marked up data, it seems worth making sure that sites aren't accidentally excluding bioschema-crawlers...

cheers,

Dan
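
For anyone wanting to check their own site, the following is a minimal sketch of how a restrictive robots.txt affects a small crawler, using Python's standard urllib.robotparser module. The robots.txt content, the example.org URLs, the "my-bioschemas-crawler" user agent, and the crawler_allowed helper are all hypothetical, chosen only to illustrate the case described above; site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one large crawler is allowed, everything else is
# excluded, and a sitemap is advertised.
RESTRICTIVE_ROBOTS = """User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Sitemap: https://example.org/sitemap.xml
"""


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawler_allowed(robots_txt, user_agent, url):
    """Report whether user_agent may fetch url under the given rules."""
    return load_rules(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/dataset/1"
    for agent in ("Googlebot", "my-bioschemas-crawler"):
        print(agent, crawler_allowed(RESTRICTIVE_ROBOTS, agent, page))
    # Sitemap URLs declared in robots.txt (None if there are none).
    print(load_rules(RESTRICTIVE_ROBOTS).site_maps())
```

With rules like these, the small crawler is refused every page even though the site publishes markup intended for reuse, which is the accidental exclusion worth checking for.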

Received on Tuesday, 3 November 2020 09:04:19 UTC