Re: Robots.txt and Sitemap files

Hi Alasdair,

I would say good practices for sitemaps and robots.txt would be a good
topic for our next community call.
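
As a possible starting point for that discussion, here is a minimal sketch
of how a site owner might check whether a robots.txt file shuts out a
generic crawler and whether it advertises a sitemap. It uses only Python's
standard urllib.robotparser module. The robots.txt content, the
example.org URLs, the crawler name "my-bioschemas-crawler" and the helper
check_access are all made up for illustration and are not taken from any
real deployment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admits one named crawler, blocks everyone else,
# and advertises a sitemap. Hostname and agent names are illustrative only.
RESTRICTIVE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Sitemap: https://example.org/sitemap.xml
"""


def check_access(robots_txt, user_agent, url):
    """Return (allowed, sitemaps) for user_agent under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.can_fetch(user_agent, url), parser.site_maps() or []


if __name__ == "__main__":
    page = "https://example.org/datasets/1"
    for agent in ("Googlebot", "my-bioschemas-crawler"):
        allowed, sitemaps = check_access(RESTRICTIVE, agent, page)
        print(agent, "allowed:", allowed, "sitemaps:", sitemaps)
```

With a file like the one above, only the named crawler is admitted, which
is the kind of accidental exclusion Dan describes. To check a live site
instead, RobotFileParser also offers set_url() and read() to fetch the
file over HTTP.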

Regards,

On Tue, Nov 3, 2020 at 10:05 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
wrote:

> Hi All
>
>
>
> Dan, thanks for the prompt on this. I would also encourage the use of
> sitemaps so that we know which pages are available on your site.
>
>
>
> I have added a field to the list of live deploys that records the sitemap as
> well. Although this is not currently shown on the website, it is useful for
> us to have a list of these. You can find details in the following PR:
>
> https://github.com/BioSchemas/bioschemas.github.io/pull/340
>
>
>
> Best regards
>
>
>
> Alasdair
>
>
>
> --
>
> Alasdair J G Gray
>
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk <A.J.G.Gray@hw.ac.uk>
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
>
>
>
>
> Heriot-Watt is a global University, as a result my working hours may not
> be your working hours. Do not feel pressure to reply to this email outside
> your working hours.
>
>
>
>
>
> To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time
>
>
>
>
>
> From: "danbri@google.com" <danbri@google.com>
> Date: Monday, 2 November 2020 at 19:12
> To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
> Subject: Robots.txt and Sitemap files
> Resent from: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
> Resent date: Monday, 2 November 2020 at 19:11
>
>
>
>
>
> Just a quick note to encourage discussion of robots.txt
> <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap
> <https://en.wikipedia.org/wiki/Sitemaps> files as something that
> bioschemas implementers should think about. There are a few cases of
> bioschemas-publishing sites excluding most crawlers via a very restrictive
> robots.txt file. Similarly, sitemap files can make large and complex sites
> easier for crawlers (whether simple code or large/commercial) to collect
> data from efficiently, including URL discovery. Since the hope has always
> been that bioschemas will encourage innovative uses of marked-up data, it
> seems worth making sure that sites aren't accidentally excluding
> bioschemas crawlers...
>
>
>
> cheers,
>
>
>
> Dan

Received on Tuesday, 3 November 2020 11:34:47 UTC