- From: Carole Goble <carole.goble@manchester.ac.uk>
- Date: Tue, 3 Nov 2020 11:45:01 +0000
- To: LJ.Garcia <lj.garcia.co@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
- CC: Dan Brickley <danbri@google.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <9E11C003B89A7A45B564EAAEEAE27DBF012454C3D5@MBXP01.ds.man.ac.uk>
+1 Leyla

Carole

From: LJ.Garcia [mailto:lj.garcia.co@gmail.com]
Sent: 03 November 2020 11:34
To: Gray, Alasdair J G
Cc: Dan Brickley; public-bioschemas@w3.org
Subject: Re: Robots.txt and Sitemap files

Hi Alasdair,

I would say good practices about sitemaps and robots.txt would fall into the subject for our next community call.

Regards,

On Tue, Nov 3, 2020 at 10:05 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

Hi All

Dan, thanks for the prompt on this. I would also encourage the use of sitemaps to allow us to know what pages are available on your site.

I have added a field to the list of live deploys that lists the sitemap as well. Although this is currently not shown on the website, it is useful for us to have a list of these. You can find details in the following PR:
https://github.com/BioSchemas/bioschemas.github.io/pull/340

Best regards

Alasdair

--
Alasdair J G Gray
Associate Professor in Computer Science, School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh, UK.
Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair

From: "danbri@google.com" <danbri@google.com>
Date: Monday, 2 November 2020 at 19:12
To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Subject: Robots.txt and Sitemap files
Resent from: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Resent date: Monday, 2 November 2020 at 19:11

Just a quick note to encourage discussion of robots.txt <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap <https://en.wikipedia.org/wiki/Sitemaps> files as something that bioschemas implementers should think about.

There are a few cases of bioschemas-publishing sites excluding most crawlers via a very restrictive robots.txt file. Similarly, sitemap files can make large and complex sites easier for crawlers (whether simple code or large/commercial) to collect data from efficiently, including URL discovery.

Since the hope has always been that bioschemas will encourage innovative uses of marked up data, it seems worth making sure that sites aren't accidentally excluding bioschema-crawlers...

cheers,

Dan
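
Implementers who want to check their own site for the problem Dan describes could use something like the following. It is a minimal sketch and was not posted in the thread: it uses Python's standard urllib.robotparser module to test whether a given crawler is allowed to fetch a page, and to list the sitemaps a robots.txt file advertises. The robots.txt content, the example.org URLs, and the crawler name "ExampleBioschemasCrawler" are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is admitted, every other
# crawler is excluded from the whole site. A sitemap is still advertised.
RESTRICTIVE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Sitemap: https://example.org/sitemap.xml
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def advertised_sitemaps(robots_txt: str) -> list:
    """Return the Sitemap URLs listed in robots_txt (requires Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    page = "https://example.org/dataset/1"  # hypothetical marked-up page
    for agent in ("Googlebot", "ExampleBioschemasCrawler"):
        print(agent, can_crawl(RESTRICTIVE_ROBOTS_TXT, agent, page))
    print(advertised_sitemaps(RESTRICTIVE_ROBOTS_TXT))
```

To test a live site instead of an inline string, RobotFileParser also offers set_url() followed by read(), which fetches the robots.txt file over the network.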
Received on Tuesday, 3 November 2020 11:45:17 UTC