- From: Dan Brickley <danbri@danbri.org>
- Date: Wed, 4 Nov 2020 18:40:58 +0000
- To: Justin Clark-Casey <justinccdev@gmail.com>
- Cc: Carole Goble <carole.goble@manchester.ac.uk>, "LJ.Garcia" <lj.garcia.co@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Dan Brickley <danbri@google.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <CAFfrAFpAsGdh71MEx7ipBa6kd_jFP48a9YBmc4vWAAhOgJ3rhQ@mail.gmail.com>
Thanks. I have taken the liberty of making a small edit to https://github.com/BioSchemas/specifications/wiki/Technical to encourage consideration of other crawlers beyond Google's.

Dan

On Wed, 4 Nov 2020 at 18:15, Justin Clark-Casey <justinccdev@gmail.com> wrote:
> I just added robots.txt advice to the sitemap advice that I wrote up long
> ago [1]. This technical wiki page is still reachable via the technical link
> in the Bioschemas website menu.
>
> Best,
>
> Justin Clark-Casey
>
> On Tue, 3 Nov 2020 at 11:45, Carole Goble <carole.goble@manchester.ac.uk>
> wrote:
>
>> +1 Leyla
>>
>> Carole
>>
>> *From:* LJ.Garcia [mailto:lj.garcia.co@gmail.com]
>> *Sent:* 03 November 2020 11:34
>> *To:* Gray, Alasdair J G
>> *Cc:* Dan Brickley; public-bioschemas@w3.org
>> *Subject:* Re: Robots.txt and Sitemap files
>>
>> Hi Alasdair,
>>
>> I would say good practices about sitemaps and robots.txt would fall into
>> the subject for our next community call.
>>
>> Regards,
>>
>> On Tue, Nov 3, 2020 at 10:05 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
>> wrote:
>>
>> Hi All
>>
>> Dan thanks for the prompt on this and I would also encourage the use of
>> sitemaps to allow us to know what pages are available on your site.
>>
>> I have added a field to the list of live deploys that lists the sitemap
>> as well, although this is currently not shown on the website it is useful
>> for us to have a list of these. You can find details in the following PR
>>
>> https://github.com/BioSchemas/bioschemas.github.io/pull/340
>>
>> Best regards
>>
>> Alasdair
>>
>> --
>>
>> Alasdair J G Gray
>>
>> Associate Professor in Computer Science,
>> School of Mathematical and Computer Sciences
>> Heriot-Watt University, Edinburgh, UK.
>>
>> Email: A.J.G.Gray@hw.ac.uk
>> Web: http://www.macs.hw.ac.uk/~ajg33
>> ORCID: http://orcid.org/0000-0002-5711-4872
>> Office: Earl Mountbatten Building 1.39
>> Twitter: @gray_alasdair
>>
>> Heriot-Watt is a global University, as a result my working hours may not
>> be your working hours. Do not feel pressure to reply to this email outside
>> your working hours.
>>
>> To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time
>>
>> *From: *"danbri@google.com" <danbri@google.com>
>> *Date: *Monday, 2 November 2020 at 19:12
>> *To: *"public-bioschemas@w3.org" <public-bioschemas@w3.org>
>> *Subject: *Robots.txt and Sitemap files
>> *Resent from: *"public-bioschemas@w3.org" <public-bioschemas@w3.org>
>> *Resent date: *Monday, 2 November 2020 at 19:11
>>
>> Just a quick note to encourage discussion of robots.txt
>> <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap
>> <https://en.wikipedia.org/wiki/Sitemaps> files as something that
>> bioschemas implementers should think about. There are a few cases of
>> bioschemas-publishing sites excluding most crawlers via a very restrictive
>> robots.txt file. Similarly, sitemap files can make large and complex sites
>> easier for crawlers (whether simple code or large/commercial) to collect
>> data from efficiently, including URL discovery. Since the hope has always
>> been that bioschemas will encourage innovative uses of marked up data, it
>> seems worth making sure that sites aren't accidentally excluding
>> bioschema-crawlers...
>>
>> cheers,
>>
>> Dan
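The two checks Dan describes (is a crawler shut out by robots.txt, and does the site advertise a sitemap) can be sketched with the Python standard library (3.8 or later). This is a minimal illustration under stated assumptions, not the Bioschemas guidance itself: the example.org site, its robots.txt content, the BioschemasCrawler user-agent string and the helper names crawler_allowed(), sitemap_urls() and build_sitemap() are all hypothetical. The robots.txt admits only Googlebot, which is the restrictive pattern mentioned in the thread.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# The protocol caps a single sitemap file at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical restrictive robots.txt: only Googlebot may crawl,
# every other user agent is shut out of the whole site.
EXAMPLE_ROBOTS_TXT = "\n".join([
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
    "",
    "Sitemap: https://example.org/sitemap.xml",
])


def _parse(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    return _parse(robots_txt).can_fetch(user_agent, url)


def sitemap_urls(robots_txt: str) -> list:
    """Return the URLs from any 'Sitemap:' lines (site_maps() needs Python 3.8+)."""
    return _parse(robots_txt).site_maps() or []


def build_sitemap(page_urls: list) -> str:
    """Build a minimal sitemap document with one <url><loc> per page URL."""
    if len(page_urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("split large sites across several sitemap files")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in page_urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & and < in element text.
        ET.SubElement(url_el, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    page = "https://example.org/dataset/1"
    for agent in ("Googlebot", "BioschemasCrawler"):
        print(agent, crawler_allowed(EXAMPLE_ROBOTS_TXT, agent, page))
    print(sitemap_urls(EXAMPLE_ROBOTS_TXT))
    print(build_sitemap([page]))
```

With this hypothetical file, Googlebot is allowed while any other crawler is refused. Changing the catch-all group to an empty "Disallow:" line (which permits everything), or adding a group for the crawler concerned, is one way a site could avoid accidentally excluding non-Google crawlers. Sites with more than 50,000 pages would need several sitemap files tied together by a sitemap index.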
Received on Wednesday, 4 November 2020 18:41:24 UTC