- From: Justin Clark-Casey <justinccdev@gmail.com>
- Date: Wed, 4 Nov 2020 18:14:19 +0000
- To: Carole Goble <carole.goble@manchester.ac.uk>
- Cc: "LJ.Garcia" <lj.garcia.co@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Dan Brickley <danbri@google.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <CAME9NR9EjJzcsME0Sn7NkTW0amt6WBtXFM+aZtGBOZ09zCYWhg@mail.gmail.com>
I just added robots.txt advice to the sitemap advice that I wrote up long ago [1]. This technical wiki page is still reachable via the technical link in the Bioschemas website menu.

Best,
Justin Clark-Casey

On Tue, 3 Nov 2020 at 11:45, Carole Goble <carole.goble@manchester.ac.uk> wrote:

> +1 Leyla
>
> Carole
>
> *From:* LJ.Garcia [mailto:lj.garcia.co@gmail.com]
> *Sent:* 03 November 2020 11:34
> *To:* Gray, Alasdair J G
> *Cc:* Dan Brickley; public-bioschemas@w3.org
> *Subject:* Re: Robots.txt and Sitemap files
>
> Hi Alasdair,
>
> I would say good practices about sitemaps and robots.txt would fall into
> the subject for our next community call.
>
> Regards,
>
> On Tue, Nov 3, 2020 at 10:05 AM Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
> wrote:
>
> Hi All
>
> Dan thanks for the prompt on this and I would also encourage the use of
> sitemaps to allow us to know what pages are available on your site.
>
> I have added a field to the list of live deploys that lists the sitemap as
> well, although this is currently not shown on the website it is useful for
> us to have a list of these. You can find details in the following PR
>
> https://github.com/BioSchemas/bioschemas.github.io/pull/340
>
> Best regards
>
> Alasdair
>
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk <A.J.G.Gray@hw.ac.uk>
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> Heriot-Watt is a global University, as a result my working hours may not
> be your working hours. Do not feel pressure to reply to this email outside
> your working hours.
>
> To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time
>
> *From:* "danbri@google.com" <danbri@google.com>
> *Date:* Monday, 2 November 2020 at 19:12
> *To:* "public-bioschemas@w3.org" <public-bioschemas@w3.org>
> *Subject:* Robots.txt and Sitemap files
> *Resent from:* "public-bioschemas@w3.org" <public-bioschemas@w3.org>
> *Resent date:* Monday, 2 November 2020 at 19:11
>
> Just a quick note to encourage discussion of robots.txt
> <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap
> <https://en.wikipedia.org/wiki/Sitemaps> files as something that
> bioschemas implementers should think about. There are a few cases of
> bioschemas-publishing sites excluding most crawlers via a very restrictive
> robots.txt file. Similarly, sitemap files can make large and complex sites
> easier for crawlers (whether simple code or large/commercial) to collect
> data from efficiently, including URL discovery. Since the hope has always
> been that bioschemas will encourage innovative uses of marked up data, it
> seems worth making sure that sites aren't accidentally excluding
> bioschema-crawlers...
>
> cheers,
>
> Dan
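
Editorial sketch, not part of the original thread: the concern Dan raises (a restrictive robots.txt shutting out every crawler except a favoured few, and a missing sitemap making URL discovery harder) can be checked with Python's standard `urllib.robotparser`. Everything specific below is an assumption made for illustration: the two robots.txt bodies, the `example.org` URLs, the `bioschemas-crawler` user-agent name, and the helper `crawler_access` are all hypothetical and do not describe any real Bioschemas deployment or the advice on the wiki page.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, sitemaps) for a robots.txt body given as text.

    Hypothetical helper: parses the text offline instead of fetching it,
    so the check can be run against a draft file before deployment.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line exists.
    return parser.can_fetch(user_agent, url), parser.site_maps() or []


# Assumed example of a "very restrictive" file: one named crawler is
# allowed, everything else is excluded, and no sitemap is advertised.
RESTRICTIVE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Assumed example of a more open file that also advertises a sitemap.
OPEN = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.org/sitemap.xml
"""

if __name__ == "__main__":
    page = "https://example.org/dataset/1"
    for label, body in (("restrictive", RESTRICTIVE), ("open", OPEN)):
        allowed, sitemaps = crawler_access(body, "bioschemas-crawler", page)
        print(label, "allowed:", allowed, "sitemaps:", sitemaps)
```

Running a check like this against a site's own robots.txt, with the user-agent of a small research crawler rather than a large commercial one, is one way to spot accidental exclusion of the kind described in the thread.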
Received on Wednesday, 4 November 2020 18:15:11 UTC