- From: Dan Brickley <danbri@google.com>
- Date: Mon, 2 Nov 2020 19:10:47 +0000
- To: public-bioschemas@w3.org
Received on Monday, 2 November 2020 19:11:22 UTC
Just a quick note to encourage discussion of robots.txt <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap <https://en.wikipedia.org/wiki/Sitemaps> files as something that bioschemas implementers should think about. There are a few cases of bioschemas-publishing sites excluding most crawlers via a very restrictive robots.txt file. Similarly, sitemap files can make large and complex sites easier for crawlers (whether simple scripts or large commercial systems) to collect data from efficiently, including URL discovery.

Since the hope has always been that bioschemas will encourage innovative uses of marked-up data, it seems worth making sure that sites aren't accidentally excluding bioschemas crawlers; see the sketches below.

cheers,

Dan
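For illustration, a minimal robots.txt along these lines admits well-behaved crawlers while keeping crawl-heavy internal endpoints off limits. The paths and sitemap URL here are placeholders, not any particular site's layout:

```
# Let well-behaved crawlers reach the marked-up record pages,
# but keep expensive internal endpoints out of bounds.
# (All paths below are placeholders.)
User-agent: *
Disallow: /search
Disallow: /api/internal/
Allow: /

# Point crawlers at the sitemap for URL discovery.
Sitemap: https://example.org/sitemap.xml
```

A quick way to sanity-check the effect, using only Python's standard library (the URL and record path are again placeholders):

```python
# Check whether a given user agent may fetch a given URL
# according to the site's published robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.org/robots.txt")  # placeholder site
rp.read()

# True if robots.txt permits the fetch for this user agent.
print(rp.can_fetch("*", "https://example.org/protein/P12345"))
```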
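And a minimal sitemap file listing record pages, so that crawlers can discover URLs without exhaustively spidering the whole site (the URLs and dates here are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/protein/P12345</loc>
    <lastmod>2020-10-30</lastmod>
  </url>
  <url>
    <loc>https://example.org/protein/Q67890</loc>
    <lastmod>2020-10-30</lastmod>
  </url>
</urlset>
```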