Robots.txt and Sitemap files

Just a quick note to encourage discussion of robots.txt
<https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap
<https://en.wikipedia.org/wiki/Sitemaps> files as something that bioschemas
implementers should think about. There are a few cases of
bioschemas-publishing sites excluding most crawlers via a very restrictive
robots.txt file. Similarly, sitemap files can make large and complex sites
easier for crawlers (whether simple scripts or large commercial systems) to
collect data from efficiently, not least by aiding URL discovery. Since the
hope has always been that bioschemas will encourage innovative uses of
marked-up data, it seems worth making sure that sites aren't accidentally
excluding bioschemas crawlers...
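
As a quick illustration (the paths and sitemap URL below are just
placeholders), a robots.txt that stays open to crawlers while advertising a
sitemap can be as short as:

    User-agent: *
    Disallow: /search
    Allow: /

    Sitemap: https://example.org/sitemap.xml

whereas a blanket "Disallow: /" under "User-agent: *" shuts out every
well-behaved crawler, bioschemas-aware ones included.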

cheers,

Dan
