- From: Dan Brickley <danbri@google.com>
- Date: Mon, 2 Nov 2020 19:10:47 +0000
- To: public-bioschemas@w3.org
Received on Monday, 2 November 2020 19:11:22 UTC
Just a quick note to encourage discussion of robots.txt <https://en.wikipedia.org/wiki/Robots_exclusion_standard> and sitemap <https://en.wikipedia.org/wiki/Sitemaps> files as something that bioschemas implementers should think about. There are a few cases of bioschemas-publishing sites excluding most crawlers via a very restrictive robots.txt file. Similarly, sitemap files can make large and complex sites easier for crawlers (whether simple scripts or large commercial systems) to collect data from efficiently, including URL discovery.

Since the hope has always been that bioschemas will encourage innovative uses of marked-up data, it seems worth making sure that sites aren't accidentally excluding bioschemas crawlers; see the sketches below.

cheers,

Dan
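For illustration, a minimal robots.txt along these lines admits well-behaved crawlers while keeping crawl-heavy internal endpoints off limits. The paths and sitemap URL here are placeholders, not any particular site's layout:

```
# Let well-behaved crawlers reach the marked-up record pages,
# but keep expensive internal endpoints out of bounds.
# (All paths below are placeholders.)
User-agent: *
Disallow: /search
Disallow: /api/internal/
Allow: /

# Point crawlers at the sitemap for URL discovery.
Sitemap: https://example.org/sitemap.xml
```

A quick way to sanity-check the effect, using only Python's standard library (the URL and record path are again placeholders):

```python
# Check whether a given user agent may fetch a given URL
# according to the site's published robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.org/robots.txt")  # placeholder site
rp.read()

# True if robots.txt permits the fetch for this user agent.
print(rp.can_fetch("*", "https://example.org/protein/P12345"))
```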
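And a minimal sitemap file listing record pages, so that crawlers can discover URLs without exhaustively spidering the whole site (the URLs and dates here are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/protein/P12345</loc>
    <lastmod>2020-10-30</lastmod>
  </url>
  <url>
    <loc>https://example.org/protein/Q67890</loc>
    <lastmod>2020-10-30</lastmod>
  </url>
</urlset>
```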