[Bug 29291] New: robots.txt on 1 site supposedly blocking some URLs in other sites

From: <bugzilla@jessica.w3.org>
Date: Sun, 15 Nov 2015 02:52:10 +0000
To: www-validator-cvs@w3.org
Message-ID: <bug-29291-169@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=29291

            Bug ID: 29291
           Summary: robots.txt on 1 site supposedly blocking some URLs in
                    other sites
           Product: Validator
           Version: HEAD
          Hardware: PC
               URL: http://cold32.com
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: check
          Assignee: dave.null@w3.org
          Reporter: Nick_Levinson@yahoo.com
        QA Contact: www-validator-cvs@w3.org
                CC: dave.null@w3.org, www-validator-cvs@w3.org
  Target Milestone: ---

When I entered <http://cold32.com> into the W3C Link Checker, allowed 10 levels
of recursion (more than needed), and set it to send the Referer header, it
skipped a small percentage of links, reporting that they were blocked by
robots.txt. But I see no such block in http://cold32.com/robots.txt, and I
don't know why any other site's robots.txt file would control this:

from <http://cold32.com/4/clothing-and-hair/2/where-to-buy-coats.htm>:
http://www.gutenberg.org/cache/epub/7213/pg7213.txt

from <http://cold32.com/5/action/6/showers-but-not-heaters.htm>:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2094925/

from probably most *.htm and *.html pages:
http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js
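The behavior reported above would be consistent with the Link Checker honoring
each linked host's own robots.txt rather than only the start site's, so a block
could come from gutenberg.org, nih.gov, or googlesyndication.com themselves. As
a minimal sketch of that logic (the robots.txt rules below are hypothetical,
not the actual rules published by any of these sites; "W3C-checklink" is the
user-agent string the Link Checker uses):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a third-party host -- NOT the real rules
# served by gutenberg.org; shown only to illustrate how a per-host
# robots.txt would block the checker.
rules = [
    "User-agent: *",
    "Disallow: /cache/",
]

rp = RobotFileParser()
rp.parse(rules)

# Under these hypothetical rules, any agent (including W3C-checklink)
# is barred from /cache/... paths but may fetch other paths.
print(rp.can_fetch("W3C-checklink",
                   "http://www.gutenberg.org/cache/epub/7213/pg7213.txt"))  # False
print(rp.can_fetch("W3C-checklink",
                   "http://www.gutenberg.org/ebooks/7213"))  # True
```

So a "blocked by robots.txt" message on an outbound link would point at the
linked host's robots.txt, not the one on the site being checked.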

For this bug report, I guessed the component and the version; the version is
actually 4.81, not HEAD.

Received on Sunday, 15 November 2015 02:52:21 UTC