Re: IO Error: HTTP resource not retrievable. The HTTP status from the remote server was: 401 from Michael[tm] Smith on 2015-07-30 (www-validator@w3.org from July 2015)

From: Michael[tm] Smith <mike@w3.org>
Date: Thu, 30 Jul 2015 09:41:19 +0900
To: Bob <bob33cn@comcast.net>
Cc: www-validator@w3.org
Message-ID: <20150730004119.GB963@sideshowbarker.net>

Hi Bob,

Bob <bob33cn@comcast.net>, 2015-07-29 13:59 -0400:
> Archived-At: <http://www.w3.org/mid/55B9146B.8070207@comcast.net>
> 
> I'm getting:
> 
> 1. *IO Error*: HTTP resource not retrievable. The HTTP status from the
>    remote server was: 401.
> 
> The webpage being tested is protected by username/password that I've
> entered. I've used the validator 1-2 weeks ago on the same domain without
> any problem.
> 
> Any suggestions?
> 
> Is this a version (Version: 15.7.26) issue?

The fact that it no longer works is by design. The HTML Checker does not
support prompting you for a username and password and forwarding them to
the server that hosts the document.

The legacy validator supports that mechanism, by which you can give it a
username and password to do HTTP authentication for a URL at a different
domain (wherever the document you’re checking is hosted).

But that is a very bad idea. Among other reasons, it gives you a false
sense of security, in that it looks like you’re just logging into that
other domain in the same way you normally would.

But you’re not. Instead the legacy validator is MITMing you. It’s basically
phishing your credentials. It’s trying to train you that it’s OK and safe
to get phished.
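
To make the risk concrete, here is a minimal sketch, assuming the protected page uses HTTP Basic authentication. The function names are made up for illustration and this is not the legacy validator's code. It only shows that the header your browser sends is plain base64, so any party that receives it can read the password back out:

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value sent for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def recover_credentials(header_value: str) -> Tuple[str, str]:
    """Decode a Basic auth header back into (username, password).

    Base64 is an encoding, not encryption, so no secret is needed here.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("your_username", "your_password")
print(recover_credentials(header))
```

Whatever sits between you and the real server, including a proxying validator, can do what recover_credentials does.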

Try going to https://validator.w3.org/check?uri=http%3A%2F%2Ftest.webdav.org%2Fauth-basic%2F
and press the Cancel button in the HTTP authentication prompt, and read
what it says there, especially the following parts:

> You should have been prompted by your browser for a username/password
> pair; if you had supplied this information, I would have forwarded it to
> your server for authorization to access the resource.
>
> You should also be aware that the way we proxy this authentication
> information defeats the normal working of HTTP Authentication. If you
> authenticate to server A, your browser may keep sending the
> authentication information to us every time you validate a page,
> regardless of what server it's on, and we'll happily pass that on to the
> server thereby making it possible for a malicious server operator to
> capture your credentials.

You should never give a password to any third-party site on the Web like
that, ever, no matter how much you trust them not to do anything malicious.

If you choose to take a risk like that, then you might as well just go all
the way and put the username and password into the URL you give to the
checker, like this:

  https://your_username:your_password@example.com

You can give a URL like that to https://validator.w3.org/nu/ right now and
it will work. I really don’t recommend actually doing that, though, because
it’s not secure. At least it makes it clearer to you that you’re doing
something insecure, as opposed to the legacy validator, which gives a
misleading illusion of security.
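
If the username or password contains characters such as "@" or ":", generic URL syntax expects them to be percent-encoded in the userinfo part. The following is a rough sketch of building such a URL; url_with_credentials is a hypothetical helper, and I'm assuming (not asserting) that the checker decodes the percent-encoded form before authenticating:

```python
from urllib.parse import quote, unquote, urlsplit


def url_with_credentials(url: str, username: str, password: str) -> str:
    """Return url with a percent-encoded username:password@ prefix on the host."""
    parts = urlsplit(url)
    # Drop any userinfo already present, keeping host and optional port.
    host = parts.netloc.rpartition("@")[2]
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return parts._replace(netloc=f"{userinfo}@{host}").geturl()


url = url_with_credentials("https://example.com/page", "your_username", "p@ss:word")
print(url)

# Parsing the result and decoding gives back the original password.
print(unquote(urlsplit(url).password))
```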

It would be less insecure if you ran your own local copy of the checker for
checking sites that require login credentials. You could do that like this:

1. Download the latest release of the validator vnu.jar file from
   https://github.com/validator/validator/releases/latest or from
   https://sideshowbarker.net/releases/jar/.

2. Run your own local instance of the HTML Checker (see
   https://validator.github.io/validator/#standalone) and give it a
   URL in the form https://your_username:your_password@example.com

3. Or, instead of step 2, use the vnu.jar tool from the command line (see
   https://validator.github.io/validator/#usage) like this:

     java -jar ~/vnu.jar https://your_username:your_password@example.com
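
If you'd rather drive that command from a script, something like the following sketch might do. It assumes java is on your PATH and that vnu.jar is at the path you pass in; build_vnu_command and check_url are hypothetical names, and the only invocation used is the plain "java -jar vnu.jar URL" form shown above:

```python
import os
import subprocess
from typing import List


def build_vnu_command(jar_path: str, url: str) -> List[str]:
    """Build the argument list for: java -jar <jar_path> <url>."""
    return ["java", "-jar", os.path.expanduser(jar_path), url]


def check_url(jar_path: str, url: str) -> subprocess.CompletedProcess:
    """Run the local checker and capture whatever it prints.

    Returns the completed process so the caller can inspect
    returncode, stdout and stderr.
    """
    return subprocess.run(
        build_vnu_command(jar_path, url),
        capture_output=True,
        text=True,
    )


# Example call (not executed here because it needs java and vnu.jar):
# result = check_url("~/vnu.jar", "https://your_username:your_password@example.com")
# print(result.returncode, result.stdout, result.stderr)
```

Passing the arguments as a list means no shell is involved, so characters in the password that a shell would treat specially are passed through untouched.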

-- 
Michael[tm] Smith https://people.w3.org/mike

Received on Thursday, 30 July 2015 00:41:44 UTC