- From: James R. Haigh (+ML.W3C.Validator subaddress) <JRHaigh+ML.W3C.Validator@Runbox.com>
- Date: Mon, 12 Aug 2024 17:58:50 +0100
- To: Chuck Houpt <chuck@habilis.net>, www-validator@w3.org
- Message-ID: <20240812173518.177a4a1c@jrhaighs-debian-x200>
Hi Chuck, hi all,

At Z-0400=2024-08-12Mon08:32:25, Chuck Houpt sent:
> > What needs to be done to fix this problem? What is the Validator now using JavaScript for? And how can we make this usage of JavaScript /optional/ again?
>
> I'm going to guess that the issue isn't the Validator itself, but its web-host/CDN Cloudflare. A few years back, the Validator switched to using Cloudflare to protect it from DDoS attacks and over-zealous checker scripts.

Ahah! Right! :-D That explains a lot, because I am already aware that I am unable to access any site or service running behind Cloudflare -- at least that means that we now know exactly what the problem is, and that it may not be an issue specific to any code that the W3C has written. :-)

> When Cloudflare thinks there's "too much" traffic from one source, it will put up a CAPTCHA challenge page. Unfortunately, the challenge page usually requires Javascript as well as other popular-browser features. Thus text-browsers often can't complete the challenge.

If it is due to traffic, it is not from me. My traffic is so low that I can typically use a WAN connection as slow as 56kb/s dialup or 2G/EDGE almost without congestion. I wonder whether the JavaScript actually runs on every visit and merely skips the CAPTCHA when no challenge has been triggered -- in which case it would fail in a JavaScript-free browser in all situations, even when no challenge is triggered.

Why, though, does the HTML itself need to be on a CDN? The HTML is dynamically generated, so surely it does not make much sense to have it on a CDN. If it were only the images that were on a CDN, then this would not impact TUI browsers, yet it would still offer a great deal of protection, seeing as it is the images and other multimedia that cause most of the load for servers, and thus present the majority of the DDoS risk.
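To sketch the sort of split I mean, with entirely made-up hostnames and documentation addresses (an illustration only, not how Validator.W3.org is actually deployed): the page itself would resolve straight to the origin server, and only a separate multimedia hostname would resolve through the CDN:

$ dig +short www.example.org
> 192.0.2.10
$ dig +short images.example.org
> example-org.cdn-provider.example.
> 203.0.113.7

A TUI browser fetching just the HTML would then never touch the CDN or its challenge machinery at all, while the multimedia -- the bulk of both the bandwidth and the DDoS exposure -- would still be shielded.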
Also, it is true that most CAPTCHAs of the past few years have become so heavy with JavaScript that they do not even work in my "fallback" browser, Uzbl, which does have JavaScript supported and enabled. It is a growing problem of exclusion that I am still wondering how to address effectively.

> I'm currently able to use the Validator with a text-browser from a pedestrian consumer IP address. I imagine I'd be blocked by a challenge page if I accessed it from a high-traffic network, like a VPN, datacenter, etc.

Well, I am currently using a consumer ISP, so I don't think that is the problem. I think Cloudflare takes a disliking to the simple user-agents that I use: if I do not appear to it to be a "human" such as Google Chrome, for example, then I am excluded, and the user-agents that I use are not as "human-like" as Google Chrome, so most of the time I do find myself excluded. It is a problem that I have now raised with my new local MP, as it has been going on for a number of years without any indication of how to address the exclusion in an effective manner.

For the record, if I visit a Cloudflare site with a raw Telnet client and manually type "GET / HTTP/1.0" (<Enter><Enter>), I bet it won't pass me as being human -- which I find very ironic, lol! ;-) Let me show you... Firstly, here it is working at FrogFind (I have quoted the replies from the server (using the standard plaintext email quotation "> ") to highlight and distinguish the lines that I have not typed):

$ telnet FrogFind.com 80
> Trying 64.227.13.248...
> Connected to FrogFind.com.
> Escape character is '^]'.
GET / HTTP/1.0

> HTTP/1.1 200 OK
> Server: nginx
> Date: Mon, 12 Aug 2024 15:01:04 GMT
> Content-Type: text/html; charset=UTF-8
> Connection: close
> Vary: Accept-Encoding
> Expires: Thu, 19 Nov 1981 08:52:00 GMT
> Cache-Control: no-store, no-cache, must-revalidate
> Pragma: no-cache
> Set-Cookie: PHPSESSID=934oobtq6ff7p67b2nq6qca1s0; path=/
> Vary: Accept-Encoding
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 2.0//EN">
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>
> <html>
> <head>
> <title>FrogFind!</title>
> </head>
> <body>
>
> <br><br><center><h1><font size=7><font color="#008000">Frog</font>Find!</font></h1></center>
> <center><img src="/img/frogfind.gif" width="174" height="80" alt="a pixelated cartoon graphic of a fat, lazy, unamused frog with a keyboard in front of them, awaiting your search query"></center>
> <center><h3>The Search Engine for Vintage Computers</h3></center>
> <br><br>
> <center>
> <form action="/" method="get">
> Leap to: <input type="text" size="30" name="q"><br>
> <input type="submit" value="Ribbbit!">
> </center>
> <br><br><br>
> <small><center>Built by <b><a href="https://youtube.com/ActionRetro" target="_blank" rel="noopener">Action Retro</a></b> on YouTube | Logo by <b><a href="https://www.youtube.com/mac84" target="_blank" rel="noopener">Mac84</a></b> | <a href="about.php">Why build such a thing?</a></center><br>
> <small><center>Powered by DuckDuckGo</center></small>
> <small><center>v1.2</center></small>
> </form>
> </form>
>
>
> </body>
> </html>Connection closed by foreign host.

Now at the Nu Validator, actually Cloudflare:

$ telnet Validator.W3.org 80
> Trying 104.18.22.19...
> Connected to Validator.W3.org.
> Escape character is '^]'.
GET /nu/ HTTP/1.0

> HTTP/1.1 400 Bad Request
> Date: Mon, 12 Aug 2024 15:12:19 GMT
> Content-Type: text/html
> Content-Length: 155
> Connection: close
> Server: cloudflare
> CF-RAY: 8b2167216ab060fe-LHR
>
> <html>
> <head><title>400 Bad Request</title></head>
> <body>
> <center><h1>400 Bad Request</h1></center>
> <hr><center>cloudflare</center>
> </body>
> </html>
> Connection closed by foreign host.

A 400 with no explanation -- presumably Cloudflare, being a shared proxy, needs a Host header to know which of its customers' sites I am asking for, so let's try again with HTTP/1.1:

$ telnet Validator.W3.org 80
> Trying 104.18.22.19...
> Connected to Validator.W3.org.
> Escape character is '^]'.
GET /nu/ HTTP/1.1
HOST: Validator.W3.org

> HTTP/1.1 301 Moved Permanently
> Date: Mon, 12 Aug 2024 15:21:38 GMT
> Content-Type: text/html
> Content-Length: 167
> Connection: keep-alive
> Cache-Control: max-age=3600
> Expires: Mon, 12 Aug 2024 16:21:38 GMT
> Location: https://validator.w3.org/nu/
> Set-Cookie: __cf_bm=Nrs0JyqxQblW8GWcsOyy1xVlxZRg6MiX8kVVjvw1vq0-1723476098-1.0.1.1-p4hGJcknnBGe3DsJBot7d0enNBg2ni6Q5iy488HUwd3gQg_8wOvjI6fjMmO_VmMoy0nAhVp3ySgX00GGCLozTA; path=/; expires=Mon, 12-Aug-24 15:51:38 GMT; domain=.w3.org; HttpOnly
> Server: cloudflare
> CF-RAY: 8b2174cfed65369a-LHR
> alt-svc: h3=":443"; ma=86400
>
> <html>
> <head><title>301 Moved Permanently</title></head>
> <body>
> <center><h1>301 Moved Permanently</h1></center>
> <hr><center>cloudflare</center>
> </body>
> </html>
^]
telnet> exit
> ?Invalid command
telnet> quit
> Connection closed.

Okay, so the "Location: https:"... and the "alt-svc: h3=":443""... headers indicate to me that the Cloudflare server wants me to speak SSL before it even decides whether I am human or not. I have not learnt to speak SSL yet, so I do not know how to continue the conversation with the Cloudflare server, but it is clear to me that, if I were able to get a bit further, it would reject me as not being human -- which I find completely bonkers, like! X-D Seeing as this is about as human as you can get when interacting with such advanced machines as Web servers! ;-)
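(For anyone who does speak SSL and wants to continue the conversation where I left off: I gather that OpenSSL's s_client tool will handle the SSL/TLS layer and then hand you the same raw byte stream that Telnet gives on port 80. The following is an untested sketch rather than a transcript of a real session:)

$ openssl s_client -connect Validator.W3.org:443 -servername Validator.W3.org -crlf -quiet
GET /nu/ HTTP/1.1
Host: Validator.W3.org
Connection: close

(Then a blank line, i.e. <Enter><Enter> again, to end the request headers. The -servername option supplies the hostname during the handshake itself, which a shared proxy such as Cloudflare needs for the same routing reason as the Host header, and -crlf turns the typed line endings into the CR+LF that HTTP expects. Whether what comes back is the Validator's page or a challenge page is exactly the question.)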
I think most modern CAPTCHAs would not recognise a real human if they saw one! And although raw Telnet is not my preference over TUI browsers such as W3M, simple, adaptable user-agents such as W3M are much closer to this hypothetical example of using raw Telnet than they are to complicated and somewhat opaque browsers such as Google Chrome, Safari, IE/Edge, or modern Mozilla Firefox. Actually, come to think of it, I don't find the human user-agent of raw Telnet all that hypothetical -- I'd prefer raw Telnet to all 4 of the most dominant browsers any day! Not even joking, like.

Anyway, now that we know that Cloudflare is the issue here, what can be done about it? Can its scope be limited to only multimedia? What if Cloudflare were only ever used for a mirror, to be fallen back on if/when a DDoS attack does happen?

Also, it strikes me as absurd that the WWW has proliferated CDNs while neglecting to consider swarming technologies that would make the DDoS problem, and also the problem of legitimate demand surges, both completely go away for good. If each visitor to a site were to help the next visitor, using the secure swarming techniques pioneered by BitTorrent, then a small independent site on a tiny server could support any number of visitors without breaking a sweat, like. What a great technology that is! We should be doing more things like that! :-) The Web would be a much better place if we did. It would be more like the WWW that TBL used to talk about. :-)

Kind regards,
James.

P.s.: If both versions of the Validator are at Validator.W3.org, then why does Cloudflare only obstruct the Nu version?

-- 
Sent from Debian with Claws Mail, using email subaddressing as an alternative to error-prone heuristic spam filtering.
Received on Tuesday, 13 August 2024 15:13:49 UTC