XML Schema validation and https redirects

W3C's main web site https://www.w3.org/ will soon start to 
redirect all http requests to https. Will this cause issues for 
XML Schema-related resources hosted on www.w3.org?

We announced this intended change a few weeks ago:

[[
W3C’s main web site www.w3.org has been available via https for 
over a decade, but until now we have not been redirecting all 
requests to https as is commonly done on most other sites.

The primary reason for this is that we wanted to avoid causing 
issues for software requesting machine-readable resources from 
www.w3.org such as HTML DTDs, XML Schemas, and namespace 
documents.

We believe enough time has passed for most such software to have 
been updated to handle redirects and https, so we are planning to 
start redirecting all requests received over http to https within 
a month or two.
]]
-- https://www.w3.org/blog/2022/07/redirecting-to-https-on-www-w3-org/

Following an initial test of this change on August 1, we 
received feedback that it caused issues with XML Schema 
validation. We are planning a follow-up test lasting 3 days, 
starting at 14:00 UTC tomorrow, August 18.

Some questions I have:

Is it intended that www.w3.org is in the critical path when 
performing XML Schema validation? Are .xsd files and/or namespace 
documents retrieved each time a validation is done? Are there 
other use cases besides validation that might cause automated 
requests to www.w3.org?

What are the most popular software packages that might be making 
these requests to www.w3.org? In what contexts do they make these 
requests? Do the latest versions typically have the ability to 
follow http to https redirects? Would XML catalogs help?
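
To make the catalog question concrete, here is a minimal sketch of 
the idea in Python: well-known URIs are mapped to local copies so 
that nothing is fetched from www.w3.org during validation. This is 
not an implementation of OASIS XML Catalogs; the `CATALOG` mapping, 
the `resolve` helper, and the local paths are hypothetical, and 
treating http and https as equivalent is an assumption of this 
sketch (real catalogs match identifiers as written).

```python
from urllib.parse import urlsplit

# Hypothetical catalog: host + path of a remote resource mapped to a
# local copy. The local paths are placeholders, not real locations.
CATALOG = {
    "www.w3.org/2001/XMLSchema.xsd": "/usr/local/share/xml/XMLSchema.xsd",
    "www.w3.org/2001/xml.xsd": "/usr/local/share/xml/xml.xsd",
}


def resolve(uri, catalog=CATALOG):
    """Return a local path if the URI is cataloged, else the URI itself."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return uri  # leave file:, urn:, relative references, etc. alone
    # Assumption: ignore the scheme so http and https share one entry.
    key = parts.netloc.lower() + parts.path
    return catalog.get(key, uri)


print(resolve("http://www.w3.org/2001/XMLSchema.xsd"))
print(resolve("https://www.w3.org/2001/XMLSchema.xsd"))
print(resolve("http://example.org/other.xsd"))
```

With a mapping like this in front of the parser, a change in how 
www.w3.org answers http requests would only matter for URIs that are 
missing from the catalog.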

The top UAs making requests for .xsd resources on www.w3.org are:

   127574 Java/1.8.0_121
    96712
    25860 Python-urllib/2.7
    16673 Apache-CXF/3.3.4
    16215 Zeep/4.1.0 (www.python-zeep.org)
     6481 Apache-CXF/3.2.10
     6205 Java/1.6.0_26
     4176 Java/17.0.2
     1827 Java/1.8.0_162
     1485 Python-urllib/3.7

(The first column is the number of requests in a 90-minute sample 
of the logs.)

Omitting version numbers:

   159765 Java
   101314
    29012 Python-urllib
    27912 Apache-CXF
    17640 Zeep
     1467 Mozilla
      623 Apache CXF
      322 sax Java
      211 Apache-HttpClient
      187 Oracle HTTPClient Version 10h
      120 node-soap
       88 SOA Model (see http:
       87 Elastic-Heartbeat
       74 python-requests
       74 curl
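
The grouping by product name can be sketched as follows. This is an 
illustration only, not the script used to produce the numbers above: 
the `SAMPLE` list contains just the ten versioned rows shown earlier, 
so its totals are lower than the full-log totals, and `product_name` 
is a hypothetical helper that keeps whatever precedes the first "/".

```python
from collections import Counter

# The ten (count, User-Agent) rows from the versioned list above.
SAMPLE = [
    (127574, "Java/1.8.0_121"),
    (96712, ""),
    (25860, "Python-urllib/2.7"),
    (16673, "Apache-CXF/3.3.4"),
    (16215, "Zeep/4.1.0 (www.python-zeep.org)"),
    (6481, "Apache-CXF/3.2.10"),
    (6205, "Java/1.6.0_26"),
    (4176, "Java/17.0.2"),
    (1827, "Java/1.8.0_162"),
    (1485, "Python-urllib/3.7"),
]


def product_name(user_agent):
    """Drop the version: keep the text before the first '/'."""
    return user_agent.split("/", 1)[0].strip()


totals = Counter()
for count, user_agent in SAMPLE:
    totals[product_name(user_agent)] += count

# Print in the same "count name" layout as the lists above.
for name, count in totals.most_common():
    print(f"{count:9d} {name}")
```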

Top UAs making requests matching /2001/XMLSchema:

    43290 Java
    15014 Python-urllib
     8358
     6106 ALTOVA
     3427 Mozilla
      364 Go-http-client
      130 Java1.8.0_291
       88 Zabbix
       70 WebexTeams
       66 MVision
       53 curl
       44 Baiduspider+(+http:
       42 Apache-HttpClient
       40 MapForce
       40 cubebot

If we start redirecting http to https, will that fundamentally 
break compliance with W3C RECs that specify http: in references 
to .xsd files and namespaces? If so, which URIs would we need to 
continue to serve via http?
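
One distinction that bears on this question is between a namespace 
name used as an identifier and a URI that software actually 
retrieves. The sketch below, using Python's standard ElementTree 
parser as an example, shows the identifier case: the http: namespace 
name is compared as a plain string and nothing is fetched. It says 
nothing about tools that dereference schemaLocation or import URIs, 
which is the case the redirect would affect. The `SCHEMA` document 
is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace name, exactly as the RECs spell it.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A made-up schema document used only for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""

# Parsing from a string performs no network access; the prefix is
# expanded to "{namespace-name}local-name".
root = ET.fromstring(SCHEMA)

matches_http = root.tag == "{" + XSD_NS + "}schema"
matches_https = root.tag == "{https://www.w3.org/2001/XMLSchema}schema"

print(root.tag)
print(matches_http, matches_https)
```

In this identifier role the string must stay http:, whatever the 
server does with requests for that URI.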

Thanks,

-- 
Gerald Oskoboiny <gerald@w3.org>
http://www.w3.org/People/Gerald/
tel:+1-604-906-1232 (mobile)

Received on Wednesday, 17 August 2022 22:45:34 UTC