Re: XML Schema validation and https redirects

* Norm Tovey-Walsh <ndw@nwalsh.com> [2022-08-20 09:27+0100]
>> During this round of testing we heard from (among others) a casino and
>> an insurance company saying their production services were impacted by
>> this change. Why would these companies *want* their production
>> services to be dependent on the availability of a web site run by a
>> small nonprofit they likely never even heard of?
>
>They don’t want it that way, but that’s what they got from the person
>who coded up the solution using bits cobbled together from the
>documentation and Stackoverflow.
>
>The problem is in some sense intractable. Putting the resources, like
>schemas, on the web in machine readable form is very alluring. Of
>*course* they should be on the web, how else is anyone going to get
>them? Asking developers, especially early adopters, to cut-and-paste
>them out of documentation because the publisher refuses to make the
>machine readable form available is absurd.
>
>But once they’re on the web in machine readable form, it’s easy to just
>point to them. If your developers work in environments with fast
>internet connections, it’s hardly noticeable that you’re pinging a web
>server to get a schema that you *could* copy locally (1) if you knew
>anything about the technology that javax.xml.validation is providing,
>which you probably don’t, (2) if you knew that you were supposed to
>cache that locally, (3) if you had the time and resources necessary to
>make a local cache and manage it, (4) etc.
>
>This problem has not been made easier to solve by the fact that systems
>like Node.js have inured developers to the idea that every system
>depends on somewhere between a few dozen and a thousand random third
>party packages downloaded from the internet and used without
>understanding or inspection. (I live in this glass house, I’m not
>throwing stones.)
>
>On the one hand, I’m horrified that casinos and insurance companies and
>other production systems have hard coded dependencies on the ability to
>download a schema from www.w3.org via http:. On the other hand, I’m not
>the least bit surprised, how could it be any other way?

Yes. Very well said.
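Incidentally, the local caching Norm describes is not much code 
once you know to do it: javax.xml.validation will happily build 
a Schema from a local copy instead of fetching it from w3.org 
on every run. A minimal sketch (the class name, the tiny inline 
schema, and the sample documents are made up for illustration; 
in practice the schema would be a *.xsd file shipped with the 
application):

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import java.io.StringReader;

public class LocalSchemaValidation {
    // A tiny schema held locally (in real code, a *.xsd file
    // checked into the project) rather than fetched over HTTP.
    // The xmlns URI below is only a namespace name; nothing is
    // downloaded when the parser sees it.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "  <xs:element name='greeting' type='xs:string'/>" +
        "</xs:schema>";

    public static boolean validate(String xml) {
        try {
            SchemaFactory f = SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Compile the schema from the local copy...
            Schema schema = f.newSchema(
                new StreamSource(new StringReader(XSD)));
            // ...and validate the document against it.
            schema.newValidator().validate(
                new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // SAXException on invalid input, IOException on
            // read failure; collapsed here for brevity.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<greeting>hi</greeting>"));
        System.out.println(validate("<farewell>bye</farewell>"));
    }
}
```

With the schema compiled from a local source like this, no 
network request ever leaves the process, so an outage or 
redirect at the schema's original host cannot break validation.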

In 2008 we tried to bring some attention to this general issue:
https://www.w3.org/blog/2008/02/w3c_s_excessive_dtd_traffic/

and following that we started serving certain very popular 
resources with artificial 5- or 15-second delays, but I wish 
we had done so more systematically, including *.xsd.

Our experiments with redirecting the entire site seem at least 
somewhat successful in helping us understand the extent of this 
issue and in bringing it to people's attention; for example, 
this article published yesterday:
https://www.theregister.com/2022/08/22/w3cs_transition_https/

I am starting to doubt that we will be able to fully redirect 
the entire site, including *.xsd, any time soon. Yesterday I 
learned that loc.gov attempted this a couple of years ago but 
also had to revert for *.xsd.
https://github.com/mets/METS-schema/issues/3#issuecomment-659569020
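For deployments where the http: schema URIs are baked into 
existing documents and cannot be edited, an OASIS XML catalog 
is the usual escape hatch: a catalog-aware resolver maps the 
remote URI to a local copy before any network request is made. 
A sketch (the local path is illustrative):

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the remote schema URI to a copy shipped locally,
       so the resolver never touches www.w3.org at all. -->
  <uri name="http://www.w3.org/2001/xml.xsd"
       uri="file:///opt/app/schemas/xml.xsd"/>
</catalog>
```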

>> These experiments with redirecting the whole site to https are really
>> just an exploration into whether this is feasible at all, and if not,
>> which resource(s) we need to continue to serve via HTTP. But making
>> exceptions would just add to the already huge pile of technical debt
>> that has accumulated after decades of not throwing things away.
>
>Indeed. The job you have is hard, both technically and organizationally.
>I offer my profound thanks and gratitude for the hard work that you and
>your team are doing. The W3C has done an absolutely admirable job of
>managing a huge set of resources over several decades without randomly
>breaking things or throwing things away. Thank you!

Thank you!

-- 
Gerald Oskoboiny <gerald@w3.org>
http://www.w3.org/People/Gerald/
tel:+1-604-906-1232 (mobile)

Received on Wednesday, 24 August 2022 03:17:06 UTC