Re: XML Schema validation and https redirects

I think it's rather unlikely that Xerces goes to the W3C site every time you do an XSD validation. I suspect it only does so when you use a schema that's referenced via a W3C location URI - for example, the xml.xsd schema. Where is it documented? Well, it's probably just assumed that people know that if they reference a resource using a particular URL, it's going to result in a request to that site, unless otherwise specified.
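For anyone bitten by this, the usual fix is to hand the parser a local copy of the schema rather than letting it dereference a URL at validation time. A minimal sketch using javax.xml.validation (the class name and the inline schema/document are made up for illustration) — because the schema is supplied directly as a Source, validation completes without any request to www.w3.org:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;

public class LocalSchemaDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema and instance document, inlined for the sketch;
        // in practice the schema text would be a file shipped with the application.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "  <xs:element name='greeting' type='xs:string'/>" +
            "</xs:schema>";
        String xml = "<greeting>hello</greeting>";

        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The schema is compiled from a local Source, so the processor has
        // no schemaLocation URL to fetch over the network.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new StringReader(xml)));
        System.out.println("valid");
    }
}
```

The same idea scales up via an LSResourceResolver or an XML catalog, which map well-known URLs such as http://www.w3.org/2001/xml.xsd to locally cached copies.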

It's many years since the W3C started throttling such requests, and I'm sure that had an impact. However, so many people are now using mature software that has hardly been updated since the early days of XML, and unless (a) the users of such software realise why it has become so slow, and (b) they can persuade their vendors to do something about it, the throttling is not going to solve the problem.
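For code one does control, JAXP 1.5's external-access restriction turns these hidden fetches into an immediate, local failure instead of a silent dependency on the W3C site. A sketch, assuming the stock JDK parser (the class name and inline schema are hypothetical):

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import java.io.StringReader;

public class BlockExternalFetches {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema that imports xml.xsd from www.w3.org --
        // exactly the pattern that generates traffic to the W3C site.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "  <xs:import namespace='http://www.w3.org/XML/1998/namespace'" +
            "             schemaLocation='http://www.w3.org/2001/xml.xsd'/>" +
            "  <xs:element name='doc' type='xs:string'/>" +
            "</xs:schema>";

        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // JAXP 1.5: an empty string means no protocol is permitted for
        // fetching external schema documents.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            System.out.println("schema compiled (external fetch happened)");
        } catch (org.xml.sax.SAXException e) {
            // The fetch is rejected locally, before any network connection.
            System.out.println("external fetch blocked");
        }
    }
}
```

Running this with the restriction in place fails fast, which at least makes the remote dependency visible so it can be replaced with a local copy.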

What I don't understand is why ISPs or local proxy servers aren't caching such resources. But then, I'm like everyone else - I use a lot of technology that I don't fully understand.

Michael Kay
Saxonica

> On 20 Aug 2022, at 11:59, Greg Hunt <greg@firmansyah.com> wrote:
> 
> Michael, Norm,
> Given that more than half of the XML Schema hits identified at the beginning of this thread are likely to come from Xerces 2.11, how are people supposed to know what the behaviour of that software is?  Is it documented somewhere in the Xerces documentation that the w3c site will be hit when you pass XMLConstants.W3C_XML_SCHEMA_NS_URI to the schema factory?  I can't see it on the Xerces website.  Now, we obviously know that it's discussed elsewhere, but is it in the software documentation?
> 
> Greg
> 
> On Sat, 20 Aug 2022 at 20:19, Michael Kay <mike@saxonica.com> wrote:
> Excellent post, Norm.
> 
> The bottom line is that the easier we make it to assemble systems from reusable components, the more likely it becomes that the people who build the systems have only a shallow understanding of the technology they are putting together.
> 
> Roger Needham used to say that the reason we need to provide unimaginable amounts of bandwidth is to reduce the need for people to exercise their brains.
> 
> Michael Kay
> Saxonica
> 
> > On 20 Aug 2022, at 09:27, Norm Tovey-Walsh <ndw@nwalsh.com> wrote:
> > 
> >> During this round of testing we heard from (among others) a casino and
> >> an insurance company saying their production services were impacted by
> >> this change. Why would these companies *want* their production
> >> services to be dependent on the availability of a web site run by a
> >> small nonprofit they likely never even heard of?
> > 
> > They don’t want it that way, but that’s what they got from the person
> > who coded up the solution using bits cobbled together from the
> > documentation and Stackoverflow.
> > 
> > The problem is in some sense intractable. Putting the resources, like
> > schemas, on the web in machine readable form is very alluring. Of
> > *course* they should be on the web, how else is anyone going to get
> > them? Asking developers, especially early adopters, to cut-and-paste
> > them out of documentation because the publisher refuses to make the
> > machine readable form available is absurd.
> > 
> > But once they’re on the web in machine readable form, it’s easy to just
> > point to them. If your developers work in environments with fast
> > internet connections, it’s hardly noticeable that you’re pinging a web
> > server to get a schema that you *could* copy locally (1) if you knew
> > anything about the technology that javax.xml.validation is providing,
> > which you probably don’t, (2) if you knew that you were supposed to
> > cache that locally, (3) if you had the time and resources necessary to
> > make a local cache and manage it, (4) etc.
> > 
> > This problem has not been made easier to solve by the fact that systems
> > like Node.js have inured developers to the idea that every system
> > depends on somewhere between a few dozen and a thousand random third
> > party packages downloaded from the internet and used without
> > understanding or inspection. (I live in this glass house, I’m not
> > throwing stones.)
> > 
> > On the one hand, I’m horrified that casinos and insurance companies and
> > other production systems have hard coded dependencies on the ability to
> > download a schema from www.w3.org via http:. On the other hand, I’m not
> > the least bit surprised; how could it be any other way?
> > 
> >> These experiments with redirecting the whole site to https are really
> >> just an exploration into whether this is feasible at all, and if not,
> >> which resource(s) we need to continue to serve via HTTP. But making
> >> exceptions would just add to the already huge pile of technical debt
> >> that has accumulated after decades of not throwing things away.
> > 
> > Indeed. The job you have is hard, both technically and organizationally.
> > I offer my profound thanks and gratitude for the hard work that you and
> > your team are doing. The W3C has done an absolutely admirable job of
> > managing a huge set of resources over several decades without randomly
> > breaking things or throwing things away. Thank you!
> > 
> >                                        Be seeing you,
> >                                          norm
> > 
> > --
> > Norman Tovey-Walsh <ndw@nwalsh.com>
> > https://nwalsh.com/
> > 
> >> Formal symbolic representation of qualitative entities is doomed to its
> >> rightful place of minor significance in a world where flowers and
> >> beautiful women abound.--Albert Einstein
> 
> 

Received on Saturday, 20 August 2022 13:56:09 UTC