Re: XML Schema validation and https redirects

Gerald,
It's not a matter of some perverse set of "wants" on the part of others, but
of default behaviours that people accept, assuming they will do the right
thing. (Why doesn't Xerces ship with a built-in catalog that removes those
network accesses? No idea, and who cares anyway? It has been static and
embedded in Java for more than ten years, so we couldn't get widespread
adoption of a change before some number of Java releases had occurred
anyway.) That, combined with the persistently low level of XML knowledge in
the development community (my reference to namespaces earlier was an example
of an even more central idea that developers do not want to engage with, let
alone the detail of what happens when you set an apparently simple flag
saying "please validate this document"), means we have arrived at a place
where change is really hard. And it's not the fault of the developers, who
are on the whole fairly busy just trying to hold things together in the face
of continually increasing complexity.
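(As an aside, for anyone stuck with one of these defaults today: modern JAXP
lets you keep validation entirely local and refuse network fetches outright.
A minimal sketch, assuming JDK 8 or later; the schema, element name, and
class name here are invented for illustration, and the key part is the
JAXP 1.5 ACCESS_EXTERNAL_* properties, which deny external access so a
missing local resource fails fast instead of silently hitting w3.org:)

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;

public class LocalSchemaValidation {
    public static void main(String[] args) throws Exception {
        // A toy schema, supplied locally instead of dereferenced over HTTP.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "  <xs:element name='greeting' type='xs:string'/>" +
            "</xs:schema>";
        String xml = "<greeting>hello</greeting>";

        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // JAXP 1.5: deny all external schema and DTD access, so any attempt
        // to fetch a resource from the network is an error rather than a
        // silent dependency on a remote server.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");

        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        // Throws SAXException if the document is invalid; no network access.
        validator.validate(new StreamSource(new StringReader(xml)));
        System.out.println("valid");
    }
}
```

(An XML catalog mapping the well-known schema URIs to local copies achieves
the same end for code you cannot change to pass the schema in directly.)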

The problem is that implementation standards do entail technical debt. The
moment a system was implemented in which W3C served the schema (over HTTP)
and XML parsers fetched that schema from W3C by default, people started
acquiring that dependency, however unconsciously (to the level of 200k
requests per hour, it seems). It is simply not the case that W3C is "a
small nonprofit they likely never even heard of"; if it were, we would not
be having this discussion, you would not be seeing 200k hits per hour on
XSDs, and your "experiments" on other people's production systems would not
be breaking them. Further, the people experiencing the pain are in many
cases not the people who wrote the code, and a lot of them now regard XML
as some strange antique that ought to be replaced with JSON. That is an
idea I have some sympathy for, except for the obvious value of XML
validation, but it gets harder and harder to argue for XML as time goes by.

Greg

On Sat, 20 Aug 2022 at 06:51, Gerald Oskoboiny <gerald@w3.org> wrote:

> * Norm Tovey-Walsh <ndw@nwalsh.com> [2022-08-19 09:39+0100]
> >Greg Hunt <greg@firmansyah.com> writes:
>
> >> Break the validation, even momentarily, and all you have is a legacy
> >> technology that is harder to argue for.
> >>
> >> I am with Michael on this, publishing stable URIs, (and I am inclined
> >> to factor in the frankly rather vague statements about dereferencing
> >> URLs), constituted a promise to not change things, a promise that you
> >> cannot evade by saying people ought to be reading the W3C blog and
> >> updating their software.
>
> I agree stable URIs are important, and I think W3C has done a
> better job of preserving the stability of its URIs than almost
> any other organization, including orgs with several orders of
> magnitude more resources at their disposal.
>
> >I think those are very reasonable and valid points. On the other hand,
> >configuring software so that it dereferences www.w3.org to do validation
> >of some local resource was probably not an explicit decision, it’s
> >probably an accident. The application is going to fail when www.w3.org
> >falls off the internet, which I’m sure it does periodically when
> >maintenance is performed, or when someone borks DNS on purpose or by
> >mistake.
> >
> >We know that http: URIs are insecure and subject to various kinds of
> >attacks. If someone constructs an attack vector that uses a hacked
> >schema injected into an insecure HTTP stream to get software to accept
> >an otherwise invalid document with some downstream consequence that the
> >black hats can exploit, that’s bad too. If a bit…unlikely.
>
> +1
>
> During this round of testing we heard from (among others) a
> casino and an insurance company saying their production services
> were impacted by this change. Why would these companies *want*
> their production services to be dependent on the availability of
> a web site run by a small nonprofit they likely never even heard
> of?
>
> These experiments with redirecting the whole site to https are
> really just an exploration into whether this is feasible at all,
> and if not, which resource(s) we need to continue to serve via
> HTTP. But making exceptions would just add to the already huge
> pile of technical debt that has accumulated after decades of not
> throwing things away.
>
> --
> Gerald Oskoboiny <gerald@w3.org>
> http://www.w3.org/People/Gerald/
> tel:+1-604-906-1232 (mobile)
>
>

Received on Friday, 19 August 2022 22:24:14 UTC