Re: XML Schema validation and https redirects from C. M. Sperberg-McQueen on 2022-08-23 (xmlschema-dev@w3.org from August 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 23 Aug 2022 12:55:19 -0600
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: Michael Kay <mike@saxonica.com>, Gerald Oskoboiny <gerald@w3.org>, Norm Tovey-Walsh <ndw@nwalsh.com>, xmlschema-dev@w3.org
Message-ID: <87o7wa93hw.fsf@blackmesatech.com>

"Henry S. Thompson" <ht@inf.ed.ac.uk> writes:

> Michael Kay writes:
>
>> Well I certainly think that if a web site owner decides to move a
>> frequently requested resource to a different URI, the least it
>> should do is update its internal links to that resource to use the
>> new URI.
>
> Hmm.  To take my favourite example, I would argue that the primary
> purpose of the namespace URI for XHTML, http://www.w3.org/1999/xhtml,
> is to identify documents as conforming to the XHTML spec.  It is
> entirely reasonable to bake exactly that sequence of ASCII characters
> into your software when you need to detect XHTML.  And I _don't_ think
> we should invalidate such software, by changing the XHTML spec. to use
> https.

True.

But for what it's worth, I took MK to be referring to "resources" in the
narrow sense of DTDs, schema documents, and other electronic objects,
and not necessarily to abstract objects like namespaces.

So if W3C does decide to update its own pointers to specific resources,
the ones I would recommend be changed are the pointers to specific
schema documents, especially those pointing from 'schemaLocation'
attributes in one schema document to another schema document using a
w3.org URL.  The only one of those I found in looking at the XSD schema
for schemas was the pointer to the schema document for the XML
namespace.
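
For anyone who wants to check their own schema documents for such
pointers, here is a minimal sketch, not a definitive tool.  The function
name, the sample document, and the choice to look only at xs:import,
xs:include, and xs:redefine are assumptions made for illustration.  Note
that it reports only 'schemaLocation' values; the namespace name itself
is left alone.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # namespace name: an identifier, not changed


def http_schema_locations(schema_text, host="www.w3.org"):
    """List (element, namespace, schemaLocation) for every xs:import,
    xs:include, or xs:redefine whose schemaLocation is an http:// URL
    on the given host.  (Hypothetical helper, for illustration only.)"""
    root = ET.fromstring(schema_text)
    hits = []
    for kind in ("import", "include", "redefine"):
        for el in root.iter(f"{{{XS}}}{kind}"):
            loc = el.get("schemaLocation", "")
            if loc.startswith(f"http://{host}/"):
                hits.append((kind, el.get("namespace"), loc))
    return hits


# Illustrative sample modelled on an import of the XML namespace schema.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
</xs:schema>"""

for hit in http_schema_locations(SAMPLE):
    print(hit)
```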

Or if W3C decides to make exceptions for (some) schema documents, to
avoid having to update the schemaLocation attributes (or the pointers to
schema documents from HTML documents, like the one Norm Tovey-Walsh
pointed to), it's the schema documents that need to be taken care of,
not the namespace URIs.

Since it appears that the schema for schema documents is being hit by
validators (why?! what were they thinking?), changing W3C's own pointers
to that schema document is not going to suffice to eliminate the
problems people are reporting.  For that, one of three things has to
happen:  the validators update the hard-coded URIs they use to
dereference the schema for schemas and the schema for the XML namespace;
or the validators change their policy on HTTP 301s and follow them; or
W3C continues serving those documents using http.

On the second option:  you trust the server in question to serve you an
authentic schema, but you don't trust it to tell you where the schema
can now be retrieved?  What were you thinking?!  I am beginning to think
that what we should have done when we put those schemas in place is set
things up so that on any given day there was a 15% chance that the
server would return a 301 on them, or a 404, to encourage processors to
cache them in the first place.  Isn't hindsight wonderful?
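
To make the validator-side choices concrete, here is a rough sketch of a
resolver that consults a local cache first and, on a miss, follows
redirects up to a fixed limit.  It is an illustration under stated
assumptions, not how any particular validator works:  the names
resolve_schema and fake_fetch, the (status, headers, body) shape of the
fetch callback, and the redirect limit are all invented for the example.

```python
from urllib.parse import urljoin

REDIRECTS = (301, 302, 307, 308)


def resolve_schema(uri, fetch, cache, max_redirects=5):
    """Return the schema document for uri.

    fetch(uri) is assumed to return (status, headers, body) without
    following redirects itself; cache is a dict from URI to body."""
    if uri in cache:                      # cached copy: no network traffic
        return cache[uri]
    current = uri
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(current)
        if status == 200:
            cache[uri] = body             # remember under the URI we were asked for
            return body
        if status in REDIRECTS and "Location" in headers:
            # The server is telling us where the resource now lives.
            current = urljoin(current, headers["Location"])
            continue
        raise IOError(f"cannot retrieve {current}: HTTP {status}")
    raise IOError(f"too many redirects starting from {uri}")


def fake_fetch(uri):
    """Stand-in for a real HTTP client, so the sketch runs offline."""
    if uri == "http://www.w3.org/2001/xml.xsd":
        return 301, {"Location": "https://www.w3.org/2001/xml.xsd"}, b""
    if uri == "https://www.w3.org/2001/xml.xsd":
        return 200, {}, b"<xs:schema/>"
    return 404, {}, b""


schema_cache = {}
doc = resolve_schema("http://www.w3.org/2001/xml.xsd", fake_fetch, schema_cache)
```

A processor that ships with the well-known schema documents already in
its cache never reaches the fetch step for them at all, which is the
behaviour the occasional 301 or 404 would have encouraged.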

I have a certain sympathy for MK's point, but while I think it's
reasonable to suggest that any newly published or re-published pointers
to resources use the URI current at the time the pointer is published, I
don't think it's reasonable to expect that every hypertext reference in
a relatively large body of published material be changed.  HTTP was
designed from the beginning to deal with imperfection in the network,
including network failures and changes of URI (even if cool URIs don't
change).  And what is a 301, if not an authoritative statement about the
current URI for the resource?

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 23 August 2022 19:14:37 UTC