Re: Fwd: XML Schema validation and https redirects

Norm Tovey-Walsh <ndw@nwalsh.com> writes:

>> From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
>> Subject: Re: XML Schema validation and https redirects
>> Date: 18 August 2022 at 01:12:54 BST
>> To: Gerald Oskoboiny <gerald@w3.org>
>> Cc: xmlschema-dev@w3.org
>> Resent-From: xmlschema-dev@w3.org

>> Gerald Oskoboiny <gerald@w3.org> writes:

>> W3C's main web site https://www.w3.org/ will soon start to redirect
>> all http requests to https. Will this cause issues for XML
>> Schema-related resources hosted on www.w3.org?

> Like Michael (Sperberg-McQueen), I’m inclined to hedge my bets. What’s
> actually going to happen?

I like Norm's analysis of the possible outcomes; I have comments on a
couple of them.

> ...

>   5. If the validator didn’t report an error because it failed to get an
>      XSD file, then it’ll proceed without the schema document. That
>      probably won’t work, but it’s a bit hard to predict how it’ll fail.

For what it's worth, the XSD spec does explicitly say that it's not a
validation error if a schema document cannot be retrieved.  What should
normally happen in that case is that the schema used for validation will
lack declarations for some elements and attributes, which means in turn
that errors in those elements and attributes will not be detected (so
validation will be looser than expected), and the elements and
attributes will be marked as having unknown validity.  In principle,
this should cause a warning flag of some kind for downstream consumers
expecting to see valid input, but in practice many validators and users
appear to take the absence of error messages as meaning the input is
valid, failing to distinguish between validity="valid" and
validity="notKnown".
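
To make the distinction concrete, here is a minimal Python sketch of how a
downstream consumer might refuse to treat "no errors" as "valid".  The
names Validity, summarize and is_acceptable are hypothetical and do not
come from any validator's API; the only thing taken from the XSD spec is
the three-valued validity outcome (valid, invalid, notKnown).

```python
from enum import Enum


class Validity(Enum):
    """The three validity outcomes discussed above."""
    VALID = "valid"
    INVALID = "invalid"
    NOT_KNOWN = "notKnown"


def summarize(validities):
    """Collapse per-item validity values into one overall verdict.

    Hypothetical helper: any invalid item makes the whole result invalid;
    otherwise any unassessed item (or an empty input) makes it notKnown.
    """
    values = list(validities)
    if Validity.INVALID in values:
        return Validity.INVALID
    if not values or Validity.NOT_KNOWN in values:
        return Validity.NOT_KNOWN
    return Validity.VALID


def is_acceptable(validities):
    """Accept only input that was positively assessed as valid."""
    return summarize(validities) is Validity.VALID
```

With this policy, a document whose elements all came back notKnown (for
example because the schema document could not be retrieved) is rejected
rather than silently accepted.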

>   4d. If the API returns the schema document with the https: URI as the
>       system identifier, then…
>
>     ...
>
>     5b. If the validator looks at the system identifier, I suppose some
>         part of the validator might decide that https:// doesn’t match
>         http:// and conclude that it has the wrong namespace.

Anything is possible, of course, but it should be pointed out that there
is no justification in the XSD spec for behavior 5b in an XSD validator.

Behavior 5b might be plausible for an automated tool dereferencing a
namespace URI.  But the XSD spec is explicit that schema documents for a
given namespace may reside anywhere.
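
As an illustration of the alternative to 5b, the following Python sketch
checks a retrieved schema document by the targetNamespace it declares and
deliberately ignores the URI it was fetched from.  The function names and
the retrieved_from parameter are assumptions made for this example; this
is not how any particular validator is known to be implemented.

```python
import xml.etree.ElementTree as ET

# Namespace of the XSD vocabulary itself.
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def target_namespace(schema_text):
    """Return the targetNamespace declared by a schema document, or None."""
    root = ET.fromstring(schema_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XSD schema document")
    return root.get("targetNamespace")


def schema_matches(schema_text, expected_namespace, retrieved_from):
    """Decide whether a retrieved schema document is the one we wanted.

    retrieved_from is accepted only to show that it plays no part in the
    decision: an http-to-https redirect changes it, but not the result.
    """
    return target_namespace(schema_text) == expected_namespace
```

Under this approach a schema document served from an https: location
after a redirect is treated exactly like one served from the original
http: location.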

> I have no real intuition about how likely 5b is. My wild guess is “not
> very likely” because once you’ve got the schema, you’re probably more
> concerned about what targetNamespace it claims to validate than what its
> URI was.

That is also my guess.  It is certainly what I think the XSD spec
suggests.

> I saw this one in the wild within the last year: (Some of the) XSD for
> XSD Schemas have a doctype declaration, for example this one:
>
>   http://www.w3.org/2001/XMLSchema.xsd
>
> I discovered some bit of software, I forget the exact details, that had
> a cached copy of the XSD but not the DTD so parsing the cached XSD made
> a DTD request to www.w3.org every time…

Yow!  Excellent point.

> Yes, XML catalogs help. They allow the application author and/or user to
> configure local resources that can be returned automatically when
> attempts are made to retrieve documents over the web.

Hear, hear.

The XSD spec does not explicitly mention XML catalogs, but I read its
discussion of how schema documents are to be found on the Web as
compatible with catalogs and similar measures.

Michael


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 18 August 2022 17:12:11 UTC