Re: XML Schema validation and https redirects

Gerald Oskoboiny <gerald@w3.org> writes:

> W3C's main web site https://www.w3.org/ will soon start to redirect
> all http requests to https. Will this cause issues for XML
> Schema-related resources hosted on www.w3.org?

To this top-level question I have no reliable answer.  It SHOULD not
cause major issues; it probably WILL cause at least some issues, just
because it's so easy to put things like this off until something
actually breaks.

I may be able to answer some of the questions of detail.

One complication is that it's not clear how many of the schema
validators in current use are actively maintained; if the change you are
contemplating breaks validation in some tools, it may take a while
before people figure out how to push an update.

> We announced this intended change a few weeks ago, ...

> Some questions I have:

> Is it intended that www.w3.org is in the critical path when performing
> XML Schema validation?

Yes and no, at least in my reading of the XSD spec and my recollections
of the WG discussions.

Yes in the sense that the XSD spec and other W3C specs for which XSD and
other schemas have been defined normally use dereferenceable URIs with
www.w3.org as host to name the namespaces they define, and the explicit
motivation for that usage was and is that it should be possible to
retrieve information about such a namespace by dereferencing its name.

No in the sense that the XSD spec is explicit that it is not required
that a fresh copy of a schema document be retrieved from the host named
in the namespace name.  Several alternative methods of locating XSD
schema documents are described in the spec.

  - Schema validators may have hard-coded knowledge of the schema
    they are built to work with.

    As a special case of this, knowledge of the XSD schema can be (and I
    expect probably is) built in to most schema validators, so that they
    don't have any pressing need to fetch a copy of the XSD schema for
    XSD schemas.

  - Schema documents are resources on the Web, to be dereferenced like
    any other resource, and no single strategy for retrieving them will
    work in all cases.  Section 4.3.1 of XSD 1.1 Part 1 [1] says in part

        Note: The variations among server software and web site
        administration policies make it difficult to recommend any
        particular approach to retrieval requests intended to retrieve
        serialized ·schema documents·. An Accept header of
        application/xml, text/xml; q=0.9, */* is perhaps a reasonable
        starting point.

    [1] https://www.w3.org/TR/xmlschema11-1/#schema-repr

    As a special case of this, XSD schemas for any namespace may be
    cached by a local server or by a schema validator.  User-controlled
    mechanisms such as XML catalogs may also be used to point to local
    copies of XSD schema documents, but I do not know how widespread
    support for XML catalogs is among XSD processors.
    
  - One obvious approach to finding a schema for a particular namespace
    is to dereference the namespace name; this may or may not produce a
    schema.  Section 4.3.2 of XSD 1.1 Part 1 [2] says in part

         it is possible but not guaranteed that a schema is retrievable
         via the namespace name. Accordingly whether a processor's
         default behavior is or is not to attempt such dereferencing, it
         must always provide for user-directed overriding of that
         default.

    [2] https://www.w3.org/TR/xmlschema11-1/#schema-loc

  - The user of a schema validation engine can provide a URI at which
    a suitable schema document can be found; this is formally a hint and
    processors are not obligated to attempt to dereference that URI.

    However, the schemaLocation information provided in a schema
    document when importing or including other schema documents is
    binding on the processor and not a hint.  (More on this below.)
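
As a rough illustration of how the strategies above can combine, here is
a minimal Python sketch, not taken from any actual validator.  It checks
a user-supplied table of local copies first and otherwise prepares an
HTTP request carrying the Accept header suggested in the note quoted
from section 4.3.1.  The function name, the table, and the local file
path are all hypothetical.

```python
from urllib.request import Request

# Accept header suggested in the note in section 4.3.1 of XSD 1.1 Part 1.
ACCEPT = "application/xml, text/xml; q=0.9, */*"

# Hypothetical user-controlled table: schema URI -> local file path.
# The path is an assumption made up for this example.
LOCAL_COPIES = {
    "http://www.w3.org/2001/xml.xsd": "/usr/local/share/xsd/xml.xsd",
}

def locate_schema(uri, local_copies=LOCAL_COPIES):
    """Return ('local', path) if a local copy is configured for the URI,
    otherwise ('remote', Request) ready to be passed to urlopen()."""
    if uri in local_copies:
        return ("local", local_copies[uri])
    return ("remote", Request(uri, headers={"Accept": ACCEPT}))
```

Nothing is fetched here; a caller that gets a 'remote' result would
still have to open the request and cope with redirects or failures.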

> Are .xsd files and/or namespace documents
> retrieved each time a validation is done?

It would not surprise me if some validators operate that way; it is not
required by the XSD spec.
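
A validator that wants to avoid a fresh retrieval on every run could
keep fetched documents in memory.  The following Python sketch shows the
idea under the assumption that the caller supplies its own fetch
function; the class and its names are hypothetical and do not describe
any existing processor.

```python
class SchemaCache:
    """Fetch each schema document at most once per process."""

    def __init__(self, fetch):
        # fetch is any callable taking a URI and returning the document
        # bytes (for example, a thin wrapper around urllib).
        self._fetch = fetch
        self._docs = {}

    def get(self, uri):
        # Only call fetch on the first request for a given URI.
        if uri not in self._docs:
            self._docs[uri] = self._fetch(uri)
        return self._docs[uri]
```

A cache like this never notices when the document on the server changes,
which is the same property that keeps old http URIs alive in cached
copies.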

> Are there other use cases
> besides validation that might cause automated requests to www.w3.org?

Not common ones (at least, that I know of).

> What are the most popular software packages that might be making these
> requests to www.w3.org? In what contexts do they make these requests?
> Do the latest versions typically have the ability to follow http to
> https redirects? Would XML catalogs help?

I can't help you there.

> If we start redirecting http to https, will that fundamentally break
> compliance with W3C RECs that specify http: in references to .xsd
> files and namespaces? If so, which URIs would we need to continue to
> serve via http?

As far as I know, no spec that came out of the XML Activity has ever
required namespace names to be dereferenced as a condition of
conformance for any operation, so with respect to namespace names, the
change you describe won't break conformance in any way that I can see.

With respect to XSD schema documents and the XSD spec, there is one
situation in which conformance may be held to require an attempt to
dereference an http URI:  namely, when a schema document refers, in an
import or include or similar statement, to an http URI, the spec says
(as I read it) that the processor should fetch that schema document,
which will normally happen by dereferencing that URI.

The authoritative schema for XSD schema documents is currently hosted at

   http[s]://www.w3.org/2001/XMLSchema.xsd

and imports the schema for the XML namespace

   http://www.w3.org/XML/1998/namespace

by pointing to

   http://www.w3.org/2001/xml.xsd

so I believe that conforming processors that haven't cached that
document will continue hitting the http URI indefinitely.

You could arrange to update the schemaLocation value in the XSD schema
for schemas to use https, but that won't change the URI in any cached
copies of the schema for schemas.  It may possibly be helpful to
continue to serve that schema document with http, but I do not believe
this is a condition of conformance.
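
For anyone who maintains a local or cached copy and wants to make the
same change there, a minimal Python sketch of the rewrite follows.  It
assumes the copy can be round-tripped through ElementTree, which drops
comments and any DOCTYPE declaration, so treat it as an illustration
rather than a tool to run on the published schema.  The function name is
hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def upgrade_schema_locations(schema_text):
    """Switch http://www.w3.org/ schemaLocation values on xs:import and
    xs:include to https.  Namespace names are left untouched."""
    ET.register_namespace("xs", XS)  # keep the xs: prefix on output
    root = ET.fromstring(schema_text)
    for tag in ("import", "include"):
        for el in root.iter(f"{{{XS}}}{tag}"):
            loc = el.get("schemaLocation", "")
            if loc.startswith("http://www.w3.org/"):
                el.set("schemaLocation", "https://" + loc[len("http://"):])
    return ET.tostring(root, encoding="unicode")
```

Only the schemaLocation attribute changes; the namespace attribute of
the import, which is a name and not a retrieval instruction, keeps its
http form.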

Nothing in the spec says that it is non-conforming to follow a redirect,
or for retrieval to fail.

That doesn't mean you won't get complaints, but if I were you I would
point them to the final paragraph of section 4.3.2 of XSD 1.1 Part 1:

    Improved or alternative conventions for Web interoperability can be
    standardized in the future without reopening this specification. For
    example, the W3C is currently considering initiatives to standardize
    the packaging of resources relating to particular documents and/or
    namespaces: this would be an addition to the mechanisms described
    here for layer 3. This architecture also facilitates innovation at
    layer 2: for example, it would be possible in the future to define
    an additional standard for the representation of schema components
    which allowed e.g. type definitions to be specified piece by piece,
    rather than all at once.

The bottom-line meaning of that paragraph, as I understand it, is:  the
Web is a growing and changing system, and how you retrieve schemas may
have to change to align with the Web. No conformance requirement in the
XSD spec requires the Web to stop growing or changing.


> Thanks,

Thank you for your inquiry.  I hope this helps.  And good luck.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 18 August 2022 01:31:42 UTC