Re: XSD validation, ambiguous root XML instance element

On 3 Nov 2020, at 3:09 AM, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:
> 
> Michael Kay writes:
> 
>> Basically, the problem is that XSD doesn't define constraints on a
>> document, it only defines constraints on an element. I've never
>> understood entirely why that mind-set exists, but it seems to be
>> strongly held by some.
>> 
>> xs:ID/IDREF validation is of course an exception, and I think that
>> only got in because they wanted XSD to be a functional replacement for
>> DTDs.
> 
> At this remove in time, I certainly don't remember why we didn't add a
> document-level constraint to XSD.  But I would note that insofar as
> there was a mind-set, it was certainly heavily influenced by the idea
> that, as you said, XSD was meant to do more-or-less what DTDs did, using
> XML as its notation.  

Interesting that memories vary so much.  My recollection is that XSD was
intended to be able to do everything DTDs could do, but not that it was
to be limited to those things.  The type system, among other parts of XSD,
goes well beyond doing more or less what DTDs do.

One of the pressures against document-level constraints was the desire
to make validation against a schema independent of context:  an element E
valid against an element declaration D should remain valid against D even
if removed from its current context or inserted into a new one.  This 
goes against the kind of context-awareness I think of as usual in document
processing applications and exhibited by the attribute-value inheritance
specified in the XML spec for xml:lang and other attributes, but it was
apparently highly desired by the XML Query WG.  Proposals were
made to make XSD deal well (or at least better) with attribute value 
inheritance and constraints like the rule in HTML that an ‘input’ element
should be valid only as a descendant of a ‘form’ element, but they were
resisted strongly by those who saw no problem in W3C formulating a
schema language in which important rules of HTML could not be 
expressed, and the proposals were unsuccessful.
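
To illustrate the kind of context-dependent rule at issue, here is a minimal
sketch in Python of a check that sits outside schema validation.  It assumes
a namespace-free document and uses only the standard library; the function
name inputs_outside_form is hypothetical, chosen for this illustration.

```python
import xml.etree.ElementTree as ET

def inputs_outside_form(xml_text):
    """Count 'input' elements that have no 'form' ancestor."""
    root = ET.fromstring(xml_text)
    count = 0
    # Each stack entry pairs an element with whether it has a 'form' ancestor.
    stack = [(root, False)]
    while stack:
        elem, in_form = stack.pop()
        if elem.tag == "input" and not in_form:
            count += 1
        # Children of a 'form' (or of anything inside one) are in a form.
        child_in_form = in_form or elem.tag == "form"
        for child in elem:
            stack.append((child, child_in_form))
    return count
```

The validity of each ‘input’ here depends on its ancestors, which is exactly
the dependence on context that the independence principle rules out.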


> And DTDs don't let you specify the root element
> either.

At the risk of pedantry, it should be pointed out that whether this is true
depends on what one means by “DTD”: the document type declaration together
with the material included in it, physically or logically, or only that
material, without the document type declaration itself.

The document type declaration does, of course, specify the root element.

> 
> Why _that_ was the case I have no idea.

In ISO 8879, the document type declaration may be physically in the same
entity as the beginning of the document’s outermost element, or physically
separate.  Some SGML software expected the association of documents with
document type declarations to be dynamic, specified at the time of processing,
while other software expected the association to be static.  

In the one case, a file containing a set of declarations for elements,
attributes, etc. was best enclosed in a document type declaration.  That
file would normally be separate from any particular document, and in 
consequence the distinction between the ‘internal’ DTD subset enclosed
in the document type declaration and the ‘external’ subset named by the
public and system identifiers in the document type declaration seemed
unnecessary.  It was not needed for modularity, since parameter entities 
could be used for that, and struck some people as pointless.

In the other case, the document type declaration would normally be
found at the beginning of an SGML document.  This was convenient, as it
made it unnecessary to specify the document type definition every time one
invoked any SGML software, and it served as a form of in-line documentation
about the document in which it appeared.

The consensus, or at least the majority, view in the XML working group was 
that the flexibility offered by 8879 in this respect was not helpful enough to
be retained, and further that the second approach (in-document document 
type declaration) was the more useful approach.  In this approach, the 
internal DTD subset in any document can contain declarations specific to that
document, and the external collections of declarations can be used for more
generally applicable declarations.


> 
> Nor do I remember why the XML DOCTYPE _does_ specify the root element,
> which is redundant/self-evident 99% of the time.

In 8879, the name is the only required portion of the document type declaration.

> 
> But mentioning DOCTYPE leads on to suggest that one might say the
> 'correct' way to address your requirement is to put the constraint _in
> the document_:
> 
> <?xml version="1.0"?>
> <!DOCTYPE X>
> <X>
> ...
> </X>

Yes, and no, I think.

This will have an effect only if the processor performs partial validation
against the partial DTD; it will, however, in no case affect the result
of schema-validity assessment.
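
As a rough illustration of the first point, the sketch below uses Python's
standard xml.parsers.expat module, which is a non-validating parser, to read
the name in the document type declaration and the name of the outermost
element.  The function name doctype_and_root is hypothetical.

```python
import xml.parsers.expat

def doctype_and_root(xml_text):
    """Return (name in the DOCTYPE, name of the outermost element)."""
    names = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        names["doctype"] = name

    def on_start(name, attrs):
        # Only the first start tag seen is the outermost element.
        if names["root"] is None:
            names["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return names["doctype"], names["root"]
```

A non-validating parser reports both names without complaint even when they
differ; any comparison has to be made by a validating processor or by the
application.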

And if I have understood the requirement, it is that the schema, not the
document, specify the acceptable outermost elements.
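
Absent such a facility in the schema itself, one workaround is an
application-level check made alongside schema validation.  The following is
a minimal sketch under stated assumptions: the document is namespace-free,
and the function name root_is_allowed and the set of permitted names are
hypothetical, supplied by the application rather than by the schema.

```python
import xml.etree.ElementTree as ET

def root_is_allowed(xml_text, allowed_roots):
    """Return True if the outermost element's name is in allowed_roots."""
    root = ET.fromstring(xml_text)
    return root.tag in allowed_roots
```

For example, root_is_allowed("<X><Y/></X>", {"X"}) holds, while the same
check on a document whose outermost element is Y does not.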

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************

Received on Wednesday, 4 November 2020 19:28:06 UTC