Re: Why DOCTYPE Declarations for XHTML?

"W. Eliot Kimber" wrote:
> 
> Murray Altheim wrote:
> >
> > "W. Eliot Kimber" wrote:
> > > Murray Altheim wrote:
> > [...]
> > > I'm not objecting to enabling validation. I'm only objecting to the
> > > requirement that conforming XHTML documents must use a particular form
> > > of DOCTYPE declaration (or even have a doctype declaration at all).
> >
> > If we eliminate the requirement on DTD validation, we not only allow
> > for other types of validation, we loosen the conformance requirements
> > to the point where it becomes *less* possible to ascertain that a
> > given instance is XHTML, not more.
> 
> I think this is the disconnect: it's not a question of *determining*
> whether or not a given document conforms to the XHTML spec, it's a
> question of a document being able to unambiguously *assert* that it
> *is* an XHTML document. Validating the veracity of that assertion is a
> separate subject.
> 
> From the point of view of a processor, you need to be able to
> unambiguously distinguish documents that are XHTML documents from
> documents that are not XHTML documents. Having found a document that
> asserts it is an XHTML document, you may then *choose* to validate it
> however you see fit, including using the XHTML-provided DTD
> declarations, some functionally-equivalent schema spec, or purpose-built
> code that happens to embody the rules of XHTML. But the validation is
> secondary (in the sense that useful processing can be done without
> validation).
> 
> My point is that a DOCTYPE declaration cannot serve as the *unambiguous*
> assertion of XHTMLness (although of course it can enable validation
> against the syntactic rules of XHTML). Stress on the word "unambiguous."
> We take the use of particular external subsets (or rather, the use of
> particular identifiers for external subsets) as the assertion of
> typeness, but that assertion has potential ambiguity for all the reasons
> I've stated. If something is ambiguous it cannot be relied on to drive
> computer processing and should therefore be avoided if at all possible.
> 
> > Perhaps there's some way to state:
> > "if you have some type of validation that works *better* than DTDs at
> > validating against the XHTML document type, then go for it with our
> > blessing," but I can't think of any way to do that. If we relax the
> > requirement for DTD validation, then *anything* goes, and an unbounded
> > definition is no definition at all, not in this environment. We have a
> > very limited set of tools available to us.
> 
> Again, my point isn't about *validation*, it's about assertion of type
> membership.
> 
> > >
> > > That's my point: *you have no means* provided by XML 1.0.
> 
> The means I'm talking about is the means to assert type membership. This
> discussion started because it was proposed that *type membership* be
> indicated by disallowing the use of internal subsets and requiring the
> use of a particular external subset URI.  Those sets of restrictions
> *do* provide reliable type assertion...
> 
> ...but...
> 
> ...the reason that Arjun and I objected to them in principle is that A)
> it's a set of restrictions that XML 1.0 provides no way to state (and
> that therefore normal XML parsers cannot detect or enforce), B) it
> unnecessarily restricts authors' choice about how to manage the DOCTYPE
> declarations of XHTML documents, and C) it propagates the mistaken idea
> that DOCTYPE declarations assert type membership.
> 
> It's not about validation--whether or not a document is validated is
> always the choice of the document receiver. It is inappropriate for a
> general-purpose standard to impose a validation policy on users of the
> standard. The standard must *enable* validation, but it cannot require
> it. So validation can't be the issue here.
> 
> The issue is: do document authors have a clear way to assert type
> membership of XHTML documents and do processors have a clear way to
> detect type membership and, if so, does that mechanism impose any
> unnecessary or inappropriate constraints?
> 
> My assertion is that the required use of external declaration subsets
> fails the last part of this test: it imposes unnecessary and
> inappropriate constraints on document authors.
> 
> > SGML and XML are too flexible to not allow loopholes that can be
> > deliberately abused. I don't expect to catch those kinds of errors
> > in all cases; validation is not a security system. If we assume that
> > authors are well-intentioned but ignorant or careless, then DTD
> > validation provides a pretty good measure of how the structure of
> > a document's markup matches the declared type. Yes, I understand
> > the limitations of this. But as a machine process it is the one
> > best shot we have.
> 
> But part of my argument about architectures is that they provide a type
> membership assertion mechanism that *cannot be subverted*. The owner of
> the architecture gets to define *a single name* by which the
> architecture is referenced and can provide a set of DTD declarations
> that cannot be modified by document authors. This means that processors
> can first detect an unambiguous type membership assertion (the
> architecture use declaration that uses the architecture name) and then,
> if desired, do normal XML syntactic validation of the document against
> the architectural DTD.  There is no possibility of subversion of this
> process by document authors in ways that cannot be easily detected (such
> as hacking the URI resolver that fetches the architectural DTD which
> should, ideally, exist in exactly one place).
> 
> > [regarding AF declarations....]
> > > But not to the beginning of the DTD, to the document, whether or not
> > > there's a DTD.
> >
> > But it would be *okay* to be in the DTD, correct?
> 
> Yes, it's fine for it to be in the DTD, but that can't be the *only
> place* it's allowed by the XHTML spec. I also need to be able to have
> documents with no DOCTYPE declaration that have architecture use PIs (or
> the equivalent element-based syntax a la name spaces, which I suppose we
> could amend 10744 to provide if the W3C powers that be simply will not
> accept PIs). But remember, what's important is type membership,
> which can be done syntactically any number of ways, including through
> the normative use of a name-space declaration.
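To make the distinction between asserting type membership and validating
concrete, here is a minimal Python sketch. It is not from the thread: the
function name asserts_xhtml and both sample documents are hypothetical. It
assumes the name-space route mentioned above, treating a document as
asserting XHTML when its root element is "html" in the XHTML namespace
(http://www.w3.org/1999/xhtml). The DOCTYPE declaration is never consulted
and no validation is attempted.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace name; everything else here is illustrative.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def asserts_xhtml(document: str) -> bool:
    """Return True if the root element is <html> in the XHTML namespace.

    This only detects the type *assertion*. Whether the document actually
    obeys the XHTML rules is a separate, optional validation step.
    """
    root = ET.fromstring(document)
    # ElementTree reports namespaced tags as "{namespace-uri}local-name".
    return root.tag == f"{{{XHTML_NS}}}html"


# No DOCTYPE declaration at all, but a clear namespace-based assertion.
no_doctype = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)

# Uses the XHTML external subset identifiers, yet is plainly not XHTML.
# A non-validating parser accepts it without complaint.
misleading_doctype = (
    '<!DOCTYPE memo PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<memo>not XHTML</memo>"
)

if __name__ == "__main__":
    print(asserts_xhtml(no_doctype))
    print(asserts_xhtml(misleading_doctype))
```

The second sample shows why the identifiers in a DOCTYPE declaration are a
weak signal on their own: a well-formedness-only parser never checks them
against the content. A receiver that wants validation can still run it
after this check, by whatever means it prefers.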

Well, I'm spending today and tomorrow here in Alexandria with Steve 
Newcomb and Michel Biezunski discussing the subject of "Topic Maps for
the Web", and between the AF issues there and yours and Arjun's patient
insistence, and reading over the relevant passages in ISO 15445, I will
be proposing some changes to the conformance section of XHTML in next 
week's HTML WG face-to-face meeting. I obviously can't guarantee anything
(being only a single member no matter how influential), but the presence
of such language in a closely related ISO spec (and Murray's re-education
over the past week) will hopefully prove fruitful. 

Thanks very much to both you and Arjun,

Murray

...........................................................................
Murray Altheim                                   <mailto:altheim@sonic.net>
Member of Technical Staff, Tools Development & Support
Sun Microsystems, Inc. MS MPK17-102
1601 Willow Rd., Menlo Park, California 94025  <mailto:altheim@eng.sun.com>

   the honey bee is sad and cross and wicked as a weasel
   and when she perches on you boss she leaves a little measle -- archy

Received on Thursday, 20 January 2000 17:45:28 UTC