Re: Why DOCTYPE Declarations for XHTML? from W. Eliot Kimber on 2000-01-19 (www-html@w3.org from January 2000)

From: W. Eliot Kimber <eliot@isogen.com>
Date: Wed, 19 Jan 2000 10:22:03 -0500 (EST)
To: www-html@w3.org
Message-id: <3885039C.52E9D021@isogen.com>
Murray Altheim wrote:
> 
> "W. Eliot Kimber" wrote:
> > Murray Altheim wrote:
> [...]
> > I'm not objecting to enabling validation. I'm only objecting to the
> > requirement that conforming XHTML documents must use a particular form
> > of DOCTYPE declaration (or even have a doctype declaration at all).
> 
> If we eliminate the requirement on DTD validation, we not only allow
> for other types of validation, we loosen the conformance requirements
> to the point where it becomes *less* possible to ascertain that a
> given instance is XHTML, not more. 

I think this is the disconnect: it's not a question of *determining*
whether or not a given document conforms to the XHTML spec, it's a
question of a document being able to unambiguously *assert* that it
*is* an XHTML document. Validating the veracity of that assertion is a
separate subject.

From the point of view of a processor, you need to be able to
unambiguously distinguish documents that are XHTML documents from
documents that are not XHTML documents. Having found a document that
asserts it is an XHTML document, you may then *choose* to validate it
however you see fit, including using the XHTML-provided DTD
declarations, some functionally-equivalent schema spec, or purpose-built
code that happens to embody the rules of XHTML. But the validation is
secondary (in the sense that useful processing can be done without 
validation).
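
To make the two steps concrete, here is a minimal sketch in Python. It
assumes, purely for illustration, that the type assertion is carried by
the namespace of the document element (one of the syntactic options
mentioned later in this message); the function names and the optional
validator hook are hypothetical, not anything the XHTML spec defines.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: the assertion of XHTML-ness is the
# namespace name on the document element.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def asserts_xhtml(xml_text):
    """Return True if the document element is <html> in the XHTML namespace.

    This only detects the *assertion*; it says nothing about validity.
    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}html" % XHTML_NS


def process(xml_text, validator=None):
    """Detect the assertion first; validate only if the receiver chooses to.

    `validator` is a hypothetical hook: any callable taking the document
    text and returning True or False (DTD-based, schema-based, or
    purpose-built code).
    """
    if not asserts_xhtml(xml_text):
        return "not-xhtml"
    if validator is not None and not validator(xml_text):
        return "xhtml-invalid"
    return "xhtml"
```

Note that a document with no DOCTYPE declaration at all can still be
recognized and processed, and that validation is a policy passed in by
the receiver rather than something the detection step depends on.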

My point is that a DOCTYPE declaration cannot serve as the *unambiguous*
assertion of XHTMLness (although of course it can enable validation
against the syntactic rules of XHTML). Stress on the word "unambiguous."
We take the use of particular external subsets (or rather, the use of
particular identifiers for external subsets) as the assertion of
typeness, but that assertion has potential ambiguity for all the reasons
I've stated. If something is ambiguous it cannot be relied on to drive
computer processing and should therefore be avoided if at all possible.
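
A small sketch of the ambiguity, in Python using the standard expat
bindings. The two sample documents and the "mycolor" attribute are made
up for illustration; the external identifiers are the XHTML 1.0 Strict
ones. Both documents name exactly the same external subset, yet the
second one adds declarations of its own in an internal subset, so the
identifier alone does not tell you which declarations are in effect.

```python
from xml.parsers import expat

# Same external identifier in both documents.
DOCTYPE_ID = ('PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"')
BODY = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head><title>t</title></head><body><p>x</p></body></html>')

PLAIN = '<!DOCTYPE html %s>\n%s' % (DOCTYPE_ID, BODY)

# Hypothetical author modification: an extra attribute declared for <p>
# in the internal subset.
MODIFIED = ('<!DOCTYPE html %s [\n'
            '  <!ATTLIST p mycolor CDATA #IMPLIED>\n'
            ']>\n%s' % (DOCTYPE_ID, BODY))


def doctype_info(xml_text):
    """Report what the DOCTYPE declaration of a document says."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["name"] = name
        info["system_id"] = system_id
        info["public_id"] = public_id
        info["has_internal_subset"] = bool(has_internal_subset)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return info


plain = doctype_info(PLAIN)
modified = doctype_info(MODIFIED)
print(plain["public_id"] == modified["public_id"])
print(plain["has_internal_subset"], modified["has_internal_subset"])
```

A processor that keys only on the public or system identifier treats
these two documents as the same type, even though their effective
declarations differ.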

> Perhaps there's some way to state:
> "if you have some type of validation that works *better* than DTDs at
> validating against the XHTML document type, then go for it with our
> blessing," but I can't think of any way to do that. If we relax the
> requirement for DTD validation, then *anything* goes, and an unbounded
> definition is no definition at all, not in this environment. We have a
> very limited set of tools available to us.

Again, my point isn't about *validation*, it's about assertion of type
membership.

> >
> > That's my point: *you have no means* provided by XML 1.0.

The means I'm talking about is the means to assert type membership. This
discussion started because it was proposed that *type membership* be
indicated by disallowing the use of internal subsets and requiring the
use of a particular external subset URI.  Those sets of restrictions
*do* provide reliable type assertion...

...but...

...the reason that Arjun and I objected to them in principle is that A)
it's a set of restrictions that XML 1.0 provides no way to state (and
that therefore normal XML parsers cannot detect or enforce), B) it
unnecessarily restricts authors' choice about how to manage the DOCTYPE
declarations of XHTML documents, and C) it propagates the mistaken idea
that DOCTYPE declarations assert type membership.

It's not about validation--whether or not a document is validated is
always the choice of the document receiver. It is inappropriate for a
general-purpose standard to impose a validation policy on users of the
standard. The standard must *enable* validation, but it cannot require
it. So validation can't be the issue here.

The issue is: do document authors have a clear way to assert type
membership of XHTML documents and do processors have a clear way to
detect type membership and, if so, does that mechanism impose any
unnecessary or inappropriate constraints?  

My assertion is that the required use of external declaration subsets
fails the last part of this test: it imposes unnecessary and
inappropriate constraints on document authors.
 
> SGML and XML are too flexible to not allow loopholes that can be
> deliberately abused. I don't expect to catch those kinds of errors
> in all cases; validation is not a security system. If we assume that
> authors are well-intentioned but ignorant or careless, then DTD
> validation provides a pretty good measure of how the structure of
> a document's markup matches the declared type. Yes, I understand
> the limitations of this. But as a machine process it is the one
> best shot we have.

But part of my argument about architectures is that they provide a type
membership assertion mechanism that *cannot be subverted*. The owner of
the architecture gets to define *a single name* by which the
architecture is referenced and can provide a set of DTD declarations
that cannot be modified by document authors. This means that processors
can first detect an unambiguous type membership assertion (the
architecture use declaration that uses the architecture name) and then,
if desired, do normal XML syntactic validation of the document against
the architectural DTD.  There is no possibility of subversion of this
process by document authors in ways that cannot be easily detected (such
as hacking the URI resolver that fetches the architectural DTD which
should, ideally, exist in exactly one place). 

> [regarding AF declarations....]
> > But not to the beginning of the DTD, to the document, whether or not
> > there's a DTD.
> 
> But it would be *okay* to be in the DTD, correct? 

Yes, it's fine for it to be in the DTD, but that can't be the *only
place* it's allowed by the XHTML spec. I also need to be able to have
documents with no DOCTYPE declaration that have architecture use PIs (or
the equivalent element-based syntax a la namespaces, which I suppose we
could amend 10744 to provide if the W3C powers that be simply will not
accept PIs). But remember, what's important is asserting type membership,
which can be done syntactically any number of ways, including through
the normative use of a name-space declaration.
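
As a rough sketch of what detecting an architecture use PI could look
like in a document with no DOCTYPE declaration, here is some Python
using the standard expat bindings. Two things are assumptions: the PI
form `<?IS10744:arch name="..."?>` (check the exact syntax against
ISO/IEC 10744 before relying on it) and the architecture name "xhtml",
which is purely hypothetical.

```python
import re
from xml.parsers import expat

# Assumed PI target for an architecture use declaration.
ARCH_PI_TARGET = "IS10744:arch"

# Pulls the name="..." pseudo-attribute out of the PI data.
NAME_ATTR = re.compile(r'(?:^|\s)name\s*=\s*(["\'])(.*?)\1')


def architecture_names(xml_text, target=ARCH_PI_TARGET):
    """Return the architecture names declared by PIs in the document."""
    names = []

    def on_pi(pi_target, data):
        if pi_target == target:
            match = NAME_ATTR.search(data)
            if match:
                names.append(match.group(2))

    parser = expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)
    return names


# No DOCTYPE declaration anywhere; "xhtml" is a hypothetical name.
DOC = ('<?IS10744:arch name="xhtml"?>\n'
       '<html><head><title>t</title></head><body/></html>')

print("xhtml" in architecture_names(DOC))
```

The detection step needs nothing beyond well-formedness parsing;
validation against the architectural DTD remains a separate, optional
step.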

> Is there any problem
> with the declarations occurring twice (ie., if I was able to lobby
> for default inclusion in the XHTML 1.1 DTD but somebody were to also
> include it via some other method?). 

No, no problem as far as ISO/IEC 10744 is concerned.

> If you have something to propose
> in this regard, send it into the W3C and perhaps they can standardize
> a method for XML documents. Or send it into OASIS. (!)

It's worth thinking about.

> > > Since the beginning of XHTML m12n there's been an empty XHTML
> > > module named "XHTML 1.1 Base Architecture" whose content looks
> > > like this:
> >
> > This is cool. Keep it in.
> 
> Yes, but it's incomplete. Undone. Needs work. Space for rent. Help wanted.

I'll see if I can help here, but I've got a lot of standards work
already on the stack...

Cheers,

E.
Received on Thursday, 20 January 2000 05:28:41 UTC