Re: Why DOCTYPE Declarations for XHTML?

"W. Eliot Kimber" wrote:
> Murray Altheim wrote:
[...] 
> I'm not objecting to enabling validation. I'm only objecting to the
> requirement that conforming XHTML documents must use a particular form
> of DOCTYPE declaration (or even have a doctype declaration at all).

If we eliminate the requirement on DTD validation, we not only allow
for other types of validation, we loosen the conformance requirements 
to the point where it becomes *less* possible to ascertain that a
given instance is XHTML, not more. Perhaps there's some way to state:
"if you have some type of validation that works *better* than DTDs at 
validating against the XHTML document type, then go for it with our
blessing," but I can't think of any way to do that. If we relax the
requirement for DTD validation, then *anything* goes, and an unbounded
definition is no definition at all, not in this environment. We have a
very limited set of tools available to us.
 
> That is, the issue of being able to do validation and the issue of
> knowing without ambiguity whether or not a given document claims to be
> an XHTML document are two separate issues.

Understood. We're simply tackling the part that the toolkit we've been
given provides us. It doesn't keep someone from sticking an entire
document in a <p>, but then, if you look you'll see very little by
way of definitions of document structure in any HTML specification,
and you will find almost *none* in DocBook. They only define the
structures available in the type, not the type itself. I.e., the 
definition is a container or series of containers, not a structure in
itself. There's nothing in HTML 4 that says it's *wrong* to put a 
whole document in a <p>, nothing in DocBook that says you can't 
use 50 <Sect4>'s as list items, with a <Sect3> as a list container.

So in the sense that XHTML does not prescribe a strict document 
structure, validating that the markup structures make sense is 
pretty close to the limit of what can be done: for example, that an
<li> is in a <ul> or an <ol>. Not great, not something to write home
to mom about, but it's what we have.
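
To make the <li> example concrete, here is a rough sketch in Python of
that one containment rule. It is a hand-rolled check, not a DTD validator:
the ALLOWED_PARENTS table and the misplaced_elements() helper are
hypothetical names invented for illustration, and a real DTD expresses
far more than this single rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> parent names it may appear in.
ALLOWED_PARENTS = {"li": {"ul", "ol"}}


def misplaced_elements(xml_text):
    """Return (child, parent) pairs that break the rule table above."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        for child in parent:
            allowed = ALLOWED_PARENTS.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                problems.append((child.tag, parent.tag))
    return problems


# A list item outside a list is caught.
bad = "<body><ul><li>a</li></ul><p><li>b</li></p></body>"
print(misplaced_elements(bad))

# An entire document's text stuffed into one <p> is not caught.
odd = "<body><p>Chapter one. Chapter two. Chapter three.</p></body>"
print(misplaced_elements(odd))
```

The second call reports nothing, which is the limitation described
above: this kind of check sees how containers nest, not whether the
document as a whole is structured sensibly.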
 
> That is, a document claims to conform to a type. You then have the
> option of validating the document against the rules of that type. Using
> a set of DTD declarations is part of that validation (but not all of
> it).

Yes, I do understand (completely) the limitations of restrictions that
only affect a portion of the type definition. And I agree that Sun has
much more restricted (and higher quality) authoring by trained writers
using very high quality tools and a lot of support. It's one of the
reasons I like it here. And yes, you're correct, this is very much due
to people like Jon Bosak, Mike Rogers, Bill Smith, Eduardo Gutentag and
others, all champions of SGML and XML at Sun.

[...]
> > > You do have an alternative: a namespace use declaration with a
> > > meaning defined by the XHTML spec.
> >
> > And that would make XHTML different from every other XML markup language.
> 
> No, I think it would make it consistent with many markup languages. The
> only difference between a namespace declaration and an architecture use
> declaration is the explicit ability to bind an architecture to a set of
> architectural DTD declarations.
> 
> I also point out that architectural processing *is currently
> implemented* through SAXARCH, so it's not like there's no tools support
> for architectures, just that it's not yet in IE5 (as far as I know).

Yes, but as you know, the acceptance (or understanding) of such technology
is highly limited within the W3C. You guys speak a foreign language 
understood by few people. SAX and SAXARCH are not standardized APIs even 
if the former is de facto part of most parsers. If a highly simplified
AF API were proposed to the W3C and we had it in our toolkit, this would be
an entirely different situation. But we don't, and to me this is all just
speculation until we do. If you want to propose such a feature (e.g., a
standardized PI to attach to documents as a link to an AF declaration, so
that every author doesn't have to create their own), then I'd be happy
to work with you to champion its acceptance within the W3C. But you 
probably know as well as I this is unlikely to be accepted.

> > > Just because XML provides the optional feature of DOCTYPE declarations
> > > doesn't mean that XHTML is obligated to require their use or impart any
> > > special meaning to their use when there are other ways to get what you
> > > want which are reliable.
> >
> > Such as? Supported by what tools? Something that given a cold day in
> > hell would be accepted by the W3C? If we don't use validation via DTD
> > we have no acceptable means to establish a document type at all.
> 
> That's my point: *you have no means* provided by XML 1.0.

No, we *do* have DTDs. That's why we're using them. They don't provide
the type validation you're talking about, but they do guarantee that
the markup in the document conforms to the DTD. 

> Therefore you
> (and all other standardized XML applications) must do something else.
> The only question is what? XHTML is, by its nature, a groundbreaking
> application (just as HTML was).
> 
> > Perhaps you can clarify this for me: I have thousands of valid SGML
> > documents that conform to document type definitions, using DOCTYPE
> > declarations.
> 
> No, you have thousands of documents that conform to document type
> *declarations* that may or may not conform to document type
> *definitions*. The conformance to the latter *is not in any way*
> indicated by conformance to the former. Likewise, failure to conform to
> the former does not necessarily indicate failure to conform to the
> latter.

I beg to disagree. You are correct that the possibility to abuse
markup exists. When I go into a hospital, there are all sorts of
sharp objects and drugs that could kill me. But the medical staff
uses those tools with care and educated intention. If I thought 
they were deliberately going to kill me, I'd never enter a 
hospital. The question revolves around intention and negligence, 
not malfeasance.

SGML and XML are too flexible to not allow loopholes that can be
deliberately abused. I don't expect to catch those kinds of errors
in all cases; validation is not a security system. If we assume that
authors are well-intentioned but ignorant or careless, then DTD 
validation provides a pretty good measure of how the structure of
a document's markup matches the declared type. Yes, I understand
the limitations of this. But as a machine process it is the one
best shot we have. 

I can't guarantee that a paragraph contains one idea, but at least 
I can be sure a paragraph element isn't inside the head of a document,
or that other paragraphs don't occur in a paragraph, or that I didn't
misspell a tag name. This is the kind of validation that many people
find valuable. I wouldn't chop off my arm simply because it isn't 
strong enough to lift something heavy. If in my environment I'm concerned
about authors abusing the DTDs, I'll make sure the tools don't allow
modification of them, either directly (chmod) or via tool validation
that prohibits any modification of the prolog, or only general entities,
etc. This becomes more a management issue than a technical one.
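
As a rough sketch of the "tool validation that prohibits any modification
of the prolog" idea, the following Python uses the standard library's
Expat binding to read a document's DOCTYPE declaration and reject any
internal subset. The helper names, the sample documents, and the choice of
EXPECTED_PUBLIC_ID are assumptions made for illustration. Nothing here
validates against the DTD itself.

```python
import xml.parsers.expat

# Assumed for illustration: the public identifier documents must declare.
EXPECTED_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN"


def doctype_info(xml_text):
    """Return (name, public_id, has_internal_subset), or None if no DOCTYPE."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, public_id, bool(has_internal_subset)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


def prolog_is_untouched(xml_text, expected_public_id=EXPECTED_PUBLIC_ID):
    """True only if the expected DOCTYPE is present with no internal subset."""
    info = doctype_info(xml_text)
    if info is None:
        return False
    _name, public_id, has_internal_subset = info
    return public_id == expected_public_id and not has_internal_subset


# Hypothetical sample documents.
DOCTYPE_HEAD = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"'
)
BODY = "<html><head><title>t</title></head><body><p>x</p></body></html>"

PLAIN = DOCTYPE_HEAD + ">" + BODY
TAMPERED = DOCTYPE_HEAD + " [<!ELEMENT blink (#PCDATA)>]>" + BODY
NO_DOCTYPE = BODY

for label, doc in [("plain", PLAIN), ("tampered", TAMPERED), ("none", NO_DOCTYPE)]:
    print(label, prolog_is_untouched(doc))
```

A check like this only guards against local declarations slipped into the
prolog. Protecting the DTD files themselves is still a matter of file
permissions and process.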

[regarding AF declarations....]
> But not to the beginning of the DTD, to the document, whether or not
> there's a DTD.

But it would be *okay* to be in the DTD, correct? The module could
also then be used as an entity in its own right, and *if* some 
mechanism for declaring it at the beginning of a document were
available, we could provide information on how to do it. Is there any
problem with the declarations occurring twice (i.e., if I were able to
lobby for default inclusion in the XHTML 1.1 DTD but somebody were to
also include it via some other method)? If you have something to
propose in this regard, send it in to the W3C and perhaps they can
standardize a method for XML documents. Or send it in to OASIS. (!)
 
> > Since the beginning of XHTML m12n there's been an empty XHTML 
> > module named "XHTML 1.1 Base Architecture" whose content looks 
> > like this:
> 
> This is cool. Keep it in.

Yes, but it's incomplete. Undone. Needs work. Space for rent. Help wanted.
 
Murray

...........................................................................
Murray Altheim, SGML Grease Monkey           <mailto:altheim@eng.sun.com>
Member of Technical Staff, Tools Development & Support
Sun Microsystems, 901 San Antonio Rd., UMPK17-102, Palo Alto, CA 94303-4900

   the honey bee is sad and cross and wicked as a weasel
   and when she perches on you boss she leaves a little measle -- archy

Received on Tuesday, 18 January 2000 18:17:01 UTC