- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 09 Jan 97 19:56:32 CST
- To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
On Thu, 9 Jan 1997 16:55:29 -0500 Peter Flynn said:
>David Durand wrote:
> > 2. DTDs will frequently be unavailable
>
>Having come into class halfway through, so to speak, I'm still not
>clear why this is going to be so. Or does it simply mean that
>hand-tagging authors and those using non-SGML/XML tools will just be
>creating tag soup from half-remembered bits of HTML?

I think there are several reasons to expect DTDs might be unavailable
for some documents:

 - the document is tag soup using half-remembered HTML (your scenario)

 - the document was produced during an extended exploratory document
   analysis (it reflects a tentative understanding of one instance of
   a document type; when enough instances are tagged, a DTD will be
   produced and the documents retagged; in the meantime, they're on
   the Net for various purposes)

 - the document is of a type with a wholly invariant structure, for
   which no one has ever bothered to write a DTD since they think it's
   overkill

 - the document is of a type with a highly variable structure, and
   which has a historical existence of its own, so the concept of
   validation hardly applies in the usual sense: if the owner did
   write a DTD, it would be the DTD that was required to conform to
   the document, not vice versa; the owner has not yet been persuaded
   that there is any point in having a DTD in this case

 - the document has a DTD but the instance does not point to it
   because of (a) user error, (b) software error, or (c) someone
   involved didn't think pointing to the DTD would be useful, having
   heard erroneous reports that XML software never reads DTDs
   anyway ...

 - the document has a DTD, and it points to it, but the server failed
   after serving the document and before finishing serving the DTD

 - the document has a DTD, and it points to it, but the XML processor
   is constructed to skip over the DTD; strictly speaking, this is not
   a case of the DTD being inaccessible, but it is certainly one of
   the information in the DTD being inaccessible
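The last scenario above, a processor that never reads the DTD, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree module (nothing of the kind is named in this thread), and the memo document, its element names, and the example.invalid DTD address are all invented for the purpose. The document points to a DTD that cannot be fetched, yet the program still tallies the element types and extracts the text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: the DOCTYPE points at a DTD that is not reachable.
# The parser used here checks well-formedness only and never fetches it.
DOC = """<!DOCTYPE memo SYSTEM "http://example.invalid/memo.dtd">
<memo>
  <to>Peter</to>
  <body>Fetching the <term>DTD</term> took <emph>ten</emph> minutes.</body>
</memo>"""


def element_counts(xml_text):
    """Count how often each element type occurs, without any DTD."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def plain_text(xml_text):
    """Return the character data of the document with whitespace collapsed."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


counts = element_counts(DOC)
text = plain_text(DOC)

if __name__ == "__main__":
    for tag, n in sorted(counts.items()):
        print(tag, n)
    print(text)
```

The sketch deliberately avoids entity references and defaulted attributes, since those are among the things a processor that skips the DTD cannot resolve on its own.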
How frequent any of these scenarios will be no one knows, though I am
confident many people have hunches. But it's worthwhile expecting DTDs
to be unavailable, or acting as if we did, because the design goals of
the effort involve making it possible for non-validating applications
to do their work without reading the DTD. Even if we thought that
DTD-less docs on the web were likely to be rare, our design goals
require us to ensure that when they do occur they won't be
disadvantaged. I don't expect them to be rare, and the scenarios I
outline are intended to persuade you that the idea is not in itself
reprehensible.

One reason I agree with the design goal in question is that I know of
lots of useful things software can do with marked-up documents which
have nothing to do with validation, and which it can (in general) do
without even glancing at the DTD.

Another is that fetching the DTD -- particularly if it's in five or
ten or twenty files -- involves a network performance hit that most
information providers would like to avoid. (I did fetch a TEI-encoded
document from Peter's project once; fetching the TEI DTD across from
Ireland occupied the machine for ten minutes. Particularly ironic, of
course, given that the entire thing was already available locally,
including Peter's local modification files ...)

I hope this helps explain, for latecomers, the fervor with which the
plight of the DTD-less processing program is considered in these
discussions.

Usual disclaimers apply.

-C. M. Sperberg-McQueen
Received on Thursday, 9 January 1997 21:29:11 UTC