Re: Permitting non-indirect links from Michael Sperberg-McQueen on 1997-01-10 (w3c-sgml-wg@w3.org from January 1997)

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Thu, 09 Jan 97 19:56:32 CST
To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
Message-Id: <199701100229.VAA00935@www10.w3.org>

On Thu, 9 Jan 1997 16:55:29 -0500 Peter Flynn said:
>David Durand wrote:
>
>      2. DTDs will frequently be unavailable
>
>Having come into class halfway through, so to speak, I'm still not
>clear why this is going to be so. Or does it simply mean that
>hand-tagging authors and those using non-SGML/XML tools will just be
>creating tag soup from half-remembered bits of HTML?

I think there are several reasons to expect DTDs might be unavailable
for some documents:

  - the document is tag soup using half-remembered HTML (your scenario)
  - the document was produced during an extended exploratory document
    analysis (it reflects a tentative understanding of one instance of
    a document type; when enough instances are tagged, a DTD will be
    produced and the documents retagged; in the meantime, they're on the
    Net for various purposes)
  - the document is of a type with a wholly invariant structure, for
    which no one has ever bothered to write a DTD since they think it's
    overkill
  - the document is of a type with a highly variable structure, and
    which has a historical existence of its own, so the concept of
    validation hardly applies in the usual sense:  if the owner did
    write a DTD, it would be the DTD that was required to conform to the
    document, not vice versa; the owner has not yet been persuaded
    that there is any point in having a DTD in this case.
  - the document has a DTD but the instance does not point to it
    because of (a) user error, (b) software error, or (c) the belief
    of someone involved that pointing to the DTD would not be useful,
    based on erroneous reports that XML software never reads DTDs
    anyway ...
  - the document has a DTD, and the instance points to it, but the
    server failed after serving the document and before finishing
    serving the DTD
  - the document has a DTD, and the instance points to it, but the XML
    processor
    is constructed to skip over the DTD; strictly speaking, this is
    not a case of the DTD being inaccessible, but is certainly one of
    the information in the DTD being inaccessible.

How frequent any of these scenarios will be, no one knows, though I am
confident many people have hunches.  But it's worthwhile expecting DTDs
to be unavailable, or acting as if we did, because the design goals of
the effort involve making it possible for non-validating applications to
do their work without reading the DTD.  Even if we thought that DTD-less
documents on the web were likely to be rare, our design goals require us
to ensure that when they do occur they won't be disadvantaged.  I don't
expect them to be rare, and the scenarios I outline are intended to
persuade you that the idea is not in itself reprehensible.

One reason I agree with the design goal in question is that I know of
lots of useful things software can do with marked-up documents which
have nothing to do with validation and which (in general) it can do
without even glancing at the DTD.  Another is that fetching the DTD --
particularly if it's in five or ten or twenty files -- involves a
network performance hit that most information providers would like to
avoid.  (I did fetch a
TEI-encoded document from Peter's project once; fetching the TEI DTD
across from Ireland occupied the machine for ten minutes.  Particularly
ironic, of course, given that the entire thing was already available
locally, including Peter's local modification files ...)
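
As a minimal sketch of the kind of DTD-less processing meant here, a
program can tally element types and pull out the running text of a
well-formed instance without ever seeing a DTD.  The sample document and
the function name summarize are invented for illustration, and Python's
standard xml.etree.ElementTree module is assumed as the (non-validating)
parser; nothing here is prescribed by the working group's drafts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical instance: well-formed, but with no DOCTYPE declaration,
# so there is no DTD for the processor to consult.
DOCUMENT = """\
<letter>
  <salute>Dear Peter,</salute>
  <p>The <term>DTD</term> was not available.</p>
  <p>The text was still usable.</p>
</letter>"""


def summarize(xml_text):
    """Return (element counts, plain text) relying on well-formedness alone."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    # Count how often each element type occurs in the tree.
    counts = Counter(element.tag for element in root.iter())
    # Concatenate all character data and normalize runs of whitespace.
    text = " ".join("".join(root.itertext()).split())
    return counts, text


counts, text = summarize(DOCUMENT)
print(dict(counts))
print(text)
```

Neither step needs content models or attribute declarations; what it
cannot do, of course, is say whether the instance is valid.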

I hope this helps explain, for latecomers, the fervor with which the
plight of the DTD-less processing program is considered in these
discussions.

Usual disclaimers apply.

-C. M. Sperberg-McQueen
Received on Thursday, 9 January 1997 21:29:11 UTC