Re: FPI Mythology (was: XHTML Considered Harmful)

On Thu, 28 Jun 2001, William F. Hammond wrote:

> My question could have been more specifically worded in
> stipulating the form of document type declaration construction
> permitted in HTML documents.

A *conforming* SGML Application cannot constrain the form of document
type declarations.  Please review this discussion of Clause 15.2.2:

Remove the claim to conformance (and the normative reference to ISO
8879), and then the HTML specs are free to ascribe any homegrown
interpretations they like to constructs nominally borrowed from SGML.  
Tag Soup can do exactly the same thing, btw - and in fact, just as
some political constitutions can be unwritten, this is precisely what
Tag Soup has done in practice anyway:)

[The problem, of course, is that the HTML specs *need* the normative
reference to ISO 8879, to have a claim to intellectual legitimacy at
all.  The irrelevance in practice is one reflection of the phoniness
of "but we don't really mean it, don't hold us to it!"]

> If a late version of HTML has a larger charset than an early version,
> then it is formally wrong to allow the larger charset in something
> specified as the early version.

I don't see how, if the newer set were a proper superset of the older
one.  (The whole character set business has been handled less than
optimally, IMHO, but that's a separate discussion.)

> Each of the IETF/W3C specifications of HTML beginning with version
> 2.0 (RFC 1866) has specified a particular SGML declaration

Yes, FWIW.  Each could have done better.

> and has specified a particular form of document type declaration
> construction using one of a small list of FPI's.

Actually, no.  They have done the right thing in publishing FPIs for
"official" declaration subsets, such as they are.  The business about
particular forms of document type declarations is a shibboleth, a myth
of convenience.  The core validation requirement is that an instance
validate with respect to a (specific) declaration subset.  To this
end, the particular form of a document type declaration - or even, in
fact, its presence - is irrelevant. 

> Internal declaration subsets are not allowed, 

For *conforming* SGML applications, this is unmitigated nonsense.

> and system identifiers are not allowed.

This is a red herring.  System identifiers are "local" by definition.
(Note that wherever system identifiers can occur, there is always an
application-defined name also, which can be used directly in a lookup
mechanism - this is why the syntax of external identifiers allows the
keyword SYSTEM without an associated literal.  There is an extensive
discussion of this on pp. 378-9 of the SGML Handbook, accompanying the
formal prose for 10.1.6 "External Identifier".)  A public spec need
not bother with system identifiers at all, and indeed should not.

The practice of sticking URIs in system identifiers - apparently, for
lack of a better place to put them - is wrong.  The proper places for
URIs are either public identifiers (when intended as "names" - cf.
"Cool URIs don't change") or in catalog entries (when intended as
storage identifiers.)
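
To make the distinction concrete, here is a minimal sketch in Python of
catalog-based resolution.  The entry syntax (the keyword PUBLIC, then a
public identifier, then a storage identifier) follows the SGML Open
catalog format; the local path and the function names parse_catalog and
resolve are hypothetical, chosen only for illustration.

```python
import shlex

# Two catalog entries.  The first maps a public identifier to a local
# file (hypothetical path); the second uses a URI as the storage
# identifier, which is where a URI meant as a location belongs.
CATALOG_TEXT = '''
PUBLIC "-//W3C//DTD HTML 4.01//EN" "/usr/local/sgml/html401/strict.dtd"
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
'''


def parse_catalog(text):
    """Collect PUBLIC entries into a dict: public id -> storage id."""
    catalog = {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # honours the quoted literals
        if len(tokens) == 3 and tokens[0].upper() == "PUBLIC":
            catalog[tokens[1]] = tokens[2]
    return catalog


def resolve(public_id, catalog):
    """Return the storage identifier for a public id, or None if unmapped."""
    return catalog.get(public_id)


catalog = parse_catalog(CATALOG_TEXT)
print(resolve("-//W3C//DTD HTML 4.01//EN", catalog))
```

The document carries only the name; where the entity is actually
stored is the catalog's business, and can differ from site to site
without touching a single document.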

> With these conditions the unified validation scheme that I
> described is fine.

I disagree.  In a nutshell, you're proposing that a validation system
do nothing until it has sniffed an FPI in a document type declaration,
at which point it should use the FPI to resolve all other requirements
such as appropriate SGML declarations and the like.  The main purpose
of such guesswork, apparently, is to *hide* from ordinary people the
fact that XHTML and HTML4 documents will *not* validate in "identical
regimes".  In the long run, by encouraging voodoo (and an unthinking
formulaic approach to validation) this is a disservice to the public. 

It so happens that the front end Perl programs wrapping SP/nsgmls in
the various online services engage in precisely these heuristics, but
this is more a consequence of incomplete specification - and lack of
support in the older versions of SP for WebSGML TC provisions - than
of anything else.
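
For the record, the kind of heuristic in question can be sketched
roughly as follows.  This is in Python rather than Perl, and it is not
the code of any actual service: the lookup table, the file names
html4.dcl and xml.dcl, and the function name sniff_sgml_declaration are
all assumptions made for illustration.

```python
import re

# Pull the public identifier out of the first document type declaration.
DOCTYPE_FPI = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical table: which SGML declaration to hand the parser for
# which FPI.  The file names are placeholders.
SGML_DECL_FOR_FPI = {
    "-//W3C//DTD HTML 4.01//EN": "html4.dcl",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xml.dcl",
}


def sniff_sgml_declaration(document):
    """Guess the SGML declaration from the FPI; None if nothing matches."""
    match = DOCTYPE_FPI.search(document)
    if match is None:
        return None  # no recognisable declaration: the heuristic gives up
    return SGML_DECL_FOR_FPI.get(match.group(2))


html4 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<title>x</title>'
print(sniff_sgml_declaration(html4))
```

Note that the table has to send the two families to *different* SGML
declarations, which is exactly the fact that the sniffing conceals from
the user.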


Received on Thursday, 28 June 2001 23:51:15 UTC