Re: FPI Mythology (was: XHTML Considered Harmful)

On Fri, 29 Jun 2001, William F. Hammond wrote:

> My view, however, of the various HTML-under-SGML specs that begin
> with RFC1866, then the 3.2, 4.0, 4.01 versions, and now the
> several versions of HTML as XML is that an HTML document is an
> SGML application that must also meet other requirements.

I've heard this before, many a time, and it has always been special
pleading.  The "other requirements" bit is an evasion, a fig-leaf to
cover up the fact that not only is SGML being retrofitted, but also
the retrofitting does *not* work in general.

Where the basic procedure has been to scrounge about in ISO 8879 for
circumstantial specifics that could be made to "fit", the intent of
"other requirements" has always been the factitious ruling out of
things that don't "fit".  It's a scam.
 
> Formally, this means that an HTML document is something that
> always gives rise to an SGML application, but it is not correct to
> say that it "is" an SGML application.

So, you're saying that we shouldn't take the W3C specs seriously?  I
agree! :-)

> One can say that it "degrades to" an SGML application as a short
> form of indicating that for a given HTML document there is a
> canonically associated SGML application.

I see no benefit from such scholasticism - other than concocting
gravitas for a bunch of handwaving.  

> A validating agent must know how to perform the canonical association.

There is no canonical association worth the consideration.

But, while we're on the subject and prepared to make believe...
 
> It would be mischief to ship the completely assembled version of an
> HTML 2.0 document as an SGML application (SGML declaration, document
> type definition, and instance) under the purview of RFC1866 through
> HTTP.

No one has claimed that this would be necessary, much less advisable.

> But if an HTML document "is" an SGML application, that should be
> sensible.

No.  This is a false dilemma.  There is no One And Only One True Way
to assemble the parts of an SGML document, as presented to a generic 
SGML parser.  

> The example
> 
>             <title>A Test</title><body><p>&#338;</body>
> 
> can be validated against the 2.0 DTD if the SGML declaration for 4.01
> is used but not with the correct declaration.  

Yes.  The lesson is that character references, in general, should be
avoided.
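
To make the mismatch concrete, here is a rough Python sketch of the
check that the two declarations disagree on.  The names (MAX_CODE,
out_of_range_refs) are mine, and the limits are simplifying
assumptions: 255 stands in for the ISO 8859-1 document character set
of the 2.0 declaration, 0x10FFFF for the ISO 10646 one of 4.01.  A
real SGML parser reads these from the declaration's CHARSET section
rather than from a table like this.

```python
import re

# Assumed, simplified upper bounds on the document character set.
# A real parser derives these from the SGML declaration itself.
MAX_CODE = {
    "2.0": 255,        # assumption: ISO 8859-1 (Latin-1)
    "4.01": 0x10FFFF,  # assumption: ISO 10646
}

# Decimal numeric character references such as &#338;
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def out_of_range_refs(instance, version):
    """Return the numeric character references above the assumed limit."""
    limit = MAX_CODE[version]
    return [int(n) for n in NUMERIC_REF.findall(instance) if int(n) > limit]

sample = "<title>A Test</title><body><p>&#338;</body>"
print(out_of_range_refs(sample, "2.0"))   # [338]
print(out_of_range_refs(sample, "4.01"))  # []
```

Note that nothing in the check consults an FPI; the outcome depends
only on which declaration's character set is in force.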

> It would be wrong to use an FPI for 2.0 

Nope.  That's just a set of declarations.  You're still investing the
FPI itself with semantic significance.  This is simply *not* correct. 

> or to say that it is an HTML 2.0 document because char 338 is not
> in the character set for 2.0.

Yes, but that FPI has absolutely nothing to do with this normative
provision of the spec.

>>> and has specified a particular form of document type declaration
>>> construction using one of a small list of FPI's.
>> 
>> Actually, no.  They have done the right thing in publishing FPIs for
> 
> In RFC1866 it's not required, but for W3C/3.2 a doctype declaration
> is required and for each subsequent W3C version it's required.

This is an example of the root evasion at work.  What is allegedly
"required" is not a document type declaration per se (on grounds of
formal conformance) but one of a *particular* form.  This has always
been utterly bogus.

> It's relevant in the context of the web where one cannot ship
> anything more than the instance with a short prolog.

Rubbish.  A Processing Instruction would have sufficed.  Oops!  The
Mosaic spawn barf on PIs, sorry, can't have that!  More scrambling for
other retrofits and specious post-facto rationalizations to match...

> For example, in the 4.01 spec, section 7.1 says that a document
> must begin with a "line containing HTML version information".

So it does.  The spec is full of such handwaving.
 
>> In a nutshell, you're proposing that a validation system do nothing
>> until it has sniffed an FPI in a document type declaration,
> 
> The word "sniffing" is inappropriate.

It is descriptive, accurate, and by now covered by precedent.
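
For anyone unsure what the word covers, this is roughly the
operation, as a minimal Python sketch.  The function name and the
regular expression are my own illustration, not anything taken from
a spec or from an actual validator; it only pattern-matches the
prolog and does no SGML parsing at all, which is rather the point.

```python
import re

# Hypothetical pattern: grab the public identifier out of a
# <!DOCTYPE name PUBLIC "..."> declaration by text matching alone.
DOCTYPE_FPI = re.compile(r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)

def sniff_fpi(document):
    """Return the FPI found in the document type declaration, or None."""
    match = DOCTYPE_FPI.search(document)
    return match.group(1) if match else None

with_doctype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"><title>A Test</title>'
without_doctype = "<title>A Test</title><body><p>Hello</body>"

print(sniff_fpi(with_doctype))     # -//W3C//DTD HTML 4.01//EN
print(sniff_fpi(without_doctype))  # None
```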
 
> A doctype declaration is required for all but 2.0.  So formally if
> there is no doctype declaration and the HTML is assumed to be in
> the W3C family of document types, one should assume 2.0.

Actually, the 2.0 provision has to do with the eminently practical
distinction between documents and message entities.  The later specs
lost the plot (no surprise: pretenses tend to unravel over time).

> The fact that one of them is also XML is not that important, and
> certainly not important enough to justify the claim of a user
> agent advocate that HTML 4.01 and XHTML 1.0 should live under
> different HTTP and SMTP content types.

What does an FPI have to do with this?


Arjun

Received on Saturday, 30 June 2001 02:14:46 UTC