SGML SHORTTAG feature usage in the HTML 4.x Recommendation.

Messrs.,

in attempting to modify the W3C MarkUp Validator to detect and report more
forms of erroneous and invalid HTML more reliably, it has come to my
attention that the SGML Declaration included with the HTML 4.01
Recommendation appears to be at odds with both the prose of that
Recommendation and the majority of User Agent implementations.

The specific area of concern is the SHORTTAG feature in the MINIMIZE part
of the FEATURES section. The current SGML Declaration reads in part:

....
  FEATURES
    MINIMIZE
      SHORTTAG YES
....

This allows many things that are not sanctioned by the prose of the HTML
4.01 Recommendation, are not implemented by any User Agent I am aware of,
and appear to be contrary to the intent of the design of HTML (though this
last point is obviously mere guesswork).

For instance, "SHORTTAG YES" allows empty start ("<>") and end ("</>")
tags, unclosed start ("<gi") and end ("</gi") tags, and NET-enabling tags
("<gi/CDATA/").
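
To make these constructs concrete, here is a minimal Python sketch that
flags them in a string of markup. It is an illustration only: the pattern
table and the function name find_shorttag_constructs are hypothetical, it is
not how the Validator works, and plain regular expressions ignore context
(comments, CDATA, attribute values) that a real SGML parser tracks.

```python
import re

# Hypothetical, deliberately simplified patterns for the SHORTTAG constructs
# named above. Names follow a rough approximation of SGML name characters.
PATTERNS = {
    "empty start-tag": re.compile(r"<>"),
    "empty end-tag": re.compile(r"</>"),
    # A tag terminated by the "<" of the next tag instead of ">", e.g. "<b<i>".
    "unclosed tag": re.compile(r"</?[A-Za-z][A-Za-z0-9.-]*(?=<)"),
    # A start-tag whose name is followed directly by "/", e.g. "<em/text/".
    "NET-enabling start-tag": re.compile(r"<[A-Za-z][A-Za-z0-9.-]*/"),
}


def find_shorttag_constructs(markup):
    """Return sorted (offset, label, matched text) tuples for each hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(markup):
            hits.append((match.start(), label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    for offset, label, text in find_shorttag_constructs("<p/text/ <> </> <b<i>x</i</b>"):
        print(offset, label, repr(text))
```

Note that under these rules the XHTML-style "<br/>" is also reported as a
NET-enabling start-tag, since that is how an SGML parser with "SHORTTAG YES"
reads it.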

I assume that this is due to the publishing schedule and implementation
rate of the so-called "WebSGML Adaptations Annex" (Annex K) to the ISO SGML
Standard in relation to the design and publishing of the HTML
Recommendation.


May I suggest you issue an erratum for the HTML 4.01 Recommendation noting
that the included SGML Declaration is retained for compatibility with
common SGML systems (which no longer exist today) and that a more precise
SGML Declaration would contain a FEATURES section such as:

....
  FEATURES
    MINIMIZE
      SHORTTAG
        STARTTAG
          EMPTY    NO  -- outlaws "<>" -- 
          UNCLOSED NO  -- outlaws "<foo" --
          NETENABL NO  -- outlaws "<p/text<em/more text/ nested/" --
        ENDTAG
          EMPTY    NO  -- outlaws "</>" -- 
          UNCLOSED NO  -- outlaws "</foo" --
        ATTRIB
          DEFAULT  YES -- allows defaulted attributes --
          OMITNAME YES -- allows "<gi attr>" --
          VALUE    YES -- allows unquoted attrs; "<gi att=val>" --
....

Other parts of the SGML Declaration might also benefit from a review in
light of intent, implementations, and current practice in SGML; but these
are the issues we see authors struggle with most often (sufficiently so
that several of them have become "FAQs" for the Validator Service). These
are also issues that we _cannot_ detect and inform authors of unless the
SGML Declaration (or an erratum to it) makes use of this more fine-grained
form from Annex K.


Kind Regards,
Terje Bless

Received on Monday, 11 November 2002 18:33:29 UTC