PROPOSAL: An extension mechanism for HTML

Joe English (joe@trystero.art.com)
Sat, 30 Sep 1995 12:28:52 PDT


Message-Id: <9509301928.AA02652@trystero.art.com>
To: html-wg@oclc.org, www-html@w3.org

                    An Extension Mechanism for HTML
                              Version: 1.0
                               J. English
                           30 September 1995


                                ABSTRACT

    HTML currently lacks a well-defined mechanism for developing and
    deploying new features. This proposal addresses a small part of
    this problem at the SGML level by adding a general-purpose
    ``alternate representation'' element. Content providers may use
    this element to supply an alternate representation for browsers
    that cannot present, or do not understand, extended HTML
    features.

    A new scheme for handling unrecognized elements in HTML user
    agents is defined, and a brief list of guidelines for designing
    HTML extensions is presented.

    Issues of media type parameters for extended versions of HTML
    and mechanisms for actually extending the HTML DTD are
    _expressly not considered or addressed_ in this proposal.


                          STATUS OF THIS MEMO

    This is a working draft, being circulated for comment only.

    If there is sufficient support for this proposal it will be
    submitted as an Internet-Draft. Please send comments and
    suggestions to the author <joe@art.com>, the <html-wg> mailing
    list, or the <www-html> mailing list.


                                CONTENTS

    1      Statement of the problem
    2      Proposed Solution
    3      Changes to DTD
    4      Impact on existing browsers and tools
    5      Impact on existing documents
    6      Deployment and interoperability
    7      Format negotiation
    8      Guidelines for extension elements
    9      Potential problems
    10     Acknowledgments and history
    A      Other solutions
    A.1    ALT attribute instead of element
    A.2    ALTSRC attribute
    A.3    NOxxx  elements
    A.4    Conditional Element
    A.5    Marked Sections
    A.6    No tags
    A.7    Omissible tags


1. STATEMENT OF THE PROBLEM

    _How do we teach current browsers to understand elements that
    haven't been invented yet?_

    The HTML document type definition is still far from complete.
    There are several widely deployed new features which are not
    represented in the HTML 2.0 DTD, several more which have been
    proposed, and there will no doubt be even more in the future.

    At the same time, there is a large installed base of HTML user
    agents which (by definition) do not support newly-invented HTML
    extensions. It is not feasible for developers or users to
    simultaneously update all software every time a new extension is
    developed.

    Therefore, one or more mechanisms for providing backward
    compatibility with the installed base are desperately needed.


2. PROPOSED SOLUTION

    A new, general-purpose ``alternate representation'' element is
    defined as follows:

    <!ELEMENT ALT - - (%body.content;)>

    That is, ALT may contain anything that is legal inside the BODY
    element, the start- and end-tags are required, and it has no
    attributes.

    The ALT element is not allowed in the content of any current
    HTML level 2 elements. Instead, it is intended to be used inside
    _new_ elements which are not part of the current standard.

    The ALT element contains an ``alternate representation'' of its
    parent element (no matter what that parent element is). The
    alternate representation should be presented if the user agent
    is not able to present the rest of the containing element. If
    the user agent is able to present the containing element, the
    content of the ALT element should be ignored.
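
    As a minimal sketch of these semantics (not part of the proposal
    itself), the Python below walks a hypothetical in-memory document
    tree. The Element class and present() function are illustrative
    names only; concatenating text stands in for real formatting, and
    the fallback when no ALT is supplied follows the rule amendment
    in section 4.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Element:
    """A document element: a generic identifier plus child nodes."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def present(node: Union[Element, str], supported: Set[str]) -> str:
    """Return the text a user agent would present for `node`."""
    if isinstance(node, str):
        return node
    alt = None
    rest = node.children
    # ALT, when present, is the first subelement of the element it describes.
    if rest and isinstance(rest[0], Element) and rest[0].name == "ALT":
        alt, rest = rest[0], rest[1:]
    if node.name in supported or alt is None:
        # Presentable element (its ALT is ignored), or no ALT supplied:
        # present the remaining content.
        return "".join(present(child, supported) for child in rest)
    # Unpresentable element with an ALT: use the alternate representation.
    return "".join(present(child, supported) for child in alt.children)


table = Element("TABLE", [
    Element("ALT", ["See table1.txt"]),
    Element("TR", [Element("TD", ["42"])]),
])
print(present(table, {"TABLE", "TR", "TD"}))  # prints: 42
print(present(table, set()))                  # prints: See table1.txt
```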


3. CHANGES TO DTD

    This proposal entails no changes to the HTML 2.0 DTD, as it
    addresses HTML extensions only.

    In future extensions to HTML, any newly-defined elements which
    can appear as direct children of current level 2 elements
    (hereafter, `extension elements') may include the ALT element in
    their content model as an optional first subelement.

        Note: For the purpose of this proposal, new elements
        which appear only inside extension elements are not
        considered extension elements themselves.

    For example, the definition of the TABLE extension element would
    be changed from:

    <!ELEMENT table - - (caption?, col*, thead?, tbody+)>

    to:

    <!ELEMENT table - - (alt?, caption?, col*, thead?, tbody+)>

    Since TR, THEAD, and CAPTION are only allowed inside TABLE, they
    are not considered extension elements and need not include ALT
    in their content models.

    See section 8, "Guidelines for extension elements", for further
    guidelines on designing extensions.


4. IMPACT ON EXISTING BROWSERS AND TOOLS

    For cases where an extension element contains no other textual
    content (such as the proposed EMBED and FRAMESET elements), no
    change to existing browsers is required since the ``ignore
    unrecognized tags'' rule provides automatic backward
    compatibility. (In fact, for such cases there is no need to use
    a standardized name for the alternate representation element at
    all except possibly for uniformity.)

        (HTML 2.0 spec, 4.2.1 "Undeclared Markup Error Handling"
        [5])

        To facilitate experimentation and interoperability
        between implementations of various versions of HTML, the
        installed base of HTML user agents supports a superset
        of the HTML 2.0 language by reducing it to HTML 2.0:
        markup in the form of a start-tag or end-tag, whose
        generic identifier is not declared is mapped to nothing
        during tokenization. [...]
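
    To illustrate why this rule is not enough for elements such as
    TABLE, here is a rough sketch of the reduction using Python's
    standard html.parser module. The DECLARED set is a hypothetical,
    much-abbreviated stand-in for the HTML 2.0 element names, and the
    sketch handles only tags and text.

```python
from html.parser import HTMLParser

# Hypothetical, abbreviated list of element names declared in HTML 2.0.
DECLARED = {"html", "head", "title", "body", "p", "pre", "a", "em", "strong"}


class Html20Reducer(HTMLParser):
    """Reduce input to HTML 2.0 by mapping undeclared tags to nothing."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list = []

    def handle_starttag(self, tag, attrs):
        if tag in DECLARED:
            self.out.append(self.get_starttag_text())
        # Undeclared start-tags vanish; the content that follows is kept.

    def handle_endtag(self, tag):
        if tag in DECLARED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def reduce_to_html20(source: str) -> str:
    parser = Html20Reducer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(reduce_to_html20("<p>Totals:<table><tr><td>1</td><td>2</td></tr></table>"))
# prints: <p>Totals:12
```

    The cell contents run together as "12" with no indication that a
    table was ever present, which is the problem the amended rule
    addresses.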

    To support other extensions such as TABLE which _do_ contain
    content that cannot be presented by user agents which do not
    understand the extension, this guideline shall be amended as
    follows:

        [...] When encountering markup in the form of a
        start-tag whose generic identifier is not recognized by
        the user agent, if it is immediately followed by an
        <ALT> start tag, then the content of the ALT element
        should be presented, and all content between the </ALT>
        end-tag and the end-tag of the unrecognized element
        should be discarded. If no ALT subelement is present,
        then the content of the unrecognized element is treated
        as if its start- and end-tags were not present.

    Note that under this proposal, browsers are expected to keep
    track of the element hierarchy instead of simply discarding
    unrecognized tags. Ideally this will be accomplished by
    employing a true SGML parser with an extended DTD supplied by
    the document provider. However, even heuristic parsers should be
    able to accomplish this.
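
    The amended rule can be sketched as a pass over an
    already-tokenized document. In this hypothetical Python version,
    tokens are ("start" | "end" | "text", value) tuples with
    lower-cased element names and ignorable whitespace already
    removed. The hierarchy is tracked simply by counting start- and
    end-tags with the same name, which relies on extension elements
    and ALT always having end-tags (see section 8). It also skips the
    ALT content of elements the user agent does recognize, as section
    2 requires.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # ("start" | "end" | "text", element name or text)


def matching_end(tokens: List[Token], start: int) -> int:
    """Index of the end-tag closing the start-tag at tokens[start]."""
    name, depth = tokens[start][1], 0
    for j in range(start, len(tokens)):
        kind, value = tokens[j]
        if kind == "start" and value == name:
            depth += 1
        elif kind == "end" and value == name:
            depth -= 1
            if depth == 0:
                return j
    return len(tokens)  # never closed: treat the rest of the input as content


def apply_alt_rule(tokens: List[Token], recognized: Set[str]) -> List[Token]:
    """Apply the amended unrecognized-element rule to a token list."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        has_alt = (kind == "start" and i + 1 < len(tokens)
                   and tokens[i + 1] == ("start", "alt"))
        if has_alt and value in recognized:
            # The element can be presented: keep it, skip its ALT subelement.
            out.append(tokens[i])
            i = matching_end(tokens, i + 1) + 1
        elif has_alt:
            # Unrecognized: present the ALT content (applying the rule within
            # it), then discard everything up to the element's own end-tag.
            alt_end = matching_end(tokens, i + 1)
            out.extend(apply_alt_rule(tokens[i + 2:alt_end], recognized))
            i = matching_end(tokens, i) + 1
        else:
            # Text and recognized tags pass through; unrecognized tags without
            # an ALT are dropped, leaving their content in place.
            if kind == "text" or value in recognized:
                out.append(tokens[i])
            i += 1
    return out


doc = [
    ("start", "p"), ("text", "Results:"),
    ("start", "table"),
    ("start", "alt"), ("text", "[see results.txt]"), ("end", "alt"),
    ("start", "tr"), ("start", "td"), ("text", "42"), ("end", "td"), ("end", "tr"),
    ("end", "table"),
    ("end", "p"),
]
print(apply_alt_rule(doc, {"p"}))
# prints: [('start', 'p'), ('text', 'Results:'), ('text', '[see results.txt]'), ('end', 'p')]
```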

    User agents may also present the alternate content for
    individual instances of _supported_ extension elements, at their
    discretion or the user's instructions. For example, in the case
    of EMBED, a user may have disabled object embedding, or a
    particular embedded object may be unavailable; the user agent
    may use the alternate representation in these cases as well.


5. IMPACT ON EXISTING DOCUMENTS

    This proposal does not impact existing documents, except
    possibly for those which are already using extended HTML
    features. The authors of such documents may wish to take
    advantage of the proposed ALT element if and when sufficient
    browser support has been deployed.


6. DEPLOYMENT AND INTEROPERABILITY

    The current proposal places a large part of the responsibility
    for backward compatibility on document providers. (Of course so
    does any scheme which requires multiple representations of an
    element to be provided. I feel that the current proposal does
    more to assist document providers in doing so than other
    schemes.)

    Use of this feature is entirely discretionary, much like the ALT
    attribute on IMG. It will not place any extra mandatory burden
    on authors who wish to use extended or experimental HTML
    features; however, should they choose to supply an alternate
    representation, it will make it easier to do so.

    The alternate representation can be nearly anything, including a
    preformatted plain text rendering of the primary content, a
    hyperlink to a bitmapped image, or the ever-popular ``click here
    to download a more advanced browser'' message.

    This proposal is also amenable to automatic processing. For
    example, a preprocessor could scan for TABLE elements which do
    not contain an author-supplied ALT representation and insert a
    plaintext rendering of the table.
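
    A sketch of such a preprocessor follows, using Python's
    xml.etree.ElementTree. It assumes, unrealistically for most
    current HTML, that the input is well-formed with lower-case
    element names and no nested tables; the add_table_alts() name,
    the " | " cell separator, and wrapping the rendering in PRE are
    all arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET


def add_table_alts(root: ET.Element) -> int:
    """Give each TABLE without an ALT a plain-text one; return how many were added."""
    added = 0
    for table in list(root.iter("table")):  # snapshot: the tree is modified below
        if len(table) and table[0].tag == "alt":
            continue  # keep the author-supplied alternate representation
        rows = []
        for tr in table.iter("tr"):
            cells = ["".join(cell.itertext()).strip()
                     for cell in tr if cell.tag in ("th", "td")]
            rows.append(" | ".join(cells))
        alt = ET.Element("alt")
        pre = ET.SubElement(alt, "pre")
        pre.text = "\n".join(rows)
        table.insert(0, alt)  # ALT goes first, as the proposal recommends
        added += 1
    return added


root = ET.fromstring(
    "<body><table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Nuts</td><td>3</td></tr></table></body>"
)
add_table_alts(root)
print(ET.tostring(root, encoding="unicode"))
```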


7. FORMAT NEGOTIATION

    It has been suggested on numerous occasions that Web user agents
    advertise which HTML features they support, and that servers
    provide a ``down-translated'' version of documents when
    necessary.

    At present, there is no clear definition of how this should work
    at the protocol level. There have been several proposals,
    notably Dan Connolly's paper ``Toward Graceful Deployment of
    Tables in HTML'' [1], but none has been widely implemented.

        Note: Several Web sites are known to use the HTTP
        User-Agent header to determine which version of a
        document to send. This practice is questionable: it is
        error-prone and hard to maintain.

    The current proposal has several advantages over format
    negotiation schemes:

    Format negotiation only works for HTTP and other transport
    protocols which support it. The current proposal will work for
    any transport protocol, including none (e.g., local file system
    access). No modifications to server software are necessary.

    Format negotiation does not provide any solution to the
    inherently complex problem of maintaining or generating multiple
    versions of a document. Including alternate representations in
    the document itself takes advantage of SGML to manage this
    complexity.

    The current proposal provides more flexibility than automatic
    down-translation based on format negotiation, since it allows
    authors to choose a suitable alternate representation for each
    element instance. It also gives more control to information
    consumers, who might have no indication that an alternate
    representation is even available if automatic format negotiation
    were in use.


8. GUIDELINES FOR EXTENSION ELEMENTS

    In order to support heuristic parsers, end-tag omission shall
    not be allowed for any extension element, nor shall any
    extension element have EMPTY declared content or content
    reference attributes.

        Note: Again, new elements which are only legal inside
        extension elements are not themselves extension
        elements, so this rule does not apply to them. In
        particular, the current Tables, Frames, and EMBED
        proposals all satisfy this requirement.

    Requiring end-tags on extension elements will allow heuristic
    parsers to ``re-synchronize'' the element hierarchy even in the
    presence of subelements without end-tags.
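
    For instance, a heuristic parser can keep a stack of open
    elements and, on seeing an end-tag, pop back to the matching
    start-tag, implicitly closing any subelements whose end-tags were
    omitted. A minimal Python sketch, with a hypothetical
    handle_end_tag() helper:

```python
from typing import List


def handle_end_tag(stack: List[str], name: str) -> List[str]:
    """Close `name` on the open-element stack; return subelements closed implicitly."""
    if name not in stack:
        return []  # stray end-tag: ignore it rather than corrupt the stack
    implicitly_closed = []
    while (top := stack.pop()) != name:
        implicitly_closed.append(top)
    return implicitly_closed


# <BODY><TABLE><TR><TD>cell   (TR and TD end-tags omitted)   </TABLE>
stack = ["body", "table", "tr", "td"]
print(handle_end_tag(stack, "table"))  # prints: ['td', 'tr']
print(stack)                           # prints: ['body']
```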

    It is not anticipated that all or even most extension elements
    will require an alternate representation. For example, the HTML
    3 / Netscape 2.0 BIG and SMALL elements can safely be ignored by
    browsers without losing information, so an alternate
    representation for these elements would not be necessary.

    To support ``on the fly'' formatting, an ALT element, if
    present, should be the first subelement of the element to which
    it applies.
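
    A validation tool could check ALT placement along these lines.
    This Python sketch assumes a well-formed, lower-cased document
    parsed with xml.etree.ElementTree and a caller-supplied set of
    HTML 2.0 element names; check_alt_placement() is a hypothetical
    name. It reports an ALT that is not the first subelement of its
    parent, or that appears inside an HTML 2.0 element (which section
    2 disallows).

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def check_alt_placement(root: ET.Element, html2_elements: Set[str]) -> List[str]:
    """Report ALT elements that are misplaced under this proposal."""
    problems = []
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.tag != "alt":
                continue
            if parent.tag in html2_elements:
                problems.append(f"ALT inside HTML 2.0 element {parent.tag.upper()}")
            elif index != 0:
                problems.append(f"ALT is not the first subelement of {parent.tag.upper()}")
    return problems


root = ET.fromstring(
    "<body><table><caption>T</caption><alt>x</alt></table>"
    "<p><alt>y</alt></p></body>"
)
for problem in check_alt_placement(root, {"body", "p"}):
    print(problem)
# prints:
# ALT is not the first subelement of TABLE
# ALT inside HTML 2.0 element P
```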


9. POTENTIAL PROBLEMS

    The user community may be confused by the dual use of the name
    ALT as an element name and as an attribute name (on the IMG
    element) [7]. This is further exacerbated by the widespread (and
    incorrect) practice of referring to all syntactic constructs as
    ``tags'' instead of distinguishing between element names,
    attribute names, markup declarations, delimiters, and actual
    tags.

    If this is felt to be a serious problem, ALT could be renamed to
    ALTERNATE or something else.

    [[ See also [8]; I believe this has been addressed by requiring
    user agents to keep track of the element hierarchy instead of
    discarding tags. ]]


10. ACKNOWLEDGMENTS AND HISTORY

    The idea of including an alternate representation in the
    document was first introduced with the ALT attribute on the IMG
    element. This was further refined in HTML 3 with the FIG
    element, which directly contains its alternate representation.
    The proposed FRAMESET and EMBED extensions took this a step
    further, by introducing explicit container elements for this
    purpose. The current proposal simply generalizes and formalizes
    this basic idea.

    Discussion on the html-wg mailing list has provided invaluable
    input exploring all the issues involved.


A. OTHER SOLUTIONS

    A number of other approaches to this problem have been
    suggested.

    [[ This section is a bit of a mess right now... -JE ]]


A.1. ALT ATTRIBUTE INSTEAD OF ELEMENT

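    It has been suggested that the alternate representation might
    be supplied in an attribute, as it is with IMG [9].

    Due to the severe limitations of this approach, this is not
    advisable [10]. (An attribute value cannot contain elements, so
    the alternate representation could not include hyperlinks,
    lists, or any other structured content.)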


A.2. ALTSRC ATTRIBUTE

    Another approach is to supply the URI of a document containing
    an alternate representation on an attribute of extension
    elements. The attribute would have a standardized name, say
    ALTSRC. For example:

    <!-- in the DTD -->
    <!ATTLIST TABLE  ...
            ALTSRC  %URL;   #IMPLIED
            ...>
    <!-- in the document instance -->
    <TABLE altsrc="table1.txt">
    <CAPTION> Table 1 </CAPTION> ...  </TABLE>

    where table1.txt contains a preformatted, plain text rendering
    of the table.

    Under this scheme user agents would check for an ALTSRC
    attribute on start-tags with an unrecognized element name
    instead of completely ignoring them. If such an attribute is
    found, the user agent would discard the content of the
    unrecognized element and display the referenced URI either
    inline or as a hyperlink.
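
    A rough sketch of that user agent behavior with Python's
    html.parser follows; it takes the hyperlink option. The
    AltsrcParser name, the recognized set, and the "[TABLE
    alternate]" link text are all hypothetical. Unrecognized elements
    without ALTSRC still fall back to the ``ignore unrecognized
    tags'' rule.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Set


class AltsrcParser(HTMLParser):
    """Replace unrecognized elements that carry ALTSRC with a hyperlink."""

    def __init__(self, recognized: Set[str]) -> None:
        super().__init__()
        self.recognized = recognized
        self.out: List[str] = []
        self.skip_name: Optional[str] = None  # element whose content is discarded
        self.skip_depth = 0                   # nesting depth of skip_name

    def handle_starttag(self, tag, attrs):
        if self.skip_name is not None:
            if tag == self.skip_name:
                self.skip_depth += 1
            return
        if tag in self.recognized:
            self.out.append(self.get_starttag_text())
            return
        altsrc = dict(attrs).get("altsrc")
        if altsrc:
            # Offer the alternate representation as a link; drop the content.
            self.out.append(f'<a href="{escape(altsrc)}">[{tag.upper()} alternate]</a>')
            self.skip_name, self.skip_depth = tag, 1
        # Without ALTSRC the tag is simply ignored, as under HTML 2.0.

    def handle_endtag(self, tag):
        if self.skip_name is not None:
            if tag == self.skip_name:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_name = None
            return
        if tag in self.recognized:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_name is None:
            self.out.append(data)


parser = AltsrcParser({"p", "a"})
parser.feed('<p>See:<TABLE altsrc="table1.txt"><CAPTION> Table 1 </CAPTION>'
            '<tr><td>9</td></tr></TABLE>done</p>')
parser.close()
print("".join(parser.out))
# prints: <p>See:<a href="table1.txt">[TABLE alternate]</a>done</p>
```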

    This has the advantage of only transmitting the alternate
    representation if it is actually needed, saving transmission
    time. It would also help keep source documents less
    ``cluttered,'' since it would not be necessary to duplicate
    information in the main document.

        Note: This solution could be used in addition to the
        current proposal; the two are mutually compatible.


A.3. NOXXX ELEMENTS

    Another approach is to define a new alternate representation
    element for each new feature (e.g., NOFRAMES [2] and NOEMBED
    [3]), instead of using a standardized element name.

    This works when the extension element has no other textual
    content (as is the case with FRAMESET and EMBED), but not for
    extension elements with primary content.

    For example, if a user agent does not know about the TABLE
    element, it will not know that a (hypothetical) NOTABLES element
    contains an alternate representation either, and would still
    attempt to display the TABLE content under the ``ignore
    unrecognized tags'' rule.

        Note: A naming convention for generic identifiers -- for
        example, assuming that an unrecognized element name
        NOxxx is an alternate representation of a new xxx
        element -- is dangerous and ill-advised.


A.4. CONDITIONAL ELEMENT

    It has been suggested that the ALT element take a FEATURE
    attribute, which would be used to determine whether or not the
    ALT content should be displayed. Under this scheme, the ALT
    element may appear before, rather than inside, the extension
    element.

    [[ Citation? ]]

    A similar proposal calls for an OPTION element with the
    following attribute definition list:

    <!AttList Option
            PRESENT NAMES #IMPLIED
            ABSENT NAMES #IMPLIED
    >

    PRESENT and ABSENT would be a list of ``feature keywords''; the
    content should only be displayed if the feature is supported or
    unsupported, respectively. [7]
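
    Evaluating such an OPTION element is straightforward; a Python
    sketch with a hypothetical option_displayed() function follows.
    The proposal does not say how several keywords combine, so this
    assumes every PRESENT feature must be supported and no ABSENT
    feature may be. Names are compared without regard to case,
    because SGML folds NAMES values to upper case.

```python
from typing import Optional, Set


def option_displayed(present: Optional[str], absent: Optional[str],
                     supported: Set[str]) -> bool:
    """Should the content of an OPTION element be displayed?"""
    # NAMES values are whitespace-separated name tokens, case-folded by SGML.
    need = {name.upper() for name in (present or "").split()}
    lack = {name.upper() for name in (absent or "").split()}
    have = {name.upper() for name in supported}
    # Assumption: all PRESENT features must be supported, no ABSENT one may be.
    return need <= have and not (lack & have)


ua_features = {"TABLES"}
print(option_displayed("tables", None, ua_features))  # prints: True
print(option_displayed(None, "tables", ua_features))  # prints: False
print(option_displayed(None, "embed", ua_features))   # prints: True
```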

    Both of these schemes work on a per-feature basis instead of a
    per-element instance basis, so they are more coarse-grained and
    hence less flexible than the current proposal. I feel they are
    also more error-prone and less intuitive.

    The current proposal uses containment to express the
    relationship between an element and its alternate
    representation. In a conditional inclusion scheme, this
    information is lost.


A.5. MARKED SECTIONS

    Another suggestion is to ``modularize'' the DTD, and include
    parameter entities for each module. These would be defined by
    the user agent to either INCLUDE or IGNORE, depending on whether
    or not the module is supported, and authors could use them as
    status keywords in marked section declarations [7]:

    <![ %present.embed; [
    <embed stuff here>
    ]]>
    <![ %absent.embed; [
    if you see this, your browser does not support HTML level 23 version 29.
    ]]>
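
    In a user agent without a full SGML parser, the substitution
    might be approximated as below. This Python sketch, with a
    hypothetical resolve_marked_sections() function, handles only
    non-nested marked sections whose status keyword is a single
    parameter entity reference; it treats an undefined entity as an
    error (KeyError) and any keyword other than INCLUDE as IGNORE.
    The entity table is supplied by the user agent, as described
    above.

```python
import re
from typing import Dict

# <![ %entity.name; [ content ]]>   (non-nested marked sections only)
MARKED_SECTION = re.compile(r"<!\[\s*%([\w.-]+);\s*\[(.*?)\]\]>", re.DOTALL)


def resolve_marked_sections(source: str, entities: Dict[str, str]) -> str:
    """Keep the content of INCLUDE sections and drop IGNORE sections."""
    def replace(match: re.Match) -> str:
        status = entities[match.group(1)].strip().upper()  # KeyError if undefined
        return match.group(2) if status == "INCLUDE" else ""
    return MARKED_SECTION.sub(replace, source)


# Entity values a user agent without EMBED support might define.
old_ua = {"present.embed": "IGNORE", "absent.embed": "INCLUDE"}
doc = ('<![ %present.embed; [<embed src="movie.avi">]]>'
       "<![ %absent.embed; [Your browser does not support EMBED.]]>")
print(resolve_marked_sections(doc, old_ua))
# prints: Your browser does not support EMBED.
```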

    This would require browsers to support marked sections (which
    they ought to anyway), and a much greater familiarity with SGML
    (also not a bad idea).

    On the down side, it requires a greater implementation effort
    and, like the conditional element scheme, obscures the
    relationship between the primary and alternate representations
    of an element. It is also likely to be confusing to the user
    community.


A.6. NO TAGS

    In the HTML 3 draft, the FIG element's _content_ was the
    alternate representation.

    It has also been suggested that EMBED work this way:

        (<199509250245.WAA29529@panix2.panix.com>)

        There is no need for redundant NOEMBED tags. Each EMBED
        is an implied choice between fetching the URL in
        question or rendering the enclosed content.

    [[ Full citation? ]]

    I find this less intuitive than supplying explicit start- and
    end-tags for the alternate content. Also, it does not allow
    extension elements to contain primary (non-alternate) content;
    this could be detrimental to future enhancements. (For example,
    EMBED may eventually include subelements to be used as
    parameters for processing the embedded object.)


A.7. OMISSIBLE TAGS

    The start- and end-tags for ALT could be made omissible:

    <!ELEMENT ALT O O (%body.content;)>

    This would allow current HTML 3 documents which use FIG to
    remain valid without being updated.

    Omitting the ALT start- and end-tags would defeat heuristic
    parsers in some cases, so providers would need to take care to
    include them where they might be necessary. This would apply
    only to extension elements which have textual primary content;
    current uses of FIG would still work.


                               REFERENCES

    [[ Fill this in... Tables draft, Netscape's FRAMES and EMBED
    proposals, FIG discussions. ]]

[1] Toward Graceful Deployment of Tables in HTML
    (<URL:http://www.w3.org/pub/WWW/MarkUp/table-deployment.html>)

    Dan Connolly <connolly@beach.w3.org>, 13-Mar-1995

[2] A Proposed Extension to HTML: Frames
    (<305E5CF5.45AE@netscape.com>)

    Eric Bina <ebina@netscape.com>, 17-Sep-1995

[3] The REAL proposal for addition to HTML 3.0: EMBED
    (<305F9E53.712E@netscape.com>)

    Alex Edelstein <alexed@netscape.com>, John Giannandrea,
    19-Sep-1995

[4] HTML3 Tables
    (<URL:http://www.w3.org/pub/WWW/TR/WD-tables-950925.html>)

    Dave Raggett <dsr@w3.org>, 25-Sep-1995

[5] HTML 2.0
    (<URL:ftp://ds.internic.net/internet-drafts/draft-ietf-html-spec-06.txt>)

    Dan Connolly and Tim Berners-Lee

[6] HTML-WG Mailing List Archives
    (<URL:http://www.acl.lanl.gov/HTML_WG/>)

    HyperMail archive of the HTML Working Group mailing list.

[7] html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED
    (<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1167.html>)

    Liam Quin, <9509220353.AA25633@sqrex.sq.com>

[8] html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED
    (<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1166.html>)

    Alexei Kosut,
    <Pine.HPP.3.91.950921201643.2548A-100000@ace.nueva.pvt.k12.ca.us>

[9] html-wg-95q3: ALTs for EMBED, etc
    (<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1177.html>)

    Terry Allen, <9509220740.ZM4827@dmg.west.ora.com>

[10] html-wg-95q3: ALTs for EMBED, etc
    (<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1178.html>)

    Mike Meyer, <19950922.78180D8.9477@contessa.phone.net>