Re: HTML 4.0/SGML, (#PCDATA)* problem in TEXTAREA

Arnoud (galactus@htmlhelp.com)
Thu, 31 Jul 1997 10:07:26 +0200


From: galactus@htmlhelp.com (Arnoud "Galactus" Engelfriet)
To: www-html@w3.org
Subject: Re: HTML 4.0/SGML, (#PCDATA)* problem in TEXTAREA
Date: Thu, 31 Jul 1997 10:07:26 +0200
Message-ID: <+eE4z4uYOBZa089yn@htmlhelp.com>

In article <199707302141.PAA00460@underworld.bigpic.com>,
"Neil St.Laurent" <neil@bigpic.com> wrote:
> In the time of 30 Jul 97:22:09, www-html@w3.org pronounced:
> > Section 4.2.1 of RFC 1866 (HTML 2.0 spec) recommends that start-
> > and end-tags for unknown elements should be "mapped to nothing"
> > (standardese for "ignored") during tokenization.
> 
> I'm wondering how this is even recommended since this would violate 
> SGML.  There is no recovery technique that would actually allow me to 
> tokenize an invalid tag.

Well, you need to have *some* way to handle extensions and future
HTML elements. It is unfortunately not realistic to ask that all
authors write documents that completely conform to a specific
standard, or that browsers request a specific HTML version and
servers filter out the newer elements on the fly.

Perhaps I'm being too simple here, but if I'm tokenizing a document
with an SGML parser, and I encounter the element FOOBAR that I've
never heard of, can't the code that interfaces with the parser simply
ignore the parser's error message and pretend it never saw that element?
The parser is happy because it reported its error, and the handler
ignores that report, so everything is OK, right?
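
To make that concrete, here is a rough sketch of the idea in Python.
Treat it as an illustration only: the standard library's html.parser
stands in for a real SGML parser (which it is not, since it never checks
a DTD), so the "error report" is simulated by the handler looking the
element up in KNOWN_ELEMENTS. That set, the class IgnoreUnknownHandler
and the function tokenize are all names I made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny list of "known" elements. A real
# application would take this from the DTD it claims to support.
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p", "em", "textarea"}


class IgnoreUnknownHandler(HTMLParser):
    """Passes known tags and all character data on; drops unknown tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []   # what the application actually acts on
        self.ignored = []  # unknown element names, noted and then dropped

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_ELEMENTS:
            self.events.append(("start", tag))
        else:
            # This is the "error": record it, then pretend we never saw it.
            self.ignored.append(tag)

    def handle_endtag(self, tag):
        if tag in KNOWN_ELEMENTS:
            self.events.append(("end", tag))
        # Unknown end-tags are mapped to nothing.

    def handle_data(self, data):
        # Character data is kept even when it sits inside an unknown element.
        self.events.append(("data", data))


def tokenize(markup):
    """Return (events, ignored) for a string of markup."""
    handler = IgnoreUnknownHandler()
    handler.feed(markup)
    handler.close()
    return handler.events, handler.ignored


if __name__ == "__main__":
    events, ignored = tokenize("<P>Hello <FOOBAR>world</FOOBAR>!</P>")
    print(events)
    print(ignored)
```

Note that only the FOOBAR tags themselves disappear; the text between
them still reaches the application, which is how I read the "mapped to
nothing" wording in RFC 1866. html.parser also lowercases element names,
so FOOBAR is recorded as "foobar".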

-- 
E-mail: galactus@htmlhelp.com .................... PGP Key: 512/63B0E665
Maintainer of WDG's HTML reference: <http://www.htmlhelp.com/reference/>