SGML and HTML

Hi,

I recently realized that some constructs I had considered syntax errors
are actually allowed in SGML. Looking into this further, it turns out
that SGML allows many things that common sense would call extremely
broken HTML; and the HTML standard, while *recommending* the use of only
a few widely accepted SGML constructs, doesn't really seem to forbid
anything that SGML allows... At least the validator accepts them --
though none of the browsers I tested (not even Mozilla) behaves the same
way as the validator in all situations.

Is this really the right conclusion? Is everything that's legal in
SGML/accepted by the validator automatically valid HTML, even if this
allows creating perfectly valid HTML documents that no existing browser
will handle correctly? Are constructs like <--xx&<!-xx<<<hr> really
correct? What about things like unclosed tags and empty tags -- how
should a browser handle these?

Another interesting issue is error recovery. The validator, for example,
seems to stop tag parsing immediately and resume content parsing when it
encounters an illegal character, but it skips over malformed attribute
values and invalid declarations. Is there some kind of specification or
recommendation for such behaviour, or is the browser free to handle
syntax errors as it likes?
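
To make the first of those behaviours concrete, here is a minimal Python
sketch of what I mean by "stop tag parsing and resume content parsing".
It is only my reading of what the validator seems to do, not something
taken from a specification. The names tokenize, NAME_START and NAME_CHARS
are made up for illustration, the character classes are simplified, and
attributes are not handled at all.

```python
import string

# Hypothetical, simplified character classes (not the full SGML name rules).
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | {"-", "."}


def tokenize(text):
    """Split text into ("tag", name) and ("data", chars) tokens.

    Assumed recovery rule: if a character that is not legal inside a tag
    appears before the closing ">", tag parsing is abandoned and whatever
    was consumed so far is handed back as ordinary character data.
    """
    tokens = []
    data = []
    i = 0
    n = len(text)

    def flush():
        # Emit any pending character data as a single token.
        if data:
            tokens.append(("data", "".join(data)))
            data.clear()

    while i < n:
        ch = text[i]
        # "<" only opens a tag when a name start character follows it.
        if ch == "<" and i + 1 < n and text[i + 1] in NAME_START:
            j = i + 1
            while j < n and text[j] in NAME_CHARS:
                j += 1
            if j < n and text[j] == ">":
                flush()
                tokens.append(("tag", text[i + 1:j].lower()))
                i = j + 1
                continue
            # Illegal character or end of input: keep "<name" as data and
            # resume content parsing at the offending character.
            data.append(text[i:j])
            i = j
            continue
        data.append(ch)
        i += 1

    flush()
    return tokens
```

With this rule, tokenize("<b$x>") gives back the whole string as
character data, while tokenize("<<<hr>") yields the data "<<" followed by
an hr tag.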

-Olaf-

PS. please CC:

-- 
Don't buy away your freedom -- GNU/Linux

Received on Thursday, 4 September 2003 05:35:20 UTC