Re: XML empty-element syntax in SGML HTML documents

On Wed, 22 May 2002, Christopher R. Maden wrote:

> I was a bit startled to find an HTML 4.01 Transitional document passing the
> validator using <img /> syntax.
>
> I finally figured out why - it is valid SGML (of course).  However, it
> definitely doesn't mean what the author thought: it means an img tag ('<img
> /') followed by a greater-than in character data.
>
> I don't expect the SGML parser to catch this, however, it might be a good
> idea for the validator to flag any use of NET in a non-XML document.

This is something that's come up quite frequently on this list.

> There's a post at <URL:
> http://lists.w3.org/Archives/Public/www-validator/2002Feb/0151.html > which
> shows awareness of the issue; however, it's inaccurate.  The <link />
> syntax is *not* legal, as it dumps a > in character data inside the head,
> where it's not allowed.

Actually it's worse than that.  The character data implicitly closes
the HEAD and opens the BODY.  This leads to *very* confusing error
reports, and is one of many reasons to prefer Strict over
Legacy^H^HTransitional.

> I realize we can't turn SHORTTAG off,

Yes we can - OpenSP supports it as a warning ( -wunclosed on the
command line).  You get that from the recommended parse mode of
Page Valet, or with Warnings enabled in the WDG validator.
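
For a quick check without an SGML parser, the kind of flagging suggested
above can be roughly approximated in a few lines of Python.  This is only
a text heuristic, not a parser: the function name find_net_tags and the
regular expressions are illustrative assumptions, it ignores comments and
CDATA, and it will miss tags whose attribute values contain a literal
< or >.

```python
import re

# Candidate start tags: '<', a letter, then anything up to the next '>'.
# Assumption: no attribute value contains a literal '<' or '>'.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")

# Quoted attribute values, removed before looking for the trailing '/',
# so that href="a/" is not mistaken for NET.
QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')


def find_net_tags(html):
    """Return (line_number, tag) pairs for start tags written as <x ... />."""
    hits = []
    for match in START_TAG.finditer(html):
        tag = match.group(0)
        bare = QUOTED.sub("", tag)
        # Drop the closing '>' and any whitespace before it; a remaining
        # trailing '/' is the XML-style empty-element marker.
        if bare[:-1].rstrip().endswith("/"):
            line_number = html.count("\n", 0, match.start()) + 1
            hits.append((line_number, tag))
    return hits


if __name__ == "__main__":
    sample = (
        "<head>\n"
        '<link rel="stylesheet" href="a/b.css" />\n'
        "</head>\n"
        '<body><img src="x/" alt=""><br/></body>'
    )
    for line_number, tag in find_net_tags(sample):
        print(f"line {line_number}: suspicious NET in {tag}")
```

Run against an HTML 4.01 document, anything it reports is worth a second
look; it says nothing about documents that are meant to be XHTML.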

-- 
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.

Received on Thursday, 23 May 2002 14:11:45 UTC