Re: SGML and HTML

From: Nick Kew <nick@webthing.com>
Date: Mon, 29 Sep 2003 19:01:27 +0100 (BST)
To: <olafBuddenhagen@web.de>
Cc: <www-validator@w3.org>, <antrik@gmx.net>
Message-ID: <20030929184228.M1191-100000@fenris.webthing.com>

On Mon, 29 Sep 2003 olafBuddenhagen@web.de wrote:


> Hi,
>
> On Mon, Sep 08, 2003 at 03:48:14AM +0200, Bjoern Hoehrmann wrote:
> > >What I really meant to ask: Can an HTML document be called "correct"
> > >(without assigning any specific technical meaning to that term), if
> > >it's formally valid, but doesn't follow the recommendations about
> > >SGML usage mentioned in the standard?...
> >
> > Depends on the definition of "correct" here...
>
> I'll say as much as: "Correct" by common sense...

That's still too vague to answer.  If I were to say either yes or no,
there's a post at lists.w3.org just waiting to bite "... but you said..."

> > >Or on a more practical view: Should a browser, in this situation we
> > >are in, try to implement as much of SGML as possible, even if nobody
> > >can use it anyways?
> >
> > If as much as possible means to ensure that web sites which "work" in
> > competitor's browsers do not break in your browsers: maybe.
>
> I was thinking of: As much as possible without having to bloat the
> browser considerably. Net mode is OK, empty tags are acceptable. Missing
> start tags are bad, but probably can be worked around without really
> implementing them. (Which every browser does more or less anyways...)
> Other SGML features would be too complicated.

A browser can use a full SGML parser (OpenSP licensing terms allow it)
or a pragmatic HTML parser such as those from libxml2 or tidy.  The
latter is presumably what you had in mind, and isn't so far from
what browsers really do.
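
As a rough illustration of the pragmatic approach, here is a sketch
using Python's standard-library html.parser. It is only a stand-in: it
is not libxml2 or tidy, and real browsers do far more error recovery.
It reports tags as they appear and makes no attempt at SGML features
such as omitted-tag inference or NET. The names EventCollector and
parse_events are made up for this example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as (kind, value) pairs."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Feed markup to the tolerant parser and return the recorded events."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered at end of input
    return collector.events
```

Calling parse_events("<p>one<p>two") reports two start events and no
end events: this parser does not infer the omitted </p>, so any such
repair is left to the code that consumes the events.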

> > However, I don't think there is much that can be implemented without
> > breaking anything and after all, you cannot develop a conforming HTML
> > 4.01 user agent and still support XHTML 1.0,

Huh????  Is this a reference to difficulties with Appendix C?
The ambiguous use of "text/html" does indeed complicate matters,
but certainly doesn't prevent a user agent supporting both.

> They aren't standard violations however, if ignoring recommendations is
> not considered a violation.
>
> The question really is only whether it would be too confusing to call it
> an "HTML error" or something the like...

There are practical reasons why it's hard to report many of them as
anything other than errors, if we report them at all.

> > www-html, www-html-editor.

Nope.  XHTML doesn't have the kind of divergence between the spec
and "real life" that causes the worst validation headaches.  And the
HTML WG won't take an interest in HTML without an X.

> > >But I've still no idea why an SGML parser will accept <hr/> for
> > >example -- according to the BNF productions (the only part of the
> > >standard I could find on the web), a net-enabling start tag is never
> > >explicitly closed, so should not the > be treated as content?...
> >
> > Yes. That's what the Validator does.
>
> In this case, how can the claim be upheld that XML is a compatible
> subset of SGML, and can be parsed by any SGML processor?...

No problem.  XML is a subset that doesn't permit net-enabling start tags.
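
To make the two readings of <hr/> concrete, here is a toy sketch. It is
an assumption-laden model, not how OpenSP or the Validator is
implemented: it handles only a single tag with no attributes, and the
EMPTY_ELEMENTS set is a small hand-picked subset of HTML 4's EMPTY
elements. Under the SGML reading with SHORTTAG enabled, "<hr/" is a
net-enabling start tag and the ">" that follows is character data;
under the XML reading, "<hr/>" is one empty-element tag. The function
names are hypothetical.

```python
import re

# Assumption: a small subset of the elements HTML 4 declares EMPTY.
EMPTY_ELEMENTS = {"hr", "br", "img"}

# "<name/" : a net-enabling start tag (toy version, no attributes).
NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/")

# "<name/>" : an XML empty-element tag (toy version, no attributes).
XML_EMPTY = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/>")


def read_as_sgml(markup):
    """Read one net-enabling start tag the way a SHORTTAG-aware parser might."""
    m = NET_START.match(markup)
    if m is None:
        raise ValueError("expected a net-enabling start tag")
    name = m.group(1).lower()  # HTML element names are case-insensitive
    rest = markup[m.end():]
    events = [("start", name)]
    if name in EMPTY_ELEMENTS:
        # An EMPTY element ends at once, so whatever follows "/" is data.
        events.append(("end", name))
        if rest:
            events.append(("data", rest))
    else:
        # Otherwise content runs up to the next "/", the null end tag.
        content, slash, tail = rest.partition("/")
        if content:
            events.append(("data", content))
        if slash:
            events.append(("end", name))
        if tail:
            events.append(("data", tail))
    return events


def read_as_xml(markup):
    """Read one XML empty-element tag; '/>' closes the tag as a unit."""
    m = XML_EMPTY.fullmatch(markup)
    if m is None:
        raise ValueError("expected an XML empty-element tag")
    name = m.group(1)  # XML names are case-sensitive, so no folding
    return [("start", name), ("end", name)]
```

With this model, read_as_sgml("<hr/>") leaves a stray ">" as data after
the element, while read_as_xml("<hr/>") does not, and
read_as_sgml("<em/text/") shows the "/" acting as a null end tag.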

> > See <http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html>.

Hmmm, can't get dialup ATM:-(

> I knew this decision already, only I didn't understand it... Only now I
> realized that as long as the dominant browser doesn't support XHTML,
> there is really no other choice :-(

Well, you can have the be^H^Hworst of both worlds with Appendix C:-)

-- 
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
Received on Monday, 29 September 2003 20:57:17 GMT
