Re: An HTML language specification from Ian Hickson on 2008-11-24 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 24 Nov 2008 22:56:21 +0000 (UTC)
To: Jim Jewett <jimjjewett@gmail.com>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811242253530.17401@hixie.dreamhostps.com>

On Mon, 24 Nov 2008, Jim Jewett wrote:
> 
> Circling back to this -- it isn't clear why such omitted tags should be 
> conforming (as opposed to "accepted and corrected by full parsers").

It was valid in HTML4. Changing this without good reason seems like a bad 
idea, and there are a lot of authors who like being able to omit tags 
(e.g. because it can make the difference between their document fitting in 
one TCP packet or two, which can have a measurable effect on performance).
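
To make the packet argument concrete, here is a minimal sketch of the byte arithmetic. The two markup strings are hypothetical examples, not taken from this thread, and the 1460-byte segment size is an assumption (a common Ethernet MTU of 1500 minus 40 bytes of IPv4 and TCP headers). HTTP response headers are ignored.

```python
# Hypothetical example documents; the markup is illustrative only.
FULL = (
    "<!DOCTYPE html><html><head><title>Hi</title></head>"
    "<body><ul><li>one</li><li>two</li></ul><p>text</p></body></html>"
)

# The same content with the optional start and end tags left out.
MINIMAL = (
    "<!DOCTYPE html><title>Hi</title>"
    "<ul><li>one<li>two</ul><p>text"
)


def saved_bytes(full: str, minimal: str) -> int:
    """Bytes saved on the wire by omitting optional tags (UTF-8 encoded)."""
    return len(full.encode("utf-8")) - len(minimal.encode("utf-8"))


def segments_needed(n_bytes: int, mss: int = 1460) -> int:
    """TCP segments needed for a payload, assuming a fixed segment size.

    The default mss of 1460 is an assumption, not a value from the thread.
    """
    return -(-n_bytes // mss)  # ceiling division


if __name__ == "__main__":
    print(saved_bytes(FULL, MINIMAL))
    print(segments_needed(1460), segments_needed(1461))
```

The saving only matters for a document whose size sits near a segment boundary: under the assumed segment size, a 1461-byte payload needs two segments where a 1460-byte payload needs one.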

> The editor that doesn't import doesn't have to worry about omitted tags.

Most editors import. Otherwise they're creators, not editors. :-)

> And the editor that doesn't import invalid HTML doesn't have to worry 
> about keeping a list of active formatting elements.

How can an editor know that it is importing only valid HTML if it doesn't 
have the parsing rules to tell it what is valid?

> > The definitions of what is valid and what isn't can be quite involved, 
> > but yes. So?
> 
> For historical reasons, they are.  It isn't clear that they should be, 
> for static documents.

Well, if we were designing this from scratch, I'd agree, but sadly we have 
to deal with the legacy.

> >> The (error-recovery portion of the) parsing rules would allow it to 
> >> recover more gracefully and continue to provide additional useful 
> >> errors on the same run -- but they aren't strictly required.
> > 
> > I might be more sympathetic to your position here if we had any 
> > validators at all that didn't use the error-recovery rules.
> 
> They tend to be lightweight debugging tools, rather than published 
> products.  I'll agree that an internal testing tool doesn't *need* to be 
> fully conformant, but I see no reason to make that harder than it needs 
> to be.

Why wouldn't these tools just use off-the-shelf HTML parsers, just like 
they do with XML parsing?
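
As a rough illustration of a lightweight tool leaning on an existing parser rather than writing its own, the sketch below uses Python's standard-library html.parser module. The names TagCollector and collect_tags are hypothetical. Note that html.parser is a tokenizer-level parser and does not implement the HTML5 tree-construction or error-recovery rules discussed in this thread, so it stands in for "off-the-shelf parser" only loosely.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start and end tags exactly as they appear in the source."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


def collect_tags(markup):
    """Feed markup to the stdlib parser; return (start_tags, end_tags)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags, collector.end_tags


if __name__ == "__main__":
    # The </li> end tags are omitted in this input.
    print(collect_tags("<ul><li>one<li>two</ul>"))
```

The tool sees two li start tags and no li end tags, because this parser reports only what is literally in the source. Inferring where the omitted end tags belong is the job of the tree-construction rules, which a full HTML parser would supply.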

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 24 November 2008 22:57:04 UTC