Re: An HTML language specification

On Mon, 24 Nov 2008, Jim Jewett wrote:
> 
> Circling back to this -- it isn't clear why such omitted tags should be 
> conforming (as opposed to "accepted and corrected by full parsers").

Omitting those tags was valid in HTML4. Changing this without good reason 
seems like a bad idea, and a lot of authors like being able to omit tags 
(e.g. because it can make the difference between their document fitting in 
one TCP packet or two, which can have a measurable effect on performance).
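
To put rough numbers on the packet argument, here is a minimal sketch. The 
table fragment, the row count, and the 1460-byte payload figure are all 
assumptions chosen for illustration, and HTTP headers are ignored. It only 
compares the same markup with and without the </td> and </tr> end tags 
that HTML4 allows authors to omit.

```python
# Hypothetical fragment: a table of 40 rows with 3 cells each.
ROWS = 40

# Assumption: a typical TCP payload per segment on Ethernet.
ASSUMED_PAYLOAD = 1460


def build(omit_optional_end_tags: bool) -> bytes:
    """Return the table markup, with or without </td> and </tr>."""
    td_end = "" if omit_optional_end_tags else "</td>"
    tr_end = "" if omit_optional_end_tags else "</tr>"
    rows = "".join(
        f"<tr><td>{i}{td_end}<td>item{td_end}<td>ok{td_end}{tr_end}\n"
        for i in range(ROWS)
    )
    return f"<table>\n{rows}</table>\n".encode("ascii")


def segments(size: int, payload: int = ASSUMED_PAYLOAD) -> int:
    """Number of segments needed to carry `size` bytes (ceiling division)."""
    return -(-size // payload)


if __name__ == "__main__":
    for omit in (False, True):
        size = len(build(omit))
        print(f"omit={omit}: {size} bytes, {segments(size)} segment(s)")
```

With these made-up numbers the terse form fits in a single segment and the 
fully tagged form does not. Real documents will of course differ.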


> The editor that doesn't import doesn't have to worry about omitted tags.

Most editors import. Otherwise they're creators, not editors. :-)


> And the editor that doesn't import invalid HTML doesn't have to worry 
> about keeping a list of active formatting elements.

How can an editor know that it is importing only valid HTML if it doesn't 
have the parsing rules to tell it what is valid?


> > The definitions of what is valid and what isn't can be quite involved, 
> > but yes. So?
> 
> For historical reasons, they are.  It isn't clear that they should be, 
> for static documents.

Well, if we were designing this from scratch, I'd agree, but sadly we have 
to deal with the legacy.


> >> The (error-recovery portion of the) parsing rules would allow it to 
> >> recover more gracefully and continue to provide additional useful 
> >> errors on the same run -- but they aren't strictly required.
> > 
> > I might be more sympathetic to your position here if we had any 
> > validators at all that didn't use the error-recovery rules.
> 
> They tend to be lightweight debugging tools, rather than published 
> products.  I'll agree that an internal testing tool doesn't *need* to be 
> fully conformant, but I see no reason to make that harder than it needs 
> to be.

Why wouldn't these tools use off-the-shelf HTML parsers, just as they do 
for XML parsing?
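
As a rough illustration of how little code such a tool needs when it leans 
on an existing parser, here is a sketch using Python's standard 
html.parser module. That module is only a tokenizer, not an implementation 
of the parsing rules under discussion, so it stands in for whatever 
off-the-shelf parser a real tool would pick. The OPTIONAL_END set and the 
omitted_end_tags helper are names made up for this example, and the set is 
a partial list.

```python
from html.parser import HTMLParser

# Illustrative, partial list of elements whose end tags HTML4 lets
# authors omit.
OPTIONAL_END = {"p", "li", "td", "th", "tr", "dt", "dd", "option"}


class TagCounter(HTMLParser):
    """Count start and end tags as the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.starts = {}
        self.ends = {}

    def handle_starttag(self, tag, attrs):
        self.starts[tag] = self.starts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        self.ends[tag] = self.ends.get(tag, 0) + 1


def omitted_end_tags(markup: str) -> dict:
    """Map each optional-end-tag element to its count of missing end tags."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    missing = {}
    for tag, started in counter.starts.items():
        gap = started - counter.ends.get(tag, 0)
        if tag in OPTIONAL_END and gap > 0:
            missing[tag] = gap
    return missing


if __name__ == "__main__":
    print(omitted_end_tags("<ul><li>one<li>two</li></ul>"))
```

The tool itself contains no tokenizing or error-recovery logic. All of 
that is borrowed from the library.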

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 24 November 2008 22:57:04 UTC