Re: An HTML language specification from Ian Hickson on 2008-11-24 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 24 Nov 2008 04:47:22 +0000 (UTC)
To: Jim Jewett <jimjjewett@gmail.com>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811240436330.17414@hixie.dreamhostps.com>

On Sun, 23 Nov 2008, Jim Jewett wrote:
> Ian wrote:
> > Well, some of the conformance classes need the language syntax 
> > requirements.
> 
> I think everyone agrees that the processing requirements would rely on 
> the syntax specification -- the point is that the syntax specification 
> does not need to rely on the processing requirements.
> 
> But to be honest, I think part of the disagreement is that many people 
> (myself included) don't think the processing requirements really 
> rely all that *heavily* on the syntax.

Some conformance classes (e.g. browsers) couldn't care less what is 
conforming and what isn't. Others (e.g. conformance checkers) care a great 
deal. So it depends on the conformance class.


> > For example, a WYSIWYG editor would need to know both the syntax and 
> > vocabulary conformance requirements, to output valid documents, as well 
> > as the parsing and rendering requirements, to show the right output.
> 
> It would only need the parsing requirements if it imported existing 
> non-conformant HTML.

No, anything that parses HTML, even if it only parses compliant HTML, 
needs the parsing rules, since there are certain things that are 
surprising even with conforming HTML (e.g. how to determine whether a 
<script> block is in the <head> or the <body> when tags are omitted).

You also need the parsing rules to know whether something is conforming in 
the first place.
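
To illustrate the <script> case, here is a minimal sketch in Python. It is 
a deliberately simplified model, not the spec's parsing algorithm: it 
assumes the <html>, <head> and <body> tags are all omitted, it looks only 
at start tags (text, end tags and comments are ignored), and the HEAD_TAGS 
set and the place_start_tags name are hypothetical choices made for this 
example.

```python
# Simplified model (an assumption, not the full HTML parsing algorithm).
# Input: the start-tag names of a document whose <html>, <head> and <body>
# tags are all omitted. Text, end tags and comments are not modelled.

# Illustrative subset of elements that stay in the implied <head>.
HEAD_TAGS = {"base", "link", "meta", "title", "style", "script"}


def place_start_tags(tags):
    """Return (tag, section) pairs, where section is "head" or "body"."""
    section = "head"  # the implied <head> is open at the first start tag
    placed = []
    for tag in tags:
        # The first tag that cannot live in <head> implicitly closes it and
        # opens <body>; everything after that point lands in the body.
        if section == "head" and tag not in HEAD_TAGS:
            section = "body"
        placed.append((tag, section))
    return placed


if __name__ == "__main__":
    # <title>..</title><script>..</script><p>..<script>..</script>
    for tag, section in place_start_tags(["title", "script", "p", "script"]):
        print(tag, "->", section)
```

Under these assumptions the first <script> is placed in the head and the 
second in the body, even though the markup never says where the head ends. 
A real parser has to handle many more cases than this.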


> It would clearly need the vocabulary and syntax requirements -- but for 
> an editor, that is domain knowledge; needed in the same way that a 
> baseball simulator would need to know the rules of baseball.

Indeed. This makes it all the more important that they be in the same 
document, not less important, IMHO.


> > Similarly, a conformance checker's implementation requirements are a 
> > combination of both the language conformance rules and some separate 
> > implementation conformance rules (e.g. the parsing rules).
> 
> It needs only the language conformance rules to say "valid" or "not 
> valid".

The definitions of what is valid and what isn't can be quite involved, but 
yes. So? This still means that the conformance checkers need information 
that also applies to authors and information that also applies to 
browsers. My point is just that this split isn't anywhere near as clean as 
you make it out to be.


> The (error-recovery portion of the) parsing rules would allow it to 
> recover more gracefully and continue to provide additional useful errors 
> on the same run -- but they aren't strictly required.

I'd much rather make the spec useful than aim for the strict minimum. 
After all, we already have validators, and they already implement the 
error-recovery rules. I might be more sympathetic to your position here 
if we had any validators at all that didn't use the error-recovery rules.


Anyway, as previously noted, I do plan to create "views" of the spec that 
will probably satisfy your desires. I think having a single core specification 
from which views can be generated is quite reasonable. Having multiple 
source documents, IMHO, would just be painful to edit, if nothing else. 
Either we'd end up having to have redundant requirements (and thus 
conflicts, when the requirements aren't quite phrased the same way), or 
we'd end up with cracks (where the requirements don't quite line up and we 
miss something), or we'd end up with documents that don't quite solve any 
actual problems and aren't quite useful to anyone.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 24 November 2008 04:48:03 UTC