>>> The spec describes what to do with every possible stream of input
>>> characters.
>> This seems like an unimaginably arrogant statement to me.  (Now you  
>> know why I said the above first.  :) )

I wonder why. The XML specification does the same. It just says that you
have to abort processing when you hit a certain illegal character, whereas
the WHATWG HTML5 proposal for HTML parsing says you have to take action X
when you hit a certain illegal character.
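
To make the contrast concrete, here is a small sketch using two parsers from
the Python standard library. Note the assumptions: html.parser is a lenient
parser, not an implementation of the WHATWG algorithm, so it only stands in
for the "take action X and keep going" policy. The helper names (xml_aborts,
Collector, html_text) and the sample input are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# U+0001 is not a legal character in an XML 1.0 document.
doc = "<p>a\x01b</p>"


def xml_aborts(text):
    """Return True if the XML parser stops with a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


class Collector(HTMLParser):
    """Lenient parser that just gathers the text it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(text):
    """Feed text to the lenient parser and return the collected text."""
    parser = Collector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print("XML parser aborted:", xml_aborts(doc))
    print("HTML parser collected:", repr(html_text(doc)))
```

Both behaviours are fully defined for the same input. The only difference is
whether the defined behaviour is "stop" or "carry on".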

A typical state looks something like the following:

   Space character
      Switch to state A.
      Parse error.
      Reconsume EOF in state B.
      Emit token.
      Switch to state B.
   Any other character
      Append character to the name of the current token.
      Stay in this state.
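
As a rough sketch of how a state like this could translate into code, here
is a minimal Python version. Everything in it is an assumption made for
illustration: the class and state names, the EOF sentinel, and in particular
the way I have grouped the actions (parse error, emit token, reconsume in
state B) under an EOF branch. States A and B are left as stubs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EOF = None  # hypothetical sentinel meaning "end of input"


@dataclass
class Tokenizer:
    state: str = "name"
    name: str = ""
    tokens: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def step(self, ch: Optional[str]) -> None:
        """Consume one input character (or EOF) in the current state."""
        if self.state == "name":
            if ch == " ":
                # Space character: switch to state A.
                self.state = "A"
            elif ch is EOF:
                # Assumed grouping: parse error, emit token,
                # then reconsume EOF in state B.
                self.errors.append("parse error: unexpected EOF")
                self.tokens.append(self.name)
                self.state = "B"
                self.step(ch)
            else:
                # Any other character: append to the token name
                # and stay in this state.
                self.name += ch
        # States A and B are stubs here and ignore their input.


def run(text: str) -> Tokenizer:
    """Feed every character of text, then EOF, to a fresh tokenizer."""
    tok = Tokenizer()
    for ch in text:
        tok.step(ch)
    tok.step(EOF)
    return tok


if __name__ == "__main__":
    result = run("abc")
    print(result.state, result.tokens, result.errors)
```

The point is that step() has a branch for every possible input, including
the final catch-all, so there is no input for which the behaviour is left
undefined.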

I'm not sure I really see the issue.

Anne van Kesteren

