[whatwg] Parsing: Tokenisation - DOCTYPE State from Lachlan Hunt on 2006-01-29 (public-whatwg-archive@w3.org from January 2006)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Sun, 29 Jan 2006 17:59:33 +1100
Message-ID: <43DC67D5.80203@lachy.id.au>

Hi,
   I believe there are some mistakes in the DOCTYPE state section.

As far as I can tell, both of these DOCTYPEs are treated as conformant, 
but shouldn't the first one be an easy parse error?

   <!DOCTYPEhtml>
   <!DOCTYPE html>
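For illustration, the check being asked about could look like the sketch
below. It is a hypothetical helper, not part of the draft tokeniser. The
whitespace set (tab, LF, VT, FF, space) is an assumption, and the keyword is
matched case-insensitively.

```python
# Whitespace accepted after the keyword; an assumed set, not quoted from the draft.
SPACE_CHARS = "\t\n\x0b\x0c "

def lacks_space_after_keyword(markup: str) -> bool:
    """Return True if the character after '<!DOCTYPE' is not whitespace.

    Hypothetical check illustrating the suggested parse error; the DOCTYPE
    state text quoted in this message has no such step.
    """
    prefix = "<!DOCTYPE"
    if not markup.upper().startswith(prefix):
        raise ValueError("not a DOCTYPE")
    rest = markup[len(prefix):]
    # Nothing after the keyword, or a non-space character directly after it.
    return rest == "" or rest[0] not in SPACE_CHARS

print(lacks_space_after_keyword("<!DOCTYPEhtml>"))   # True
print(lacks_space_after_keyword("<!DOCTYPE html>"))  # False
```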

In the DOCTYPE state, it says:

U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z
     Create a new DOCTYPE token. Set the token's name to the
     uppercase version of the current input character (*add 0x0020
     to the character's codepoint*), and mark it as being in error.
     Switch to the DOCTYPE name state.

* That should read "[subtract] 0x0020 from the character's codepoint"
   (This error is repeated in the DOCTYPE name state too.)
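The arithmetic behind the correction: in ASCII each lowercase letter sits
exactly 0x20 above its uppercase counterpart (U+0068 'h' vs U+0048 'H'), so
uppercasing means subtracting. A minimal sketch; the function name is
hypothetical.

```python
def ascii_upper(ch: str) -> str:
    """Uppercase one ASCII letter a-z by subtracting 0x20 from its code point."""
    code = ord(ch)
    if not 0x61 <= code <= 0x7A:
        raise ValueError(f"expected U+0061..U+007A, got {ch!r}")
    return chr(code - 0x20)

print("".join(ascii_upper(c) for c in "html"))  # HTML
```

Following the original wording literally, 'h' (0x68) plus 0x20 gives 0x88,
a C1 control character rather than 'H'.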

* Why is it marked as being in error at that stage?  It doesn't seem
   to be necessary, given the last step in the DOCTYPE name state,
   which says:
   "If the name of the DOCTYPE token is exactly the four letters "HTML",
    then mark the token as being correct. Otherwise, mark it as being in
    error."
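A minimal model of why the early flag looks redundant, assuming every path
out of the DOCTYPE name state reaches the quoted last step (this message does
not say whether, for example, end-of-file skips it). The token class and
function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DoctypeToken:
    """Minimal stand-in for the tokeniser's DOCTYPE token (hypothetical)."""
    name: str
    in_error: bool

def finish_name_state(token: DoctypeToken) -> DoctypeToken:
    """Apply the quoted last step of the DOCTYPE name state."""
    # The flag is overwritten unconditionally, so whatever the
    # DOCTYPE state set earlier never survives past this point.
    token.in_error = token.name != "HTML"
    return token

# Start as the DOCTYPE state does for a lowercase letter: marked in error.
token = DoctypeToken(name="HTML", in_error=True)
print(finish_name_state(token).in_error)  # False
```

Starting from in_error=False gives the same result, which is the point of
the question: the initial value has no observable effect.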

-- 
Lachlan Hunt
http://lachy.id.au/

Received on Saturday, 28 January 2006 22:59:33 UTC