
Re: HTML token

From: Russell O'Connor <roconnor@math.berkeley.edu>
Date: Tue, 27 Feb 2001 12:32:34 -0800 (PST)
To: Irawan Tanudirdjo <irwtan@yahoo.com>
cc: W3C HTML <www-html@w3.org>
Message-ID: <Pine.SOL.4.32.0102271226430.28204-100000@pub-708c-7>

On Tue, 27 Feb 2001, Irawan Tanudirdjo wrote:

> Shallom,
>
> I'm an undergraduate student of Computer Science
> from Surabaya, Indonesia.
>
> Right now, I'm having a compiler class project to
> create a HTML interpreter and viewer. So, I would
> like to ask about HTML tokens, lexemes, regular
> expression and grammar.
>
> Could anyone help me point out in the web, the
> documentation that contains the above specification?

There is no such documentation on the web that I know of.  I suggest
getting a copy of "The SGML Handbook" by Charles F. Goldfarb.  It should
contain everything you need to know to write an SGML parser, and hence
an HTML 4.0 parser.

I should point out that there is a good chance that HTML 4.0 is not
defined by a context-free grammar.

For XHTML, you can read "The Annotated XML Specification" at
<http://www.xml.com/pub/a/axml/axmlintro.html>.
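
To make the token and lexeme part of the question concrete, here is a
rough sketch of a regular-expression tokenizer for a small HTML-like
subset.  It is only an illustration, not taken from either book: the
token classes (COMMENT, END_TAG, START_TAG, TEXT, STRAY_LT) and the
names Token, TOKEN_SPEC and tokenize are my own assumptions, and it
ignores most of what SGML allows (omitted tags, ">" inside attribute
values, entity references, and so on).

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # token class, e.g. "START_TAG"
    lexeme: str  # the exact source text that was matched


# Hypothetical token classes for a small HTML-like subset.
# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("COMMENT",   r"<!--.*?-->"),
    ("END_TAG",   r"</[A-Za-z][A-Za-z0-9]*\s*>"),
    ("START_TAG", r"<[A-Za-z][A-Za-z0-9]*(?:\s+[^<>]*)?>"),
    ("TEXT",      r"[^<]+"),
    ("STRAY_LT",  r"<"),  # a "<" that does not begin a recognised tag
]

# One master pattern with a named group per token class.
MASTER = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield (kind, lexeme) pairs covering the whole input, in order."""
    for match in MASTER.finditer(source):
        yield Token(match.lastgroup, match.group())


if __name__ == "__main__":
    for token in tokenize("<p>Hello <b>world</b></p>"):
        print(token.kind, repr(token.lexeme))
```

Every character is either a "<" or part of a TEXT run, so joining the
lexemes back together reproduces the input.  A lexer like this only
covers the regular-expression layer; deciding where elements open and
close is the parser's job, and that is where the SGML rules come in.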

-- 
Russell O'Connor                        roconnor@alumni.uwaterloo.ca
           <http://www.math.berkeley.edu/~roconnor/>
``Paradoxically, a refusal to `put a monetary value on life' means that
life is often undervalued.'' -- Artificial Intelligence: A Modern Approach
Received on Tuesday, 27 February 2001 15:32:40 GMT
