Re: HTML parser in Yacc form??? from uid#15033@dxal18.cern.ch on 1995-03-22 (www-talk@w3.org from March to April 1995)

From: <uid#15033@dxal18.cern.ch>
Date: Wed, 22 Mar 1995 18:47:47 +0900
To: hallam@dxal18.cern.ch (USENET), documen@cam.org (Ozgen Eryasa), www-talk@w3.org
Message-Id: <95Mar22.184801+0900_met.63660-2+26@dxal18.cern.ch>

In article <3k4hss$l06@stratus.CAM.ORG> you write:

|>	Hi all,
|>
|>	I was wondering if there exists a specification of HTML in yacc 
|>(or BNF) form. It has probably been done, as constructing such a parser is 
|>much easier this way than with a traditional C subroutine.

Don't think about it. HTML does not have an LR(1) grammar, so trying to use
yacc is only going to cause pain. The best way of parsing SGML is with a
top-down recursive descent parser. Try to use yacc and you will end up in all
sorts of trouble, especially with error reporting.
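
To illustrate the top-down approach, here is a minimal sketch in Python of a
recursive descent parser for a toy tag language. It is not real SGML or HTML:
there are no attributes, no optional end tags and no DTD. The names tokenize,
parse_element, parse and ParseError are made up for the example. The point is
that each grammar rule becomes one function, and that function knows what it
was expecting when something goes wrong, so a useful error message is easy to
produce.

```python
import re


class ParseError(Exception):
    """Raised when the input does not match the toy grammar."""


# Either an opening/closing tag such as <p> or </p>, or a run of plain text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def tokenize(text):
    """Split text into ('open', name), ('close', name) and ('text', s) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ParseError(f"bad character at offset {pos}")
        if m.group(3) is not None:
            tokens.append(("text", m.group(3)))
        elif m.group(1):
            tokens.append(("close", m.group(2)))
        else:
            tokens.append(("open", m.group(2)))
        pos = m.end()
    return tokens


def parse_element(tokens, i):
    """element := '<name>' (element | text)* '</name>'

    Returns (tree, index of the next unread token).
    """
    kind, name = tokens[i]
    if kind != "open":
        raise ParseError(f"expected an opening tag, found {kind} {name!r}")
    children, i = [], i + 1
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "close":
            if value != name:
                raise ParseError(f"</{value}> does not close <{name}>")
            return (name, children), i + 1
        if kind == "open":
            # Recurse: a nested element is parsed by the same rule.
            child, i = parse_element(tokens, i)
            children.append(child)
        else:
            children.append(value)
            i += 1
    raise ParseError(f"<{name}> is never closed")


def parse(text):
    """Parse a document consisting of exactly one root element."""
    tokens = tokenize(text)
    if not tokens:
        raise ParseError("empty document")
    tree, i = parse_element(tokens, 0)
    if i != len(tokens):
        raise ParseError("trailing content after the root element")
    return tree
```

With this sketch, parse("<p>Hi <b>there</b></p>") returns
("p", ["Hi ", ("b", ["there"])]), and parse("<p><b>x</p>") raises a ParseError
saying that </p> does not close <b>.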

One of the problems with comp sci courses is that lecturers often make
silly statements, such as that bottom-up parsing is somehow better than
top-down. This is not the case. Bottom-up parsers can be made slightly faster,
but at a disproportionate cost in terms of complexity. My view is that a
language requiring a yacc parser is probably too complex in any case. Nobody
uses an LR(1) parser to parse LISP.
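
To show why LISP needs no parser generator, here is a rough sketch in Python
of a reader for S-expressions. The helper names lisp_tokens and read_sexpr are
invented for the example, atoms are kept as plain strings, and string literals
and quote syntax are deliberately left out. The whole grammar is one rule, so
the whole parser is one recursive function.

```python
def lisp_tokens(source):
    """Pad parentheses with spaces, then split on whitespace."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def read_sexpr(tokens, i=0):
    """sexpr := atom | '(' sexpr* ')'

    Returns (value, index of the next unread token).
    """
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    if tok != "(":
        return tok, i + 1  # an atom, kept as a string
    items, i = [], i + 1
    while i < len(tokens) and tokens[i] != ")":
        # Recurse for each element of the list.
        item, i = read_sexpr(tokens, i)
        items.append(item)
    if i >= len(tokens):
        raise SyntaxError("missing ')'")
    return items, i + 1  # skip the closing parenthesis
```

For example, read_sexpr(lisp_tokens("(defun f (x) (+ x 1))"))[0] gives
["defun", "f", ["x"], ["+", "x", "1"]].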

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.

Received on Wednesday, 22 March 1995 12:48:51 UTC