- From: Gavin Nicol <gtn@ebt.com>
- Date: Thu, 23 Mar 1995 12:37:23 -0500
- To: uid#15033@dxal18.cern.ch
- Cc: www-talk@www10.w3.org
>|> Hi all,
>|>
>|> I was wondering if there exists a specification of HTML in yacc
>|>(or bnr) form. It has probably been done as constructing such a parser is
>|>way more easier in this way than with a traditional C subroutine.
>
>Don't think about it. HTML is not an LR(1) grammar and so trying to use yacc
>is only going to cause pain. The best way of parsing SGML is with a top down
>recursive descent parser. Try to use yacc and you will end up in all sorts of
>troubles, especially with error reporting.

Phill is technically correct (one cannot parse SGML, and hence HTML, using YACC et al.). If one limits oneself to a subset of SGML, however, it is quite possible to produce a YACC grammar. Dan Connolly has produced such a grammar for HTML by hacking DTD2HTML, and the TEI folks have produced an *excellent* and very *useful* subset of SGML; the grammar is available at:

    ftp://ftp-tei.uic.edu/pub/TEI

While these can accept some documents that are not quite legal SGML, 99.9% of the documents I've seen would be legal both within the TEI grammar and within SGML.
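
For readers who want to see what the top-down, recursive-descent approach from the quoted reply looks like on a restricted subset, here is a minimal sketch in Python. It is not the TEI grammar or Dan Connolly's grammar. The subset is a hypothetical one chosen purely for illustration: every element has an explicit start and end tag, and there are no attributes, entities, or omitted tags. The names `tokenize`, `parse`, `parse_content`, and `parse_element` are likewise made up for this example.

```python
import re

# One token is a start tag, an end tag, or a run of text.
# Hypothetical tiny subset: no attributes, no entities, no omitted tags.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def tokenize(source):
    """Split the input into ("start" | "end" | "text", value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(3) is not None:
            tokens.append(("text", m.group(3)))
        elif m.group(1):
            tokens.append(("end", m.group(2).lower()))
        else:
            tokens.append(("start", m.group(2).lower()))
        pos = m.end()
    return tokens


def parse(source):
    """Parse a whole document into a list of text strings and (name, children) tuples."""
    tokens = tokenize(source)
    nodes, pos = parse_content(tokens, 0)
    if pos != len(tokens):
        # parse_content only stops early on an end tag nobody is waiting for.
        raise SyntaxError(f"unexpected </{tokens[pos][1]}> with no open element")
    return nodes


def parse_content(tokens, pos):
    """content := (text | element)*"""
    nodes = []
    while pos < len(tokens) and tokens[pos][0] != "end":
        kind, value = tokens[pos]
        if kind == "text":
            nodes.append(value)
            pos += 1
        else:
            node, pos = parse_element(tokens, pos)
            nodes.append(node)
    return nodes, pos


def parse_element(tokens, pos):
    """element := start-tag content end-tag"""
    name = tokens[pos][1]
    children, pos = parse_content(tokens, pos + 1)
    if pos >= len(tokens):
        raise SyntaxError(f"<{name}> is never closed")
    if tokens[pos][1] != name:
        raise SyntaxError(f"expected </{name}> but found </{tokens[pos][1]}>")
    return (name, children), pos + 1
```

Each grammar rule maps onto one function, and because the parser always knows which element it is inside, a mismatched end tag can be reported by name. That is the error-reporting advantage the quoted reply alludes to. Real SGML features such as tag omission and DTD-driven content models are exactly what this sketch leaves out.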
Received on Thursday, 23 March 1995 12:34:52 UTC