- From: John Cowan <cowan@ccil.org>
- Date: Fri, 13 Nov 2009 11:35:25 -0500
- To: James Graham <jgraham@opera.com>
- Cc: John Cowan <cowan@ccil.org>, Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>
James Graham scripsit:

> I would be interested in seeing that, if you can dig up some kind of
> reference.

http://conferences.idealliance.org/extreme/html/2004/Siefkes01/EML2004Siefkes01.html
is the paper. It's a two-pass algorithm, but schemaless.

> Note that a requirement is that the algorithm not need to use lookahead;
> it must be possible to implement an incremental, error handling, parser.

Yes, if you need to be both streaming and schemaless, Anne's version is
probably the best you can do. My TagSoup library is streaming, but requires
a schema (it's distributed with an HTML schema). Unfortunately, the schema
language is a one-off, because standard XML schema languages don't provide
the necessary information, such as entity declarations and default element
parents (if the first tag you see is an LI, what should you interpolate in
front of it? Answer: HTML, BODY, UL in that order).

--
It was impossible to inveigle           John Cowan <cowan@ccil.org>
Georg Wilhelm Friedrich Hegel           http://www.ccil.org/~cowan
Into offering the slightest apology
For his Phenomenology.                  --W. H. Auden, from "People" (1953)
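The "default element parents" idea in the post can be illustrated with a minimal Python sketch. It assumes a hypothetical, hand-written table `DEFAULT_PARENT` (not TagSoup's actual schema format or API, and far smaller than a real HTML schema) that names the element to imply when a tag shows up out of context. The helper walks that chain upward until it reaches an element that is already open, so it needs no lookahead and can run incrementally, one start tag at a time. It only covers interpolating missing ancestors; closing elements that a new tag implicitly ends (such as a second LI) is left out.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily simplified schema fragment: each element maps to the
# parent that should be implied when the element appears out of context.
DEFAULT_PARENT: Dict[str, Optional[str]] = {
    "html": None,
    "head": "html",
    "body": "html",
    "ul": "body",
    "ol": "body",
    "p": "body",
    "li": "ul",
}


def implied_ancestors(tag: str, open_stack: List[str]) -> List[str]:
    """Return the start tags to interpolate before `tag`, outermost first.

    Follows the default-parent chain upward until it hits an element that
    is already open (or the root), so only the current state is consulted.
    """
    missing: List[str] = []
    parent = DEFAULT_PARENT.get(tag)
    while parent is not None and parent not in open_stack:
        missing.append(parent)
        parent = DEFAULT_PARENT.get(parent)
    missing.reverse()  # collected innermost first; emit outermost first
    return missing


def feed_start_tag(tag: str, open_stack: List[str]) -> List[str]:
    """Push any implied ancestors and then `tag`; return everything pushed."""
    pushed = implied_ancestors(tag, open_stack) + [tag]
    open_stack.extend(pushed)
    return pushed


if __name__ == "__main__":
    stack: List[str] = []
    print(feed_start_tag("li", stack))  # ['html', 'body', 'ul', 'li']
```

With an empty stack, a leading LI yields HTML, BODY, UL in that order, matching the answer given in the post; once those are open, a further LI implies nothing new.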
Received on Friday, 13 November 2009 16:36:16 UTC