html tidy's parsing strategy from Ross Boylan on 2008-02-24 (html-tidy@w3.org from January to March 2008)

From: Ross Boylan <ross@biostat.ucsf.edu>
Date: Sun, 24 Feb 2008 10:33:25 -0800
To: html-tidy@w3.org
Cc: ross@biostat.ucsf.edu
Message-Id: <1203878005.5986.19.camel@corn.betterworld.us>

I'm curious what strategy tidy uses to parse and correct input that
contains errors.  I'm interested both because I want to use tidy with
some modifications, and because I have some input of my own that fits
this description and I'm not sure how to approach the problem.

I have the source, but there's a lot of it, so I'm hoping for some help.

Thanks.
Ross Boylan

P.S. I notice the tidylib interface is "very rough" according to the
comments.  Is it serviceable?

Received on Sunday, 24 February 2008 19:29:31 UTC