Crawling Through the W3C Validator

Dear Validator List,

I posted this message once before I had joined the list. It seems it did not get through my mail server, so just in case I am posting it directly to the list now that I have joined. Please forgive a possible double posting.

------------

I am currently working on a project that intends to crawl a huge number of domains (all .dk domains) and check a few random pages within each domain to see whether they use valid HTML. The result should be a searchable database indicating which domains use valid code, along with some other information.
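
The sampling and storage step described above could be sketched roughly as follows. This is Python rather than the .NET crawler mentioned in this message, and the function names, the `checks` table layout, and the "every sampled page must pass" rule are all assumptions made for illustration. The validation step itself is deliberately left out: each page is assumed to arrive with a true/false result from whatever validator ends up being used.

```python
import random
import sqlite3


def sample_pages(urls, k, seed=None):
    """Pick up to k distinct random pages from the URLs found on one domain."""
    rng = random.Random(seed)
    unique_urls = sorted(set(urls))
    return rng.sample(unique_urls, min(k, len(unique_urls)))


def record_results(conn, domain, results):
    """Store (url, is_valid) pairs for one domain in a hypothetical 'checks' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checks (domain TEXT, url TEXT, valid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO checks VALUES (?, ?, ?)",
        [(domain, url, int(is_valid)) for url, is_valid in results],
    )
    conn.commit()


def valid_domains(conn):
    """Return domains where every sampled page was reported valid (assumed rule)."""
    rows = conn.execute(
        "SELECT domain FROM checks GROUP BY domain "
        "HAVING MIN(valid) = 1 ORDER BY domain"
    )
    return [row[0] for row in rows]
```

A crawl would call `sample_pages` once per domain, validate each returned URL, and pass the outcomes to `record_results`; `valid_domains` is one example of the kind of query the searchable database could answer.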

The database is going to be the foundation for a Danish website I'm building about using valid HTML code and its advantages. It will contain links, information, articles about valid HTML coding, and so on. The database will provide statistical information about the current state of Danish websites. My hope is that it will be possible to crawl the sites more than once over time; at the moment, however, I'm only trying to get the first crawl done.

I realise that sending this number of pages through the online W3C validator using the crawler I have built may affect the online service. I have tried to install the validator locally on a Win2k server; however, not being a Perl guy, I am having a rather large amount of trouble with it. I normally do "light" ASP.NET programming and a little .NET Windows programming for the crawler.

My question now is: would it be possible (legal?) for me to go through the online validator? If not, does anyone have a suggestion for how I can do a local install on a Win2k server without being a great Perl wiz? Or maybe you know some way to actually integrate the validator into my Windows app using VB.NET or something similar?

I have tried TidyCOM, which works marvellously well in my app. However, the program seems designed more for changing the input code than for plainly and simply validating it, so I discarded it after a couple of tries because it didn't seem to warn about certain errors but just fixed them. Does something like a W3CValidatorCOM or similar exist?

I hope some of you are able to help me, in the interest of educating people about W3C standards, and can perhaps point me in the right direction for solving my problem.

Yours faithfully,
Oscar Eg Gensmann
-- 
Oscar@Gensmann.com

Received on Monday, 8 April 2002 09:36:51 UTC