- From: Ted Guild <ted@w3.org>
- Date: Tue, 16 Jun 2009 13:46:00 -0400
- To: Tymon Wiedemair <tymon.wiedemair@gmail.com>
- Cc: site-comments@w3.org
Tymon Wiedemair <tymon.wiedemair@gmail.com> writes:

> I have a Java XML parser in place to parse xml/xhtml documents. The
> xml documents refrence a DTD hosted by you.
> Since a few days I always get back a HTTP 503 error code from your
> site. The document loads but still there is this 503 error which
> causes trouble in the parser.

We are sending HTTP 503, and the content of the response also includes a link to an article giving more background on this issue.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

In the 16 months since we wrote that article this traffic has only increased, and recently we have been seeing surges that we cannot keep up with, either with our automated defenses or with manual intervention. When we add server capacity, the extra capacity simply gets consumed as well. This leaves our site overwhelmed and unresponsive for our working groups and the rest of the web community.

We temporarily firewall some IP addresses because of the volume of their requests. This happens automatically, and the blocks are cleared after a few days.

About a quarter of our DTD traffic (which runs to hundreds of millions of requests per day) comes from Java, so when we were trying to keep our site available yesterday, responding 503 to this traffic was low-hanging fruit. We will be monitoring this traffic to see when we can be less dramatic in our defenses. We have relaxed our blocking of datatypes.dtd, but note that depending on volume access may still be blocked, so use a cache or catalog.

We have also identified another widely distributed application responsible for a substantial portion of this traffic. The vendor has acknowledged the issue and is working on a resolution, which we hope will be released soon.

Many libraries have catalog or caching options. Lacking that, one can put a caching proxy in front of an application that makes repeated DTD requests.
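To illustrate the catalog idea for a Java parser, here is a minimal sketch using the standard org.xml.sax.EntityResolver hook to answer DTD requests from a local copy instead of fetching them from www.w3.org. The class name LocalDtdResolver, the in-memory map and the one-line stand-in DTD are assumptions made for this example only. A real setup would map the identifiers to full copies of the DTDs (and the entity files they reference) stored on disk or on the classpath, or use a proper catalog library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: serves DTDs from local copies keyed by system identifier.
final class LocalDtdResolver implements EntityResolver {
    static final String XHTML_STRICT_SYSTEM_ID =
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Stand-in for a real local copy of the DTD (assumption for the demo).
    static final String STAND_IN_DTD = "<!ELEMENT html EMPTY>";

    static final String SAMPLE_DOCUMENT =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \""
            + XHTML_STRICT_SYSTEM_ID + "\"><html/>";

    private final Map<String, String> localCopies;
    private int localHits = 0;

    LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = localCopies.get(systemId);
        if (dtd == null) {
            // Unknown identifier: returning null lets the parser fall back to
            // its default behavior, which may mean a network request.
            return null;
        }
        localHits++;
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    int getLocalHits() {
        return localHits;
    }

    // Parses an XML string with the given resolver installed on the parser.
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver =
                new LocalDtdResolver(Map.of(XHTML_STRICT_SYSTEM_ID, STAND_IN_DTD));
        Document doc = parse(SAMPLE_DOCUMENT, resolver);
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("DTD requests served locally: " + resolver.getLocalHits());
    }
}
```

Any identifier missing from the map still falls through to the parser's default handling, so the map should cover every DTD and entity file your documents reference.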
You should also see a pronounced performance improvement from using a catalog or cache instead of repeatedly going over the internet for these DTD resources.

For Java, Glassfish is an option:

http://norman.walsh.name/2007/09/07/treadLightly

and apparently, if you are using Apache libraries, there is a catalog solution in them as well, as mentioned in this article:

http://nwalsh.com/docs/articles/xml2003/

Without touching the code you can set up a caching proxy (e.g. Squid) and request the DTD resources through it, using a user-agent other than one that only vaguely identifies itself as Java. Your application[s] will then reference the local resource from the proxy.

-- 
Ted Guild <ted@w3.org>
W3C Systems Team
http://www.w3.org
Received on Tuesday, 16 June 2009 17:46:07 UTC