- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Thu, 10 Jun 2010 11:52:53 +0200
- To: Ville Skyttä <ville.skytta@iki.fi>
- Cc: public-qa-dev@w3.org
On Wednesday, 9 June 2010 at 21:49 +0300, Ville Skyttä wrote:
> > * is there an open bug matching this problem in our own bugzilla? I had
> > a quick look and didn't find one, but it might be hidden into another
> > bug report; if you think there is none, I'll create one
>
> I don't remember if there's a bug report about this in Bugzilla. But there is
> at least this:
> http://lists.w3.org/Archives/Public/www-validator/2010Mar/0019.html

Thanks; I've created a bug in Bugzilla to document the situation:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9899

At the very least, it can be used as a pointer for people asking on
www-validator what's going on.

> Whenever there's a parse error, XML::LibXML gives us a chain of errors. This
> chain is initially pointed at the last one in the chain, which often does not
> convey much at all about the actual problem. We need to iterate the chain
> using $error->_prev() to get to the start of the chain where usually the
> actual error causing the rest of the chained ones is at.
>
> Now, version 1.69 of XML::LibXML fails to provide the entire chain (I don't
> remember if it's always or only in some cases) and we get only the "tail" of
> it which leads to very confusing error messages like in the above mailing list
> message.
>
> Version 1.70 on the other hand does provide the chain, but there are some
> cases that trigger extreme slowness (I gather) at the time it internally
> constructs the chain.

I hadn't managed to analyse it in these terms, but that does indeed seem to
match what I see when running the Perl debugger on the pages in question.

> A lot of these errors in practical validator use are due to undefined
> entities, because we don't let XML::LibXML to fetch external entities. We
> don't let it do that because letting it do so would cause a lot of entity/DTD
> fetching, and a potential security issue. We could tell it to use XML
> catalogs [0] to get around the first problem; that works and works around the
> slowness issue in the most usual cases, but after that there's still the
> security issue to tackle: XML::LibXML does not have an easy to use option that
> we could use to "jail" it into a specific dir or set of dirs which means it
> could be tricked to load things it shouldn't as external entities [1] [2]..

Wouldn't the XML parser option "ext_ent_handler" be a way to do that jailing?
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#ext_ent_handler

The code example there seems to suggest just that.

Thanks a lot for all your insights on this problem, and for taking the time
to document it so well here!

Dom
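
P.S. To make the "ext_ent_handler" idea a bit more concrete, here is a rough, untested sketch of what such a jail might look like. It assumes the handler is called with the system ID and the public ID and that its return value is parsed as the entity content, as the POD linked above describes. The names jailed_path() and make_jailed_parser() are made up for illustration, and I have not checked how relative system IDs are passed in, so treat this as a starting point rather than a fix.

```perl
use strict;
use warnings;
use Cwd qw(realpath);

# Hypothetical helper: return the resolved path of $system_id if it is an
# existing file inside the directory $jail, and undef otherwise.
sub jailed_path {
    my ($system_id, $jail) = @_;
    return undef unless defined $system_id && defined $jail;

    # Accept plain paths and file:// URIs; refuse every other URI scheme.
    (my $path = $system_id) =~ s{^file://}{};
    return undef if $path =~ m{^[a-z][a-z0-9+.-]*:}i;

    # realpath() resolves symlinks and ".." so they cannot escape the jail.
    my $real = realpath($path);
    my $root = realpath($jail);
    return undef unless defined $real && defined $root && -f $real;

    # Compare with trailing slashes so /jail-other does not match /jail.
    return index("$real/", "$root/") == 0 ? $real : undef;
}

# Hypothetical helper: build a parser whose external entities can only be
# read from inside $jail. Anything else expands to an empty string.
sub make_jailed_parser {
    my ($jail) = @_;
    require XML::LibXML;    # loaded lazily so jailed_path() is usable alone
    return XML::LibXML->new(
        expand_entities => 1,    # the handler only matters when expanding
        no_network      => 1,
        ext_ent_handler => sub {
            my ($system_id, $public_id) = @_;
            my $path = jailed_path($system_id, $jail);
            return '' unless defined $path;
            open(my $fh, '<', $path) or return '';
            local $/;            # slurp the whole file
            my $content = <$fh>;
            close $fh;
            return defined $content ? $content : '';
        },
    );
}
```

Something like make_jailed_parser('/path/to/local/dtds') (a placeholder directory) would then hand back a parser to use in place of the current one. Returning an empty string for refused entities is just one possible policy; the validator might prefer to report the refusal explicitly.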
Received on Thursday, 10 June 2010 09:53:07 UTC