- From: Guido García Bernardo <ggarciab@oesia.com>
- Date: Tue, 21 Oct 2008 17:38:11 +0200
- To: Matt Maths <mattmobi@yahoo.com>
- CC: "public-mobileok-checker@w3.org" <public-mobileok-checker@w3.org>
Hi Matt,

In the beginning we tried to use websphinx
(http://sourceforge.net/projects/websphinx/), but in the end we
developed our own crawler, as websphinx lacks a way to customize HTTP
request headers.

Our crawler is intended for experimental purposes, so it is simply a
Java BlockingQueue where a set of Threads read and write URIs to be
crawled. Not scalable at all.

I suppose there are other open source crawlers...

Regards,
guido.

Matt Maths wrote:
> Hi Guido
>
> Do you have a recommendation for an open source crawling project that
> could be used to feed the moki file? Did your team write your own,
> or did you use an open source project?
>
> I would appreciate your/other members' feedback
>
> thanks
>
> Matt
>
> ----- Original Message ----
> From: Matt Maths <mattmobi@yahoo.com>
> To: public-mobileok-checker@w3.org
> Sent: Tuesday, October 7, 2008 8:46:26 AM
> Subject: Re: using mobileok-checker
>
> Thanks Guido and Abel.
>
> I will start experimenting with the code and, if stuck, ask questions
> on this forum.
>
> thanks again for the detailed replies
>
> Matt
>
> ----- Original Message ----
> From: Guido García Bernardo <ggarciab@oesia.com>
> To: Matt Maths <mattmobi@yahoo.com>
> Cc: "public-mobileok-checker@w3.org" <public-mobileok-checker@w3.org>
> Sent: Tuesday, October 7, 2008 12:51:32 AM
> Subject: Re: using mobileok-checker
>
> Hi Matt,
>
> I am not a member of the mobileok development team, so you can
> consider my point of view as objective. In my project, we have been
> using mobileok checker for quite some time, introducing some changes
> in order to add our own XSL tests.
>
> a. It has been stable enough since the alpha release (one year ago).
> In earlier versions we had some problems (with 0.01% of the pages)
> because some pages were causing the checker to throw an unexpected
> exception and fail.
> In my opinion, the code is also readable and understandable. That
> makes your development a lot easier.
>
> b. The support in this forum has been highly satisfactory in my case,
> with Sean Owen and Dom (among others) answering questions and even
> solving those problems above in less than 24 hours.
>
> c. We also needed to check each linked page, so we modified mobileok
> to define some non-visible classes and methods as public (e.g. the
> Preprocessor class) to be called from our code (as the
> org.w3c.mwi.mobileok.basic.Tester class does):
>
>     mobileok <- third party application
>
> In my opinion, this is the major drawback for third party projects
> that interact with the "mobileok core". Now, I think a better approach
> would have been to keep mobileOK as it is, and use the moki XML files
> to interact with it. Maybe this approach is easier to maintain in the
> long run:
>
>     mobileok -> moki <- third party application
>
> If you look for the message "Some thougths about mobileok checker"
> (2007/10/15), Sean Owen said:
>
> > In general I'm reluctant to open up classes and methods to public
> > access that don't need to be public. The code was not necessarily
> > written to be extended at the code level, since it defines a
> > specific behavior of tests and should not need to be changed
> > regularly by third parties. One can use the "moki" XML output of
> > the Tester, which contains all information about the document, and
> > build another application on top of that.
>
> The moki file contains the links in the page, so you can easily
> extract and check them. From my experience, if you plan to crawl a lot
> of pages in depth, think about using a thread pool and about an
> effective way to store the pending links (those to be checked) and the
> already crawled ones, as the number of links tends to grow
> exponentially and you could run into problems if you keep them all in
> memory.
>
> Hope that helps,
> guido.
>
> Matt Maths wrote:
> > Hi
> >
> > I would appreciate some help. We are looking to build an internal
> > validation tool that automatically validates our mobile content on
> > test servers... The kind of information we want to validate is
> >   xhtml compliance
> >   profanity
> >   spell check
> >   reference to external links
> >   redirects
> >   some basic content compliance (such as headers and footers)
> >   and some other tests......
> >
> > I wanted to find out
> > a. Is mobileok-checker mature enough for us (a third party) to
> > build a test tool based on it?
> > b. Can we expect some support from this forum, such as if we run
> > an implementation idea here or if we are getting an error?
> > c. We would like to implement a tool that not only tests the main
> > page of the site, but also crawls down to all links and checks each
> > page (i.e. runs tests on each page). What is the best way to
> > achieve this?
> >
> > I would appreciate it if you could answer my questions so that we
> > can make the right decisions
> >
> > thanks
> >
> > Matt
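
For readers of the archive: the crawler design described in this thread (a BlockingQueue of pending URIs shared by a set of worker threads, plus a record of the URIs already seen) might look roughly like the sketch below. It is not the code Guido's team wrote. The class name QueueCrawler, the crawl method and the linkExtractor parameter are all hypothetical, and the link extraction is passed in as a function so the sketch makes no assumption about how the links are read from a moki file or how pages are fetched. Like the crawler described above, it keeps everything in memory, so it shares the same scalability limits.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: worker threads take URIs from a shared BlockingQueue,
// extract their links, and put unseen links back on the queue.
final class QueueCrawler {

    // linkExtractor is an assumption: given a URI, it returns the links found
    // on that page (for example, by running the checker and reading the moki output).
    public static Set<String> crawl(String seed,
                                    Function<String, List<String>> linkExtractor,
                                    int workers) {
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        Set<String> seen = ConcurrentHashMap.newKeySet();
        // Number of URIs that are queued or currently being processed.
        AtomicInteger outstanding = new AtomicInteger();

        seen.add(seed);
        outstanding.incrementAndGet();
        pending.add(seed);

        Runnable worker = () -> {
            try {
                // When outstanding reaches zero, nothing is queued or in progress.
                while (outstanding.get() > 0) {
                    String uri = pending.poll(50, TimeUnit.MILLISECONDS);
                    if (uri == null) {
                        continue; // queue momentarily empty; re-check the counter
                    }
                    try {
                        for (String link : linkExtractor.apply(uri)) {
                            if (seen.add(link)) { // true only the first time a URI is seen
                                outstanding.incrementAndGet();
                                pending.add(link);
                            }
                        }
                    } catch (RuntimeException e) {
                        // A page that fails to process is skipped; the crawl goes on.
                    } finally {
                        outstanding.decrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }
}
```

In a real crawler the function passed as linkExtractor would fetch the page, setting whatever HTTP request headers are needed, and return the links it finds. For a large crawl, the in-memory queue and set would need to be replaced by some persistent store, as suggested in the quoted message.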
Received on Tuesday, 21 October 2008 15:38:53 UTC