Re: using mobileok-checker

Hi Matt,

In the beginning we tried to use websphinx 
(http://sourceforge.net/projects/websphinx/), but in the end we 
developed our own crawler, as websphinx lacks a way to customize HTTP 
request headers.
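
To illustrate what customizing the request headers means here, this is a 
minimal sketch using only the standard java.net API. It is not the code 
from our crawler; the class name HeaderedRequest and the header values in 
the usage note are made up for the example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;

public class HeaderedRequest {

    /**
     * Prepares a GET request for the given URI with caller-supplied headers.
     * Nothing is sent until the caller reads the response (for example via
     * getResponseCode() or getInputStream()).
     */
    public static HttpURLConnection open(String uri, Map<String, String> headers)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(uri).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Each entry replaces any default value for that header name.
        for (Map.Entry<String, String> header : headers.entrySet()) {
            conn.setRequestProperty(header.getKey(), header.getValue());
        }
        return conn;
    }
}
```

A caller could pass, for instance, Map.of("User-Agent", "ExampleMobileAgent/1.0", 
"Accept", "application/xhtml+xml") and then read conn.getInputStream(). Both 
header values are placeholders, not the ones the checker itself sends.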

Our crawler is intended for experimental purposes, so it is simply a 
Java BlockingQueue that a set of Threads read URIs from and write newly 
found URIs to.  It is not scalable at all.
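
A rough sketch of that idea, in case it helps. This is not our actual 
code: the class name QueueCrawler, the linkExtractor function (which 
stands in for "fetch the page and return its links") and the inFlight 
counter used to decide when to stop are all assumptions made for the 
example. Like our crawler, it keeps everything in memory.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class QueueCrawler {
    // URIs waiting to be crawled, shared by all worker threads.
    private final BlockingQueue<URI> pending = new LinkedBlockingQueue<>();
    // Every URI ever queued, so that no page is crawled twice.
    private final Set<URI> seen = ConcurrentHashMap.newKeySet();
    // Number of URIs that are queued or currently being processed.
    private final AtomicInteger inFlight = new AtomicInteger();
    // Hypothetical hook: given a URI, return the links found on that page.
    private final Function<URI, List<URI>> linkExtractor;

    public QueueCrawler(Function<URI, List<URI>> linkExtractor) {
        this.linkExtractor = linkExtractor;
    }

    // Queues a URI only if it has not been seen before.
    private void offer(URI uri) {
        if (seen.add(uri)) {
            inFlight.incrementAndGet();
            pending.add(uri);
        }
    }

    /** Crawls from start with the given number of threads; returns all URIs seen. */
    public Set<URI> crawl(URI start, int threads) throws InterruptedException {
        offer(start);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(this::work);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return seen;
    }

    private void work() {
        try {
            // Stop once nothing is queued and no thread is still processing.
            while (inFlight.get() > 0) {
                URI uri = pending.poll(100, TimeUnit.MILLISECONDS);
                if (uri == null) {
                    continue; // queue momentarily empty, re-check inFlight
                }
                try {
                    for (URI link : linkExtractor.apply(uri)) {
                        offer(link);
                    }
                } catch (RuntimeException e) {
                    // A failing page should not kill the worker thread.
                } finally {
                    inFlight.decrementAndGet();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

New links are queued before the current URI is counted as finished, so 
the counter only reaches zero when the whole crawl is done. The seen set 
grows without bound, which is one reason this design does not scale.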

I suppose there are other open source crawlers...

Regards,
guido.

Matt Maths wrote:
> Hi Guido
>  
> Do you have a recommendation for an open source crawling project that 
> could be used to feed the moki file? Did your team write your own, 
> or did you use an open source project?
>  
> I would appreciate your/other members' feedback
>  
> thanks
>  
> Matt
>
> ----- Original Message ----
> From: Matt Maths <mattmobi@yahoo.com>
> To: public-mobileok-checker@w3.org
> Sent: Tuesday, October 7, 2008 8:46:26 AM
> Subject: Re: using mobileok-checker
>
>
>
>  
> Thanks Guido and Abel.
>  
> I will start experimenting with the code and if stuck, ask questions 
> on this forum.
>  
> thanks again for detailed replies
>  
> Matt
>
> ----- Original Message ----
> From: Guido García Bernardo <ggarciab@oesia.com>
> To: Matt Maths <mattmobi@yahoo.com>
> Cc: "public-mobileok-checker@w3.org" <public-mobileok-checker@w3.org>
> Sent: Tuesday, October 7, 2008 12:51:32 AM
> Subject: Re: using mobileok-checker
>
>
> Hi Matt,
>
> I am not a member of the mobileok development team, so you can consider my
> point of view as objective. In my project, we have been using the mobileok
> checker for quite some time, introducing some changes in order to add
> our own XSL tests.
>
> a. It has been stable enough since the alpha release (one year ago).  In
> earlier versions we had some problems (with 0.01% of the pages)
> because some pages were causing the checker to throw an unexpected
> exception and fail.
> In my opinion, the code is also readable and understandable.  That makes
> your development a lot easier.
>
> b. The support in this forum has been highly satisfactory in my case,
> with Sean Owen and Dom (among others) answering questions and even
> solving those problems above in less than 24 hours.
>
> c. We also needed to check each linked page, so we modified mobileok to
> define some non-visible classes and methods as public (i.e. Preprocessor
> class) to be called from our code (as the
> org.w3c.mwi.mobileok.basic.Tester class does) :
>
>     mobileok <- third party application
>
> In my opinion, this is the major drawback for third party projects to
> interact with the "mobileok core".  Now, I think a better approach would
> have been to keep mobileOk as it is, and use the moki XML files to
> interact with it. Maybe this approach is easier to maintain in the long 
> run :
>
>     mobileok -> moki <- third party application
>
> If you look for the message "Some thougths about mobileok checker"
> (2007/10/15), Sean Owen said :
>
> In general I'm reluctant to open up classes and methods to public 
> access that don't need to be public. The code was not necessarily 
> written to be extended at the code level, since it defines a specific 
> behavior of tests and should not need to be changed regularly by third 
> parties. One can use the "moki" XML output of the Tester, which 
> contains all information about the document, and build another 
> application on top of that.
>
>
> The moki file contains the links in the page, so you can easily extract
> and check them.  From my experience, if you plan to crawl a lot of pages
> in depth, think about using a thread pool and about an effective way to
> store the pending links (those to be checked) and the already crawled
> ones, as the number of links tends to grow exponentially and you could
> run into problems if you keep them all in memory.
>
> Hope that helps,
> guido.
>
> Matt Maths wrote:
> > Hi
> >
> > I would appreciate some help. We are looking to build an internal 
> validation tool that automatically validates our mobile content on 
> test servers... The kind of information we want to validate is
> > xhtml compliance
> > profanity
> > spell check
> > reference to external links
> > redirects
> > some basic content compliance (such as headers and footers)
> > and some other tests......
> >
> > I wanted to find out
> > a. Is using mobileok-checker mature enough for us ( a third party) 
> to build a test tool based on it?
> > b. Can we expect some support from this forum, such as if we run an 
> implementation idea here or if we are getting an error?
> > c. We would like to implement a tool that not only tests the main 
> page of the site, but also follows all links and checks each page 
> (i.e. runs tests on each page). What is the best way to achieve this?
> >
> > I would appreciate if you can answer my questions so that we can 
> make right decisions
> >
> > thanks
> >
> > Matt

Received on Tuesday, 21 October 2008 15:38:53 UTC