Re: Preprocessor supported protocols

Thanks again, Guido. Yup, that is a bug I introduced with my last
change. I think the right logic is:

If there is no colon in the URI, assume it is an "http://" URI and
proceed that way.
If there is a colon, parse the part before the colon and use that as
the protocol. If it's "http" or "https", continue; otherwise, skip the
link.
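A minimal Java sketch of that logic, assuming the method keeps the
hasSupportedProtocol(URI) signature from the code quoted below. Instead
of splitting the string on the first colon by hand, it swaps in
java.net.URI.getScheme(), which returns the scheme of an absolute URI
and null for a relative reference. That also avoids misreading a colon
that appears later in a relative link (for example in a query string)
as a protocol separator. The class name SupportedProtocolCheck and the
sample links are illustrative only.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical standalone class; in the real code this would be a
// private static method in Preprocessor.
class SupportedProtocolCheck {

    /**
     * Returns true if the link should be followed: either it has no
     * scheme (a relative reference, treated as http) or its scheme is
     * http or https. Anything else (mailto, javascript, skype, ...) is
     * ignored.
     */
    static boolean hasSupportedProtocol(URI uri) {
        // getScheme() yields the part before the scheme's ':' for
        // absolute URIs and null for relative references.
        final String scheme = uri.getScheme();
        if (scheme == null) {
            return true; // no protocol given: assume http
        }
        // URI schemes are case-insensitive, so "HTTP" counts too.
        final String lower = scheme.toLowerCase(Locale.ROOT);
        return lower.equals("http") || lower.equals("https");
    }

    public static void main(String[] args) {
        String[] links = {
            "http://example.com/",
            "HTTPS://example.com/",
            "/relative/page.html",
            "page.html?t=12:00",
            "mailto:myemail@host.com",
            "javascript:dosomething();",
            "skype:user?chat"
        };
        for (String link : links) {
            System.out.println(link + " -> "
                + hasSupportedProtocol(URI.create(link)));
        }
    }
}
```

Running main() prints true for the first four links and false for the
mailto:, javascript: and skype: links that triggered the exception.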

I will fix it. Thanks as always for your continued testing; it's a
very valuable contribution.

Sean

On Dec 3, 2007 6:46 AM, Guido García Bernardo <ggarciab@itdeusto.com> wrote:
>
> Hello again :)
>
> Links like:
> <a href="mailto:myemail@host.com">
> <a href="javascript:dosomething();">
> <a href="skype:user?chat">
> ...
>
> are followed because the method Preprocessor#hasSupportedProtocol
> doesn't find the substring "://", so it assumes they are http.
>
> For example, if you try it using
> http://www2.paginasamarillas.es/sites/119/058/644/034/languages/ES/pda/home.html
>
> you'll get this exception:
> Exception in thread "main" java.lang.IllegalStateException: unsupported protocol: 'javascript'
>     at org.apache.commons.httpclient.protocol.Protocol.lazyRegisterProtocol(Protocol.java:149)
>     at org.apache.commons.httpclient.protocol.Protocol.getProtocol(Protocol.java:117)
>     at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:107)
>     at org.apache.commons.httpclient.HttpMethodBase.setURI(HttpMethodBase.java:280)
>     at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:220)
>     at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
>     at org.w3c.mwi.mobileok.basic.HTTPResource.<init>(HTTPResource.java:69)
>     at org.w3c.mwi.mobileok.basic.HTTPResource.<init>(HTTPResource.java:60)
>     at org.w3c.mwi.mobileok.basic.HTTPTextResource.<init>(HTTPTextResource.java:37)
>     at org.w3c.mwi.mobileok.basic.Preprocessor.preprocess(Preprocessor.java:108)
>     at org.w3c.mwi.mobileok.basic.Tester.getPreprocessorResults(Tester.java:79)
>     at org.w3c.mwi.mobileok.basic.Tester.main(Tester.java:134)
>
> This is the current code:
>
> private static boolean hasSupportedProtocol(URI uri) {
>     final String uriString = uri.toString();
>     // Either no protocol (assume http), or http or https specified;
>     // ignore anything else
>     return !uriString.contains("://") || uriString.startsWith("http://")
>         || uriString.startsWith("https://");
> }
>
> I don't know the best way to solve it. Maybe if we find ":"
> without the two slashes, we should reject the URI. Another, "uglier"
> solution could be to catch the IllegalStateException and continue.
>
> Regards,
> guido

Received on Tuesday, 4 December 2007 00:49:40 UTC