Preprocessor supported protocols

Hello again :)

Links like:
<a href="mailto:myemail@host.com">
<a href="javascript:dosomething();">
<a href="skype:user?chat">
...

are followed because Preprocessor#hasSupportedProtocol doesn't find the
substring "://" in them, so it assumes no protocol was given and treats
them as http.

For example, if you try it using
http://www2.paginasamarillas.es/sites/119/058/644/034/languages/ES/pda/home.html

you'll get this exception:
Exception in thread "main" java.lang.IllegalStateException: unsupported protocol: 'javascript'
    at org.apache.commons.httpclient.protocol.Protocol.lazyRegisterProtocol(Protocol.java:149)
    at org.apache.commons.httpclient.protocol.Protocol.getProtocol(Protocol.java:117)
    at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:107)
    at org.apache.commons.httpclient.HttpMethodBase.setURI(HttpMethodBase.java:280)
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:220)
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
    at org.w3c.mwi.mobileok.basic.HTTPResource.<init>(HTTPResource.java:69)
    at org.w3c.mwi.mobileok.basic.HTTPResource.<init>(HTTPResource.java:60)
    at org.w3c.mwi.mobileok.basic.HTTPTextResource.<init>(HTTPTextResource.java:37)
    at org.w3c.mwi.mobileok.basic.Preprocessor.preprocess(Preprocessor.java:108)
    at org.w3c.mwi.mobileok.basic.Tester.getPreprocessorResults(Tester.java:79)
    at org.w3c.mwi.mobileok.basic.Tester.main(Tester.java:134)

This is the current code:

private static boolean hasSupportedProtocol(URI uri) {
    final String uriString = uri.toString();
    // Either no protocol (assume http), or http or https specified;
    // ignore anything else
    return !uriString.contains("://")
        || uriString.startsWith("http://")
        || uriString.startsWith("https://");
}

I don't know the best way to solve this. Maybe if we find ":" without
the two slashes after it, we should reject the URI. Another, "uglier"
solution would be to catch the IllegalStateException and continue with
the next link.
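
Here is a minimal sketch of the first option, offered as a suggestion
rather than a tested patch. It assumes the URI parameter is a
java.net.URI (I have not checked which URI class the Preprocessor
actually imports), and the class name ProtocolCheck is only a
placeholder for illustration. Instead of searching for "://", it asks
the URI for its scheme, so "mailto:", "javascript:" and "skype:" are
rejected while relative links are still accepted.

```java
import java.net.URI;

// Placeholder class name; in the checker this would live in Preprocessor.
final class ProtocolCheck {

    // Hypothetical replacement for Preprocessor#hasSupportedProtocol,
    // assuming the argument is a java.net.URI.
    static boolean hasSupportedProtocol(URI uri) {
        final String scheme = uri.getScheme();
        // A null scheme means a relative reference (e.g. "home.html"),
        // which is resolved against the http(s) page being tested.
        // Schemes are case-insensitive, hence equalsIgnoreCase.
        return scheme == null
            || scheme.equalsIgnoreCase("http")
            || scheme.equalsIgnoreCase("https");
    }
}
```

This checks the scheme before any HTTP request is built, so it does not
rely on catching the IllegalStateException. Strings that are not valid
URIs at all would still need to be handled wherever the URI object is
created.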

Regards,
guido

Received on Monday, 3 December 2007 11:49:15 UTC