Re: IRI and IDN support; libcurl blocked? (was: Re: libwww blocked)

On Thu, January 3, 2008 02:42, Martin Duerst wrote:
>
> At 21:13 08/01/02, Regis Boudin wrote:
>>
>>On Wed, January 2, 2008 11:02, Urs Holzer wrote:
>>> They want to prevent bots from harvesting email addresses and flooding
>>> publicly editable resources such as wiki sites. Perhaps the people who
>>> write those bots use libwww (because it is really easy to use) and
>>> don't care about the user agent header.
>>
>>Actually, wouldn't it be better to try to move away from libwww? It
>>looks pretty much unmaintained upstream, with the last official release
>>over 5 years old and the last CVS commit 13 months ago.
>>
>>From a Debian perspective, only 4 packages still use libwww, and its
>>current maintainer announced his intention to ask for removal from the
>>archive. His advice is to switch to libcurl, which seems to provide a much
>>nicer to use API and is actively maintained. Obviously I am willing to do
>>some work on this and provide patches. I might do it anyway for the Debian
>>package...
>
> Two points:
>
> 1) libcurl is also mentioned on
> http://johannburkard.de/blog/www/spam/The-top-10-spam-bot-user-agents-you-MUST-block-NOW.html,
> so we can expect it to produce the same 'refusal by server' problems
> as libwww.

So no loss there. Actually, a program using libcurl can set its own user
agent string, so switching to libcurl would solve this problem.
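
In libcurl itself this is done through the CURLOPT_USERAGENT option. As a
rough illustration of the same idea (the client program, not the library,
chooses the header), here is a minimal sketch using Python's standard
library rather than libcurl. The agent string "amaya-example/0.1" and the
helper name build_request are made up for the example.

```python
import urllib.request

# Hypothetical identifier: any descriptive string the application chooses.
USER_AGENT = "amaya-example/0.1"


def build_request(url: str) -> urllib.request.Request:
    """Return a request that carries an application-chosen User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# No network access happens here; we only inspect the prepared request.
req = build_request("http://example.org/")
print(req.get_header("User-agent"))
```

A server that filters on the library's default agent string would then see
the application's own identifier instead.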

> 2) The current Amaya libwww code base supports
>    Internationalized Domain Names (IDN; this works at least
>    on some systems, not necessarily on all) and
>    IRIs (Internationalized Resource Identifiers; this should work on
>    all systems). libcurl didn't support these the last time I checked.

According to the changelog [1], IDN support seems to have been in since
April 2004 through the use of libidn. IRI support is not, though.

[1] http://cool.haxx.se/cvs.cgi/curl/CHANGES.0?rev=1.3
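
To make the IDN/IRI distinction concrete, here is a minimal sketch of an
IRI-to-URI mapping in Python's standard library. It is not the Amaya or
libcurl code. It assumes the host is converted with Python's built-in "idna"
codec (IDNA 2003) and that non-ASCII characters elsewhere are percent-encoded
as UTF-8. The function name iri_to_uri is made up, and userinfo in the
authority is ignored for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%" so that
# escapes already present in the input are not encoded a second time.
_SAFE = "/:@!$&'()*+,;=%-._~"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (sketch; userinfo is dropped)."""
    parts = urlsplit(iri)

    # IDN step: convert each host label to its ASCII ("xn--") form.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"

    # IRI step: percent-encode non-ASCII characters as UTF-8 bytes.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/käse?q=ü"))
```

The host conversion is the part that libidn handles; the percent-encoding of
the rest of the identifier is the IRI part that a client would still have to
do itself.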

>    However, a student of mine implemented these last year, and
>    I would be glad to contribute the code. It would be a pity if
>    switching from libwww to libcurl would remove support for IRIs
>    and IDNs.

From what I've seen so far, it should be possible to have a conditional
build against either libwww or libcurl, which was my plan anyway. And on
the other hand, libcurl would provide HTTPS support, among other things
which are still missing in libwww.

Regis

Received on Thursday, 3 January 2008 10:24:10 UTC