XPath tips from the web scraping trenches from Melvin Carvalho on 2014-07-17 (public-webize@w3.org from July 2014)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Thu, 17 Jul 2014 18:33:16 +0200
To: public-webize@w3.org
Message-ID: <CAKaEYh+ZeC7tMANA_iitAhhdi+quXdDJfTa8_utUoa2m5tjdKQ@mail.gmail.com>

Here is quite an interesting article about how to extract data
systematically from existing pages.

Scrapy is quite a nice tool for doing this, and here are some dos and don'ts:

http://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/

This also ties into some of the OpenLink spongers / cartridges in terms of
transforming existing web content into structured data.
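
As a rough illustration of what "getting data systematically from existing
pages" with XPath can look like, here is a minimal sketch. It is not taken
from the linked article and does not use Scrapy: it relies only on Python's
standard library, which supports a limited subset of XPath. The page fragment,
the class names and the extract_products helper are all made up for the
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Blue widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2>Red widget</h2>
      <span class="price">12.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return a list of {"name", "price"} dicts, one per product block."""
    root = ET.fromstring(markup)
    products = []
    # Select each product container first, then query relative to it,
    # so a name is always paired with the price from the same block.
    for node in root.findall(".//div[@class='product']"):
        products.append({
            "name": node.findtext("h2").strip(),
            "price": float(node.findtext("span[@class='price']")),
        })
    return products


if __name__ == "__main__":
    for product in extract_products(PAGE):
        print(product)
```

Real-world HTML is rarely well-formed XML, so in practice a tolerant parser
is needed; Scrapy's selectors (response.xpath) accept full XPath 1.0
expressions and cope with messy markup, which the standard library parser
used here does not.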

Received on Thursday, 17 July 2014 16:33:45 UTC