Survey Results: What Matters to Web Data Collection Buyers

While structured web data presents exciting possibilities in many fields of endeavor – including finance, cyber-security, artificial intelligence and more – the market for data extraction platforms is still fairly young. Only […]

Quick Guide to News APIs

Monitoring mass media has come a long way since the days of the press-cutting agency. The bulk of today’s news is published online, while modern technology lets us store, index and query […]

Why Extracting Content From The Open Web Is Better than Surveys for Research

What’s the best way to find out how people feel about a given topic? Simply ask them, right? Well, at least that’s what we’ve been led to believe. Standard polling practice tells […]

Article’s publication date extractor – an overview

A few days ago I released an open-source Python module that provides a simple way to extract and normalize the publication date of any online blog or news post. […]

Dead simple {for devs} python crawler (script) for extracting structured data from any website into CSV

In my previous post I wrote about a very basic web crawler that can randomly scour the web and mirror/download websites. Today I want to share with you a very simple […]

Crawling Horrors – Browser Scraping

In my previous blog post, I wrote about RSS crawlers, and why they don’t really work. In this post I want to discuss the technique of using a headless browser to parse […]