Article’s publication date extractor – an overview

A few days ago I released an open-source Python module that gives you a simple way to extract and normalize the publication date of any online blog or news post. […]

How to Extract Data from a Website: 5 Steps to Transform Unstructured Data into Business Insights

Big data is big business. And for good reason. As Harvard Business Review recently reported, an exhaustive study of 330 North American companies led by the MIT Center for Digital Business in […]

Social Media Analytics: Insights from Structured versus Unstructured Data

Let’s be honest … social media is a challenge. Not only is staying current, active, and “topped off” a chore, but crafting full-scale campaigns that contribute to your business’ and brand’s actual […]

Ever imagined how "Big Data" looks like?

We have created a fun little experiment that lets you navigate a 3D universe of real data from the open web. The data is made up of important news and blog titles, their meta-data […]

30-Days of Historical Data Access for Webz.io Now Available

I’m very happy to announce the launch of extended access to 30 days of historical data from Webz.io, which is available to our paying customers immediately. No waiting list. […]

To crawl or not to crawl, that is the question

In order to write an efficient crawler, you must be smart about the content you download. When your crawler downloads an HTML page it uses bandwidth, memory and CPU, not only its […]

Dead simple {for devs} python crawler (script) for extracting structured data from any website into CSV

In my previous post I described a very basic web crawler I wrote that can randomly scour the web and mirror/download websites. Today I want to share with you a very simple […]

Tiny basic multi-threaded web crawler in Python

If you need a simple web crawler that will scour the web for a while to download random sites’ content – this code is for you. Usage:

Where https://cnn.com is your seed site. It could […]

How we quadrupled the performance of Elasticsearch

Well, that’s a misleading title. We actually quadrupled the performance of our brand monitoring alert system that uses Elasticsearch’s Percolator, but that would have been a much longer title. Some background Buzzilla […]

Webz.io Tip: Search for top performing (viral) posts

Here at Webz, our crawlers download millions of posts a day from millions of sources. When searching for web data among these many sources, you may want to limit your results to […]