I started Webz.io because I experienced the difficulties of collecting web data at scale while working on a previous project named PRTrack.it. At PRTrack.it we wanted to create...
How to Spot Fake Reviews in Time for the Holidays
Black Friday is here, and as the biggest shopping day of the year, it means a lot of people will be on...
Wait, let me explain. I can explain every part of this click-bait title; it will make sense, I promise. So, a great philosopher named Homer Simpson once said: “Trying is the first...
Kimono Labs announced today that it has been acquired by Palantir. Unfortunately, Kimono Labs users will only have two weeks to migrate to a different service because the team will...
A few days ago I released an open source Python module that provides a simple way to extract and normalize the publication date of any online blog or news post....
We have created a fun little experiment that lets you navigate a 3D universe of real data from the open web. The data consists of important news and blog titles, their metadata...
I’m very happy to announce the launch of extended access to 30 days of historical data from Webz.io, available to our paying customers immediately. No waiting list....
In order to write an efficient crawler, you must be smart about the content you download. When your crawler downloads an HTML page it uses bandwidth, memory and CPU, not only its...
In my previous post I wrote about a very basic web crawler that can randomly scour the web and mirror/download websites. Today I want to share with you a very simple...