What is the Omgili Bot, and why is it Crawling Your Website?

Hi there. If you’re reading this, it’s probably because you’ve run into Omgilibot – perhaps in your web analytics or server logs (user agent: omgili/0.5 +https://omgili.com) – and turned to Google to…

How to Extract Data from Websites: Scraping Tools, DIY or DaaS

This is part 2 of our guide to web data extraction. Read part 1 to learn about the questions to ask before you start, or download the complete Web Data Extraction Playbook…

Web Data Extraction Guide: 11 Questions to Ask

The following is an excerpt from our new Web Data Extraction Playbook. We’ll be publishing the second part next week, or you can grab the full guide here. The internet has become…

A Judge Just Ordered LinkedIn to Allow Scraping – Here's Why

When is it okay to grab data from someone else’s website, without their explicit permission? A new ruling by a federal judge in California might have dramatic implications for this question, and…

The Hackathon Award for Best API Mashup Goes to…

Programming competitions, commonly referred to as hackathons, offer a great opportunity for new talent to show what they can do. Much like professional sports, industry leaders send recruiters to scout out…

Webz.io API Featured in New Guide to Web Development with Django

Last February, co-authors Leif Azzopardi and David Maxwell completed the latest edition of their book Tango with Django. It presents an excellent step-by-step approach to learning Python on the popular Django framework…

How to Use Online Review Ratings to Crush the Market

Sifting through millions of posts on review sites presents both a massive undertaking and an incredible opportunity for influencer marketing. Some of the most successful app makers are capitalizing on that opportunity. Use…

How to use rated reviews for sentiment classification

Sentiment classification is a fascinating use case for machine learning. Regardless of complexity, you need two core components to deliver meaningful results: a machine learning engine and a significant volume of…

How to access, cite, and defend web datasets in academic research

We’re used to getting questions about accessing structured web data. But recently, we’ve been fielding a different kind of use case. Researchers and scientists have been asking about data citation conventions and how…

Should you buy crawled web data or build your own solution?

In a technologically driven environment, the temptation to develop a proprietary web crawling solution is virtually irresistible. Our latest report examines the true cost of computing and software development resources required to deliver a data…

The Race to Achieve 100% Coverage of the Web

In our new report, we deconstruct the all-too-familiar race to achieve 100% coverage of the web. Data acquisition efforts usually rely on one of three approaches – build an internal web crawling…

5 Ways to Measure the Impact of Crawled Web Data on Your Business

The analysis you provide is only as good as the raw data you start with. Although data from the open web is often perceived as a commodity, not all crawled data is…