What Is the Omgili Bot, and Why Is It Crawling Your Website?

Hi there. If you’re reading this, it’s probably because you’ve run into Omgilibot – perhaps in your web analytics or server logs (user agent: omgili/0.5 +https://omgili.com) – and turned to Google to decide whether this crawler is a benevolent creature that should be permitted to do as it will, or something more nefarious that deserves to be forever banished from your servers.

Well, you’ve come to the right place! This post will tell you everything you need to know about the Omgili Bot. But since this long-winded intro is already testing your attention span, we’ll start with the bottom line:

Omgili is a white-hat web crawler that’s used by the world’s top companies and sends great traffic your way.

In a bit more detail – the Omgili Bot is a web crawler we developed a decade ago to power the (now discontinued) Omgili search engine.

Today this bot powers Webz.io, a web crawling service used by the world’s leading media monitors and research institutes, as well as thousands of developers.

By indexing your website, we make it possible for services like Hootsuite, Sprinklr, and NetBase – all of which rely on Omgili’s crawled web data – to find relevant information on your site, link to it, and send traffic your way. It also spares these companies the need to build their own crawlers, which would tax your site’s resources even further.

Omgili does its best to play nice.

The Omgili bot crawls your site efficiently and has been designed to minimize the resources it requires from your infrastructure. We have developers who dedicate their entire day to doing just this.

However, the occasional hiccup does happen – so if our bots are becoming resource-hogs and slowing down your site, please let us know and we’ll find a solution!
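If you want numbers to back up a report like that, a rough count of the bot's requests is a good start. The sketch below is only an illustration: it assumes your server writes Apache/Nginx "combined" style access logs and that the bot's requests carry the string "omgili" in the user agent, as in the example at the top of this post. The function name and the sample log lines are made up for the example.

```python
import re
from collections import Counter

# Assumption: the crawler's user agent contains "omgili" (see the user agent
# string quoted at the top of this post). Adjust if your logs show otherwise.
BOT_MARKER = "omgili"

# Assumption: timestamps look like [10/Jan/2018:13:55:36 +0000], as in the
# common "combined" log format. We keep everything up to the minute.
MINUTE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def bot_requests_per_minute(log_lines, marker=BOT_MARKER):
    """Count log lines per minute that mention the marker (case-insensitive).

    This is a plain substring check on the whole line, so a referrer or URL
    containing the marker would be counted too.
    """
    counts = Counter()
    for line in log_lines:
        if marker not in line.lower():
            continue
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.7 - - [10/Jan/2018:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "omgili/0.5 +https://omgili.com"',
    '203.0.113.7 - - [10/Jan/2018:13:55:41 +0000] "GET /b HTTP/1.1" 200 734 "-" "omgili/0.5 +https://omgili.com"',
    '198.51.100.2 - - [10/Jan/2018:13:55:50 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Jan/2018:13:56:02 +0000] "GET /c HTTP/1.1" 200 120 "-" "omgili/0.5 +https://omgili.com"',
]

per_minute = bot_requests_per_minute(sample)
for minute, hits in sorted(per_minute.items()):
    print(minute, hits)
```

Sending along a per-minute breakdown like this, together with the affected URLs, makes it much easier to track down what went wrong.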

The Omgili Bot knows when it’s not wanted.

If you don’t like our bot hanging around your site, you can tell us directly or through your robots.txt file – and we’ll go our separate ways with no hard feelings (okay, maybe some hard feelings).

You can read more about blocking the Omgili bot here. We are dedicated to working with the websites we crawl in a mutually beneficial way, and always comply with these requests.
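For the robots.txt route, here is a minimal sketch of what a block rule might look like, checked with Python's standard `urllib.robotparser` module. Note the assumption: the token `omgili` is taken from the user agent string quoted at the top of this post, and the page linked above remains the authoritative source for the exact name to use. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "omgili" is the user-agent token to target. It is derived from
# the user agent string "omgili/0.5 +https://omgili.com"; check the blocking
# instructions linked above for the exact token.
ROBOTS_TXT = """\
User-agent: omgili
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The rule should block the Omgili crawler everywhere on the site...
omgili_allowed = parser.can_fetch("omgili/0.5", "https://example.com/any-page")

# ...while leaving crawlers that are not named in the file unaffected.
other_allowed = parser.can_fetch("SomeOtherBot/1.0", "https://example.com/any-page")

print("omgili allowed:", omgili_allowed)
print("other bots allowed:", other_allowed)
```

To block only part of a site, replace `Disallow: /` with the path you want to keep the bot out of.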

What websites does the Omgili Bot crawl?

Generally, the bot will try to crawl anything it comes across, but we focus on the following:

  • News sites
  • Blogs
  • Reviews
  • Ecommerce
  • Message boards and online discussions

If your site falls into one of these categories, you might encounter our bot. It’s friendly, so don’t hesitate to say hi!

What do we do with the data we collect?

After crawling your site, we index it and make it accessible via the Webz.io API, which is used by thousands of individuals, companies, research organizations, and government institutes that want to better understand the web.

Why you should let Omgili crawl your site

To recap:

  • Omgilibot connects your site to hundreds of apps, services, and marketplaces such as Hootsuite and Sprinklr, which can then link back to you, potentially sending thousands of relevant visitors to your web properties.
  • If you run advertisements on your site, being noticed and linked to by these services can increase your attractiveness to advertisers and the revenue your site generates.
  • Omgili is built to minimize the resources it uses on the websites it crawls.
  • We are constantly improving Omgili and are open to any feedback you might have about the bot and the way it operates.

Learn more

Want to better understand why we crawl your site? Start your 10-day free trial today, or talk with a data expert to learn more about Webz.io’s solutions.

Until then, have a wonderful 2018!
