Your Data for AI Doesn’t Have to Be Big or Small. It Has to Be Juuuuust Right

High-Quality Data for Enterprises and SMEs Alike

With AI funding in the US almost doubling in the last two years, many organizations have been jumping on the AI bandwagon. But before these organizations even consider developing their AI models and algorithms, they need access to large amounts of high-quality, flexible data that can be structured. Whether it’s through our firehose solution for the enterprise or free datasets for research institutions and SMEs, Webz is proud to offer this service so that data scientists can concentrate their efforts on building AI models and algorithms that benefit both organizations and society.

Unfortunately, most data scientists aren’t spending the bulk of their time building those models; they are cleaning and preparing data. As DJ Patil, former US Chief Data Scientist, explains:

“It’s impossible to over stress this: 80% of the work in any data project is in cleaning the data.”

Here are a few of the ways enterprise organizations are applying the vast amounts of high-quality, structured data they receive:

  • Energie NB Power compares historical forecasts in Canada with similar current weather events to predict where the most significant outages will occur, so it can mobilize its maintenance crews proactively ahead of severe weather while keeping costs down. The company was also able to restore power to 90% of customers within 24 hours after inclement weather.
  • Transport for London (TfL) implemented a smartcard ticketing system that collects data about passenger journeys, giving it extensive information about its customers’ travel habits. In addition to sending customers personalized travel updates about specific routes, its algorithms can help in an emergency. For example, when Putney Bridge closed for repairs, TfL was able to quickly set up an alternative interchange and message customers with personalized alternative routes.
  • Rolls-Royce Holdings, an airplane engine manufacturer, has placed sensors on its propulsion systems to deliver data in real time to engineers, who can anticipate maintenance problems and intervene before they occur.

Here at Webz, one of our customers in the website personalization space has successfully applied our data feeds to the analysis of more than 150 billion B2B intent signals every month. By applying artificial intelligence to the data feeds, the company can identify the most relevant buyers within target accounts using intent monitoring rather than the traditional signal of a website visitor's job title. It can then use the data it gathers to personalize the site experience for visitors according to segment, account and persona to maximize engagement. The ability to better understand the activity of anonymous customers visiting a website in real time also allows marketers to build more personalized (and ultimately more successful) campaigns targeted to key buyers and to focus their advertising budgets where they can make the most impact.

Sometimes Less Data is More Data

Of course, these organizational giants have a huge amount of data at their fingertips, which gives them a big advantage over small and medium-sized enterprises (SMEs). But what if you’re a startup or SME looking to develop AI or machine learning applications for your product? Is there any way you can compete with the big guys?

If you’re one of those small businesses, you shouldn’t despair. It’s true that large, high-quality datasets are key for training algorithms and fine-tuning their predictive abilities. But for the majority of the AI challenges that businesses face, no big datasets are actually available. And even when they are, they might be too expensive for anyone but the biggest companies to obtain.

“For every dataset with one billion entries, there are 1,000 datasets with one million entries, and 1,000,000 datasets with only one thousand entries,” explains Bradley Arsenault, founder and CEO of Electric Brain, a consulting agency that delivers custom AI technology for businesses. In other words, SMEs and startups should think in terms of small data rather than big data. That means thinking in gigabytes or a few terabytes rather than petabytes or more.

The trick is to get creative about the type of data businesses need to collect and to make sure it’s in a format that they can use for the next stage of analysis. That means ensuring that data collected manually is in a unified and structured format, which can involve a tremendous amount of time and effort for businesses without the proper resources.
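As a rough illustration of what "a unified and structured format" can mean in practice, the sketch below maps manually collected records with inconsistent field names and date formats onto one schema. Everything in it is an assumption made for illustration: the `Record` fields, the alias table, the list of date formats and the sample rows are hypothetical and do not describe any Webz API or dataset.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical unified schema; the field names are illustrative only.
@dataclass(frozen=True)
class Record:
    source: str
    published: date
    text: str


# Date formats we assume the manually collected sources might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")

# Ad hoc field names (assumed) that map onto each unified field.
FIELD_ALIASES = {
    "published": ("published", "date", "Date Posted"),
    "text": ("text", "body", "Content"),
}


def parse_date(raw: str) -> date:
    """Try each known format in turn and return the first that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def pick(raw: dict, field: str) -> str:
    """Return the value stored under the first alias present in the row."""
    for alias in FIELD_ALIASES[field]:
        if alias in raw:
            return raw[alias]
    raise KeyError(f"no {field!r} field in row: {raw!r}")


def normalize(raw: dict, source: str) -> Record:
    """Convert one loosely structured row into the unified Record schema."""
    return Record(
        source=source,
        published=parse_date(pick(raw, "published")),
        # Collapse runs of whitespace left over from copy and paste.
        text=" ".join(pick(raw, "text").split()),
    )


# Two sample rows (made up) describing the same item in different shapes.
rows = [
    ({"date": "2021-03-05", "body": "  Outage   reported downtown "}, "blog"),
    ({"Date Posted": "05/03/2021", "Content": "Outage reported downtown"}, "forum"),
]
records = [normalize(raw, source) for raw, source in rows]
```

Rows that match no known alias or date format raise an error instead of being silently dropped, which makes the manual effort the paragraph describes visible: every new source shape needs an explicit mapping before it can join the dataset.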

Here are a few examples of projects that pulled data from Webz’s repository of smaller yet high-quality structured datasets for their own AI and machine learning models:

  • A team from the computer science department at the University of North Carolina at Charlotte developed a methodology to identify vulnerabilities in biometric technology, along with various limitations of identity management.
  • A team from the University of Maryland tracked biodiversity-related keywords in 31 different languages in data from across the web, globally and in real time, at a much lower cost than the traditional public-opinion surveys that gauge public support for biodiversity.
  • A scalable alarm verification system was developed that resulted in the ability to verify 30,000 alarms a second with up to 90% accuracy.

Another creative solution for data scientists and SMEs without the same resources is to team up with larger organizations, such as their suppliers and vendors, and combine that data with their own smaller datasets or with public datasets, explains Mark Van Rijmenam, a big data speaker and strategist, founder of Datafloq, and author of the book Think Bigger: Developing a Successful Big Data Strategy for Your Business. For example, he offers, an organization could combine weather data with restaurant sales data to understand the correlation between rain and items sold, then use that insight to ensure those food items are available during rainstorms.

Webz Provides High-Quality Data for Enterprise and SMEs Alike

At the end of the day, however, data doesn’t have to be big or small; it has to be delivered in a form that can be used. Webz’s advanced data filters, capable of collecting granular data, combined with its ability to provide customized datasets, can deliver smaller datasets to SMEs and research institutions.

At the same time, our firehose option delivers structured data to enterprise organizations such as financial companies for the application of predictive analytics to make smarter investment decisions. We take care of the most time-consuming element of data science — collecting, storing and preparing the data — and support your data scientists so that they can make more and better contributions to the world.

