What to Look for in a News Data Provider

For product managers, developers, data scientists, and analysts working in sectors that rely on real-time data insights, news data is critical to product success. Many companies rely on real-time news data for insights into how to get a product to market faster or how to improve what they currently offer. When automated platforms incorporate timely and relevant news data, they can provide deeper and more accurate insights.

This article serves as a practical guide to choosing a news data provider. It covers key questions to ask when evaluating vendors. It also highlights why news data is critical to business success and includes a few real-world case studies.

If you aren’t satisfied with your current news data solution or need to supplement your current coverage, you can use this guide to find a great data provider.

Why is news data critical to business success?   

News data provides crucial insight to many different types of intelligence software. Consider how timely information can directly impact companies in these industries:

  • Media intelligence – Media intelligence platforms monitor and analyze real-time news content from around the world to provide accurate insights into brands, customers, competitors, and trends (industry, market, social). These platforms need news data to track important changes in consumer and market behavior and to assess a brand’s reputation.
  • Financial technology – Financial intelligence software companies depend on up-to-date information about current events, trends, and business news, all of which influence stock performance. News data keeps fintech clients up to date on major events that could affect a company’s stock price and inform investment decisions. For example, a company’s stock price might rise after a large acquisition, while its productivity might fall after a natural disaster near its factory.
  • Risk intelligence – Companies face many types of risk today, from supply chain risk and third-party risk to non-compliance risk and ESG risk. With real-time news data, risk monitoring solutions like Exiger help companies identify potential risks early, so they can implement proactive strategies to mitigate threats, adjust their communications, and protect their reputation.
  • AI/ML training – News provides real-time, high-frequency insights that reflect current events, market shifts, and evolving public sentiment. Unlike static data sources, news data continually updates, training your product to adapt to new developments and identify emerging trends with accuracy. This constant influx of fresh information enhances the predictive power of AI/ML models, enabling them to deliver insights that are not only timely but also relevant.

Now that we’ve covered how news data can provide relevant, real-time insights across industries, we’ll discuss what to look for when evaluating news data providers.

What to look for in a news data provider?

Not all news data providers are equal: news data coverage, quality, enrichment, latency, delivery, and format vary from vendor to vendor. Below are questions to ask when evaluating news data vendors, broken down by category.

Coverage

  • How many global/local news sources does the provider have access to?
  • How many languages are available?
  • Does the vendor provide historical news data?
    • If so, how far back does it go, and how large is the repository?
  • Can you request additional sources of news data?
  • Is the provider able to legally collect news data from paywalled content?   

Some news data providers (and free news datasets) draw on only a handful of publications. Platforms for media intelligence and fintech require comprehensive coverage so that the people using the data can make informed decisions.

Commercial news data products draw from hundreds of thousands of global sources and offer data in multiple languages. High global coverage in multiple languages means automated platforms can capture localized perspectives to understand regional nuances and trends. They can also provide deeper insights into the topics, markets, and industries users care about.

No matter how many sources a provider has access to, you may need them to add new sources of news data. Some vendors will have you submit a request for a new source and then wait for them to complete the request. Other vendors, like Webz.io, offer a much faster self-service solution, allowing you to easily add new sources at the click of a button. 

Another consideration is historical news data. Many use cases require both real-time and historical news data (like the finance example highlighted above). You may have customers who need access to massive volumes of historical data, but not all providers have a large news data archive.

Finally, consider whether the provider can access articles behind paywalls while still maintaining compliance. Given the increasing prevalence of paywalled content, you’ll want a provider with a white hat solution for accessing paywalled content that ensures legal compliance. 

Quality

  • Does the provider clean the data, or do you have to do that yourself?
  • Is the data free of noise? 
  • Does the vendor offer solutions with advanced filtering capabilities?
  • Do you have access to the full text or just snippets?
  • Does your data provider offer categories, sentiment and social signals related to the text?

When choosing a data provider, it’s crucial to evaluate the quality of the data offered. Some vendors supply large files of unstructured news data that requires significant cleaning and processing before it can be effectively utilized. For example, many free datasets and some news APIs contain noise—such as boilerplate content, spam data, duplicate content, and raw HTML—that must be removed to obtain usable information.

While some vendors do provide a degree of data processing, it’s important to assess how clean the data truly is. You may still find that you need to invest time and resources to prepare it for analysis. Ideally, you shouldn’t have to spend any time cleaning and processing unstructured data, as it takes valuable time away from achieving your primary goals with the information.
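To get a feel for what that preparation can involve, here is a minimal sketch in Python that strips raw HTML from articles and drops exact duplicates. It is illustrative only: the `html` and `title` field names are assumptions rather than any vendor’s actual schema, and real pipelines usually need fuzzier duplicate detection plus boilerplate and spam removal on top of this.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(raw):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def dedupe(articles):
    """Keep the first article for each distinct (case-insensitive) body text.

    Each article is a dict with hypothetical "title" and "html" keys.
    """
    seen = set()
    unique = []
    for article in articles:
        text = strip_html(article["html"])
        fingerprint = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        unique.append({"title": article["title"], "text": text})
    return unique
```

Even this small example hints at the ongoing effort involved, which is why it pays to ask vendors how much of this work they do for you.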

Some providers thoroughly clean and process the data, screening out unwanted content such as text from ads (text bleeding), navigation links, and formatting codes. But even the cleanest data may need additional filtering. Plus, your customers won’t need every bit of news data a provider delivers. 

You should also find a vendor with solutions that include advanced filtering capabilities. Filters allow users to refine their searches, ensuring they get accurate and relevant information. Filters also allow automated platforms to generate deeper insights by screening results based on specific attributes, such as sentiment and social signals.
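As a rough illustration of attribute-based filtering, the Python sketch below screens a list of already-enriched articles by language, sentiment, and a minimum number of social shares. The field names (`language`, `sentiment`, `social_shares`) are hypothetical and will differ from one provider to another; many providers also let you apply filters like these on their side, in the query itself.

```python
def filter_articles(articles, language=None, sentiment=None, min_shares=0):
    """Return the articles that match every filter that was supplied.

    Field names are assumptions for illustration, not a real vendor schema.
    """
    matches = []
    for article in articles:
        if language is not None and article.get("language") != language:
            continue
        if sentiment is not None and article.get("sentiment") != sentiment:
            continue
        # Articles without a share count are treated as having zero shares.
        if article.get("social_shares", 0) < min_shares:
            continue
        matches.append(article)
    return matches
```

For example, calling `filter_articles(articles, language="english", sentiment="negative", min_shares=100)` would keep only widely shared, negative English-language coverage, the kind of result a brand monitoring alert might be built on.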


Delivery and format

  • Is there an API option for receiving the data?
  • Is the news data delivered in structured feeds?
  • What formats are available?

How news data is delivered has a significant impact on the accuracy, speed, and cost of automated platforms. Automated solutions need data delivered fast and in a machine-readable format. The way the data is formatted determines how fast it can be integrated and transformed into insights. Consider the input your product needs and find news data in that format. Otherwise, you will spend more time restructuring the raw data to be compatible with your solution.   

Look for a provider with a news data API that delivers structured news data feeds. Structured feeds are easier for machines to read and understand, and the API delivers the data directly to platforms and applications. This lets you provide high-quality insights in real time while saving time and money.

Finally, you should make sure the vendor offers news data in the format you prefer. JSON is the gold standard these days, but you may need other formats, such as XML, RSS, or CSV. 
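If a vendor delivers JSON but part of your workflow expects CSV, the conversion is usually straightforward when the feed is well structured. The sketch below uses only Python’s standard library and assumes the feed is a JSON array of flat objects; the column names in the usage note are made up for illustration.

```python
import csv
import io
import json


def json_feed_to_csv(json_text, columns):
    """Convert a JSON array of flat article objects into CSV text.

    Only the requested columns are written; missing values become empty cells.
    """
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()
```

Calling `json_feed_to_csv(feed_text, ["title", "published"])` would return a CSV string with a header row followed by one row per article. Nested fields would need flattening first, which is one reason to check sample payloads before committing to a vendor.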

Timeliness and latency

  • How up to date is the news data?
  • How many news articles are added daily?
  • If the news is delivered via API, what is the latency?

News data is time sensitive, as are the insights you can gain from it. Look for a news data vendor that delivers data continuously and in real time. The vendor should add fresh content every day and in massive quantities (Webz.io adds 3.5M+ news articles daily).

If the vendor has a news API, make sure it has low latency. If you provide an automated platform — like media intelligence, risk intelligence, or financial analysis — you can’t afford to have a news API that takes seconds to respond to requests or returns out-of-date information. You need a news API with fast responses and real-time data so your customers can make timely, data-driven decisions.
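During a trial, you can put rough numbers on response time yourself. The following Python sketch times repeated calls to whatever function performs your request and reports the median, 95th percentile, and worst case. The `fetch` argument is a placeholder you supply (for example, a function wrapping your HTTP client call to the vendor’s endpoint); nothing here is specific to any vendor’s API.

```python
import statistics
import time


def measure_latency(fetch, runs=20):
    """Time `runs` calls to `fetch` and summarize the results in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(samples) + 99) // 100 - 1
    return {
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }
```

Looking at the 95th percentile and the maximum, and not just the median, helps reveal the occasional slow responses that your customers would notice. Response time is only half the picture, though: it is also worth comparing each article’s publication time with the time it first appears in the feed.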

Once you’ve thoroughly evaluated different vendors, you can choose the best data provider for your customers’ needs and business goals.

Webz.io’s offering: a model of what to look for in a News API 

Coverage, quality, and enrichment are the three criteria by which to judge a News API. Let’s break down what they mean.

Coverage – Webz.io’s News API gives you access to over 300K news sites and 3.5M+ new articles daily. You know you are working with high-quality data when you can incorporate clean, structured, and enriched news data into your product.

Quality – Capture all the data without worrying about text bleeding, where ads get mixed up with real articles. Some customers have told us they still had to structure the data they received from other news APIs; they use our API because it saves them time and money.

Enrichment – Webz.io’s News API takes raw news data and adds context and insights, making it more useful. Adding social signals and article sentiment enables businesses to extract meaningful patterns and trends from large amounts of data.

Successful use of news data providers   

With a suitable news data provider, you can ensure your customers achieve their goals. Let’s look at a few Webz.io customer success stories:

Risk management – Exiger

Exiger, the market-leading supply chain and third-party risk AI company, provides AI-powered risk, compliance, and due diligence solutions housed under its platform 1Exiger. DDIQ, the company’s Due Diligence solution, requires massive amounts of diverse, relevant, and timely data, including news data, to deliver accurate risk assessments and user insights.

The Exiger product team integrated the Webz.io News API with DDIQ, accessing news data from millions of sites across the open web. With Webz.io’s news data, DDIQ now accesses approximately 120K relevant news websites, ensuring comprehensive coverage that directly meets the needs of Exiger’s clients. The variety of news sources from Webz.io allows the company to uncover an ever-growing number of risks so that its customers never miss potential threats that could cause significant damage to their business.

Within a month, the engine surfaced almost 1,000 risks across 1.3 million companies and people based on data from the News API.

Brand monitoring – Keyhole

Keyhole, a leading social listening and analytics platform (acquired by Muck Rack), wanted to expand its offering from social media monitoring to monitoring the entire online ecosystem. To achieve this goal, the company needed to take a different monitoring approach to web mentions. This was done by tracking mentions across thousands of online sources with laser-focused granularity.

The company incorporated Webz.io’s data, including news data, into its platform without any hassle. This infusion of data allowed Keyhole’s customers to access an entirely different layer of online conversations. By monitoring web mentions across millions of websites, Keyhole’s users gained a greater awareness of how consumers see their brands. Plus, the ability to deeply filter searches allows customers to get optimal results.

Ready to choose a news data provider?

Now you have the critical criteria and questions you need answered to choose the right provider for your news data needs. Your choice can make or break the quality and value of your automated solution’s insights. Ask the right questions and you’ll find the best news data provider for your platform and customers.
