How to Avoid Spam in News API?

Organizations across industries use news APIs to stay on top of market trends, keep track of competitors, and build news-driven customer applications. However, many news APIs include spam in their responses. Spam leads to inaccurate news analysis, poor user experience, and potential security threats, so news API users need to reduce it as much as possible. This article highlights ways you can reduce spam when using a news API.

Why do some news APIs return more spam than others?

The information returned by news APIs varies widely in quality, size, and scope. Some news APIs provide adequate amounts of news data but with far too much spam content. Other APIs provide massive volumes of global news information but with minimal spam. The reasons a News API may return spam in the results vary and can include:  

  • Lack of effective spam filtering mechanisms — Many news APIs scrape content from various sources across the open web without having mechanisms to filter out spam effectively. Without spam filtering, low-quality and spam content can slip through and become part of the API’s responses.
  • Sophisticated spam tactics — Some spammers inject low-quality news content or spam links into sources, which get picked up by some news APIs. Others use techniques like keyword stuffing and creating synthetic content to get spam news content ranked higher on search engines and news aggregators. Many news APIs look at rank when including news content in responses.
  • Emphasis on monetization — Most news sites earn money primarily through advertising and sponsored content. Some platforms may post lesser-quality news content to increase site traffic, generating more revenue. Some news sites don’t mark sponsored posts, making it unclear which content is authentic or sponsored. News APIs scrape this content, reducing the quality of the news data they provide.
  • Lack of human moderation — While automated systems can help reduce the amount of spam posted to news sites, human moderation can help reduce spam further. Human reviewers can catch nuances in content that automated systems can’t. Some news sites have little to no human moderation in place, resulting in more spam in their content. Many news sites don’t have mechanisms for filtering out nuanced news spam.

At Webz.io, we take several steps to reduce the amount of spam in the results of our News API and News API Lite. The News API is our feature-packed, commercial news API and News API Lite is the limited community version of our News API.

How Webz.io reduces spam in its News API results

We constantly strive to balance providing broad coverage and mitigating spam news content. From the smallest independent news blogs to the largest enterprise news media sites, we use multiple methods to reduce news spam, which include:

  • Factoring in domain rank — Domain rank indicates the popularity of a website. A site with a rank of 1,000 is less popular than one with a rank of 100. We limit the amount of data News API users can download from sites without a domain rank. 
  • Filtering out spam farms — We identify HTML patterns around spam farms that we detect and filter out. A spam farm or link farm is a group of interconnected websites created to manipulate search engine rankings. Spammers use deceptive tactics to increase the rankings of these spam websites.
  • AI-powered analysis — We leverage AI to analyze the structure of each news article to see if it “looks” like a news article as opposed to other content like Q&A, discussion, or eCommerce.
  • Manual quality assurance — We also have a manual quality assurance process to ensure the quality of news content our News API returns.

While we take steps to reduce spam in the News API results, users can take steps to reduce spam further.

5 ways Webz.io News API users can further filter out spam

If you’re using the Webz.io News API or News API Lite, you can further filter out spam by doing one or more of the following:

#1 Search top news sites

Use site_category:top_news to search only the top news sites. We manually curated the top news sites for each country, so this filter limits the results to a short list of high-quality news sites.

Example: site_category:top_news ("OpenAI" OR "ChatGPT" OR "LLM") language:english

This query will search for news in English about OpenAI, ChatGPT, or large language models (LLM) in the top news sites.

[Image: example results for the top news sites query.]
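
To run a query like this from code, the filter string has to be assembled and URL-encoded first. The sketch below is a minimal illustration, not an official Webz.io client: the base URL is a placeholder, and the `token` and `q` parameter names are assumptions to check against the API documentation. It places the filters before the keyword group and assumes clause order does not affect the results.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the URL from your provider's documentation.
BASE_URL = "https://api.example.com/news"


def or_group(keywords):
    """Wrap keywords in straight quotes and join them into one OR group."""
    return "(" + " OR ".join(f'"{k}"' for k in keywords) + ")"


def build_query(keywords, filters):
    """Combine filter:value clauses with a keyword OR group."""
    clauses = [f"{name}:{value}" for name, value in filters.items()]
    clauses.append(or_group(keywords))
    return " ".join(clauses)


def build_url(query, token):
    """URL-encode the query; 'token' and 'q' are assumed parameter names."""
    return BASE_URL + "?" + urlencode({"token": token, "q": query})


query = build_query(
    ["OpenAI", "ChatGPT", "LLM"],
    {"site_category": "top_news", "language": "english"},
)
print(query)
print(build_url(query, token="YOUR_TOKEN"))
```

Building the string from a list of keywords and a dictionary of filters keeps quoting consistent and avoids the curly quotes that word processors insert when a query is copied from a document.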

#2 Leverage the performance score filter

Use the performance score filter to search for news articles based on their popularity on social media sites, such as Facebook, VK, and LinkedIn. Popularity is based on likes, comments, and shares, and the score ranges from 0 to 10. A score of 0 means that the post wasn’t shared much and didn’t do well. A score of 10 means the post was shared a lot and was essentially “on fire.”

Example: ("Samsung" OR "Samsung Galaxy") language:english performance_score:>8

This query will search for English news articles mentioning Samsung or Samsung Galaxy where the article received a performance score above 8.

[Image: example results for the performance score query.]
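
If you have already downloaded results and want to apply the same threshold afterward, a client-side filter can mirror the `performance_score:>8` comparison. This is a sketch under assumptions: each article is taken to be a dictionary with a `performance_score` key, which is a hypothetical field name and layout, so adjust it to the actual response structure.

```python
def filter_by_performance(articles, min_score=8):
    """Keep articles scoring strictly above min_score (like performance_score:>8)."""
    kept = []
    for article in articles:
        score = article.get("performance_score")  # hypothetical field name
        if score is not None and score > min_score:
            kept.append(article)
    return kept


sample = [
    {"title": "Launch coverage", "performance_score": 10},
    {"title": "Borderline story", "performance_score": 8},
    {"title": "Unscored story"},
]
popular = filter_by_performance(sample)
print([a["title"] for a in popular])
```

Note that the comparison is strict, so an article scoring exactly 8 is dropped. Use `>=` if you want to include it.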

#3 Apply categories

Use categories to pinpoint topics of interest.

Example: category:health ("Biden" OR "Affordable Care Act" OR "Obamacare") language:english

This query will return English news articles in the health category mentioning President Biden or the Affordable Care Act (also referred to as Obamacare).

[Image: example results for the health category query.]

#4 Leverage the sentiment filter

Use the sentiment filter to look for adverse media (negative news).

Example: sentiment:negative ("Intel" OR "Layoff") language:english

This query will return negative English articles mentioning Intel or layoffs.

[Image: example results for the negative sentiment query.]

#5 Search domains by rank

Use domain_rank to search top domains by popularity. 

Example: domain_rank:<10000 site_type:news language:english ("Sam Altman" OR "OpenAI")

This query will return English news articles mentioning Sam Altman or OpenAI from the top 10,000 sites worldwide. We recommend setting the value between 10K and 100K as smaller values limit the results too much.

[Image: example results for the domain rank query.]
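
Because the right domain_rank cutoff depends on how many results you need, it can help to generate the same query at several thresholds and compare what comes back. The sketch below only builds the query strings. The helper name is hypothetical, and the 50,000 midpoint is an arbitrary choice within the recommended 10K to 100K range.

```python
def domain_rank_queries(keywords, thresholds=(10_000, 50_000, 100_000)):
    """Return one query string per domain_rank cutoff, keyed by the cutoff."""
    group = "(" + " OR ".join(f'"{k}"' for k in keywords) + ")"
    return {
        cutoff: f"domain_rank:<{cutoff} site_type:news language:english {group}"
        for cutoff in thresholds
    }


queries = domain_rank_queries(["Sam Altman", "OpenAI"])
for cutoff, query in queries.items():
    print(cutoff, query)
```

Running each query and comparing the result counts shows where a tighter cutoff starts to remove legitimate sources along with the spam.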

General tips for reducing spam in News API results

The best way to obtain less spammy news content is to use our News API or News API Lite! However, if you’re using a news API from a different provider, here are a few basic tips on how to reduce spam in the results:

  • Use advanced search options — Many news APIs include multiple search options, allowing you to search for news based on items like category, language, location, keyword, or tag. Use specific keywords and categories instead of broad terms to ensure the API returns relevant content. Use multiple search options to narrow down the news content returned by the API.
  • Leverage advanced filtering techniques — Some APIs include advanced filtering options, allowing you to filter the news data based on components such as news source, domain rank, date range, and sentiment. Spammy content tends to lack domain authority and be out of date. Filter spam out by focusing on recent news and site popularity. 
  • Create blacklists and exclusions — Some news APIs let you create blacklists for spammy words and phrases or let you exclude domains known for generating spam news content. You can define what words and phrases are considered spam and prevent them from inclusion in the API’s responses.
  • Report spam — If available, leverage the API’s “report spam” mechanism to report low-quality and spam content to the API provider. Reporting spam helps the provider improve the quality of the content returned by the news API.
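
When a provider offers no server-side blacklist, the same idea can be applied to the results after download. The following is a minimal sketch, assuming each article is a dictionary with `url`, `title`, and `text` keys. Those field names, the blocked domain, and the blocked phrases are all hypothetical examples.

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"spam-example.com"}  # hypothetical domain
BLOCKED_PHRASES = ("click here to win", "limited time offer")  # hypothetical phrases


def domain_of(url):
    """Return the lowercase hostname without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_blocked(article):
    """True if the article's domain or text matches a blacklist entry."""
    host = domain_of(article.get("url", ""))
    for domain in BLOCKED_DOMAINS:
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return True
    text = (article.get("title", "") + " " + article.get("text", "")).lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def apply_blacklist(articles):
    """Drop every article that matches the blacklist."""
    return [a for a in articles if not is_blocked(a)]


articles = [
    {"url": "https://www.spam-example.com/post", "title": "Breaking news"},
    {"url": "https://news.example.org/a", "title": "Click here to WIN a phone"},
    {"url": "https://news.example.org/b", "title": "Chipmaker reports earnings"},
]
print([a["title"] for a in apply_blacklist(articles)])
```

Simple substring matching is easy to audit but will miss misspelled or obfuscated phrases, so treat it as a first pass rather than a complete filter.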

If the news API you’re using doesn’t have advanced filtering options, you could use natural language processing (NLP) to create your own filters. However, it would be easier and less time-consuming to switch to a news API with essential filters, including ones for reducing spam.
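
To give a sense of what building your own filter involves, here is a deliberately simple lexical heuristic for one spam tactic mentioned earlier, keyword stuffing. It is a sketch, not a trained NLP model: the stopword list, the 0.25 ratio, and the 20-token minimum are arbitrary assumptions that would need tuning on real data.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real filter would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with", "is", "was"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def top_term_ratio(text):
    """Share of all tokens taken up by the single most frequent token."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens).most_common(1)[0][1] / len(tokens)


def looks_keyword_stuffed(text, max_ratio=0.25, min_tokens=20):
    """Flag texts that are long enough to judge and dominated by one term."""
    return len(tokenize(text)) >= min_tokens and top_term_ratio(text) > max_ratio


print(looks_keyword_stuffed("cheap phones " * 15))
print(looks_keyword_stuffed("Regulators approved the merger after a long review."))
```

Even this small example needs tokenization, stopword handling, and threshold tuning, which illustrates why built-in filters are usually the less time-consuming route.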

Choose a full-featured News API

As you can see, many methods are available to reduce spam in the results of a news API. The best way to avoid news spam is to choose a news API with access to high-quality, comprehensive, global news data and advanced search and filtering options. The more built-in options a news API has, the better you can reduce news spam. 

Interested in learning more about how to avoid spam in news API results? Talk to one of our experts.
