A World of High-Quality Accessible News for All

Today, if you rely on the internet for your news, you’re probably one of the many who refuse to pay for it. And although that puts you in good company, it may also be making you ignorant. For better or worse, many top-quality publishers, such as the New York Times and the Washington Post, have now moved to subscription models, and some media experts feel that we are quickly moving towards a split in society in which only an elite willing to pay for subscriptions has access to these high-quality publications.

As Alan Rusbridger, who served as editor-in-chief of the Guardian for 20 years, puts it:

“We are, for the first time in modern history, facing the prospect of how societies would exist without reliable news.”

Why Most News Publications Have Flawed Business Models

The rest of the public, these experts fear, will continue to get their news from publications whose business models depend on less reputable or highly controversial forms of journalism such as clickbait, personalization, and fake news. This is partly because the rise of Google and Facebook advertising, combined with free, 24/7 access to online news, has pushed the entire online news industry into dire financial straits. At the beginning of 2019, almost 1,000 journalists at some of the largest publications (BuzzFeed, HuffPost and Gannett) lost their jobs. Gannett has reported declining revenue in the last two years, and HuffPost has struggled to turn a profit for years despite generating tens of millions of dollars in advertising revenue.

But although the future of journalism seems grim, not all hope is lost.

Here are a few of the most controversial forms of online journalism and how Webz.io is helping to counter their effects:


Clickbait

Perhaps one of the most popular yet controversial types of news is clickbait. It is based on the psychological principle that the more strongly a headline appeals to a positive or negative emotion, the more likely it is to produce a click. It also draws on the information-gap theory, which holds that people feel a psychological need to fill in missing information in order to dispel a feeling of “missing out.” To its credit, Facebook has adjusted its algorithm multiple times to fight clickbait, acknowledging its harmful effects. In 2017 it adjusted the algorithm once again to identify clickbait that relied on either of the two principles mentioned above.

But many online publishers continue to exploit these two principles and pay journalists according to the number of clicks their articles generate. The problem is that clickbait headlines bear no relation to the quality of the content behind them, and they have created a downward spiral in which it’s becoming harder and harder for publications to support sophisticated news articles that tackle more complex issues.

DataRobot, a data analytics company, used Webz.io’s news datasets to build an alternative algorithm that predicts the virality of a news article from correlations with specific keywords, article titles, external links and specific organizations mentioned in the article. For instance, words such as Donald, interview, woman and public were more likely to appear in articles that went viral, whereas words like Washington, second, billion, business, and minister were far less likely to. The predictive model DataRobot built with Webz.io’s data encourages publishers and journalists alike to focus on writing and publishing high-quality content rather than on crafting clickbait titles for monetary gain.

[Figure: DataRobot word tags. Articles with words in red are more likely to go viral; articles with words in blue are less likely to go viral.]
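
To make the idea of word-level virality signals concrete, here is a minimal sketch in Python. It is not DataRobot's model, which the article does not describe in detail. It assumes a much simpler technique: a smoothed log-odds score comparing how often each word appears in viral versus non-viral titles. The function names, the sample titles and the viral labels are all hypothetical.

```python
import math
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in title)
    return cleaned.split()


def word_log_odds(titles, went_viral, smoothing=1.0):
    """Return {word: smoothed log-odds of appearing in viral vs. non-viral titles}."""
    viral_counts, other_counts = Counter(), Counter()
    n_viral = n_other = 0
    for title, viral in zip(titles, went_viral):
        words = set(tokenize(title))  # count each word at most once per title
        if viral:
            viral_counts.update(words)
            n_viral += 1
        else:
            other_counts.update(words)
            n_other += 1
    scores = {}
    for word in set(viral_counts) | set(other_counts):
        # Smoothing keeps the ratio finite for words seen in only one group.
        p_viral = (viral_counts[word] + smoothing) / (n_viral + 2 * smoothing)
        p_other = (other_counts[word] + smoothing) / (n_other + 2 * smoothing)
        scores[word] = math.log(p_viral / p_other)
    return scores


# Hypothetical toy data, invented for illustration only.
titles = [
    "Exclusive interview with a woman who changed public opinion",
    "Public interview goes wrong",
    "Minister announces second billion business plan",
    "Washington business report: second quarter",
]
went_viral = [True, True, False, False]

scores = word_log_odds(titles, went_viral)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:12s} {scores[word]:+.2f}")
```

Positive scores mark words that lean viral in this toy sample and negative scores mark words that lean the other way. A real model would need far more data and, as the article notes, features beyond title words, such as external links and the organizations mentioned.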

Personalized News

Less controversial, but still a massive influence on the content users consume, is personalized news. Although it has serious advantages for advertising and monetization purposes, continuously consuming personalized content creates an altered perception of reality. Trump’s shocking win in the 2016 US elections is widely attributed to readers’ increasingly personalized online experience and the filter bubble. This creates an echo chamber in which people largely engage with like-minded individuals and resist engaging with anyone who might challenge their views, resulting in an increasingly polarized society. The danger in such behavior is that people will have little opportunity to venture outside their own information silos unless they make a great effort to do so.

As Eli Pariser, chief executive of Upworthy and author of The Filter Bubble, explains:

“Personalization filters serve a kind of invisible autopropaganda, indoctrinating us with our own ideas, amplifying our desire for things that are familiar and leaving us oblivious to the dangers lurking in the dark territory of the unknown.”

In a quest for a more balanced approach to news, researchers at Yale University used curated Webz.io datasets of news articles to build a framework for a personalization algorithm that encourages greater diversification and avoids polarization. They successfully developed a prototype for a news search engine that delivers balanced viewpoints according to flexible user-defined constraints. This study demonstrated that balanced content delivery systems are well within reach.

[Figure: Yale balanced datasets. On the right: a more balanced news feed. Liberal-leaning articles are in blue; conservative-leaning articles are in red.]
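
As a rough illustration of what delivering balanced viewpoints under user-defined constraints could look like, the sketch below re-ranks a feed greedily: it walks through articles in order of relevance and skips any article whose viewpoint group has already reached a user-chosen cap. This is an assumed, simplified stand-in, not the Yale researchers' framework. The function name, the leaning labels and the relevance scores are hypothetical.

```python
from collections import Counter


def balanced_top_k(articles, k, max_per_leaning):
    """Pick up to k articles by relevance, capping how many each leaning may supply.

    articles: list of (title, leaning, relevance) tuples.
    max_per_leaning: dict mapping a leaning label to the maximum number of
        slots it may take; leanings missing from the dict are uncapped.
    """
    picked, used = [], Counter()
    for article in sorted(articles, key=lambda a: a[2], reverse=True):
        if len(picked) == k:
            break
        leaning = article[1]
        if used[leaning] < max_per_leaning.get(leaning, k):
            picked.append(article)
            used[leaning] += 1
    return picked


# Hypothetical toy data, invented for illustration only.
articles = [
    ("A", "liberal", 0.9),
    ("B", "liberal", 0.8),
    ("C", "liberal", 0.7),
    ("D", "conservative", 0.6),
    ("E", "conservative", 0.5),
    ("F", "liberal", 0.4),
]

feed = balanced_top_k(articles, k=4, max_per_leaning={"liberal": 2, "conservative": 2})
print([title for title, _, _ in feed])
```

Ranked purely by relevance, the top four slots would go to A, B, C and D, three of them from one side. With a cap of two per leaning, article C gives way to E. Changing the caps is the "flexible" part: the reader, not the platform, decides how much balance to enforce.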

Fake News

If we combine the psychology of clickbait with the advertising rewards and information silos of personalization, we get another disturbing phenomenon in today’s journalism: fake news. The concern is that fake news may be responsible for shaping public opinion, as in the US presidential elections of 2016, although Facebook CEO Mark Zuckerberg sharply disagrees.

But a BuzzFeed analysis of the top 20 most viral real and fake news articles before the US elections, for example, found that fake news articles, in particular those favoring conservative points of view, had higher engagement. A group of researchers at Dartmouth also found that a quarter of Americans visited a fake news site in the month before the US election. Citing a commitment to higher news quality, both Facebook and Google pledged shortly after the 2016 elections to stop fake news sites from profiting from their ad networks.

Other researchers, at Southern Methodist University, decided to fight the problem at its core by developing an algorithmic model that could predict whether a news article was real or fake. Using Webz.io’s News API to source reliable news data, the team assembled a dataset of both real and fake news articles and developed models to identify fake news using natural language processing (NLP) and reverse plagiarism. The success of these models demonstrated that fake news can be readily identified by an algorithm, and certainly more quickly than via human analysis. The project also showed that such an algorithm could be built and used to rank and score articles on news sites and social media platforms in the future.
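
For readers curious what an NLP-based real-versus-fake classifier looks like at its simplest, here is a minimal sketch. The article does not say which models the SMU team used, so this assumes a basic technique chosen for illustration: a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. It does not cover the reverse plagiarism step. The class name, the training texts and the labels are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.word_counts):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior, then add one log likelihood per known word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (counts[word] + 1) / (total_words + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for illustration only.
texts = [
    "senate passes budget bill after lengthy debate",
    "central bank holds interest rates steady",
    "shocking miracle cure doctors hate revealed",
    "you will not believe this shocking secret plot",
]
labels = ["real", "real", "fake", "fake"]

model = NaiveBayesText().fit(texts, labels)
print(model.predict("shocking secret cure revealed"))
print(model.predict("bank holds interest rates"))
```

Four training sentences prove nothing about accuracy. The point is only that a classifier scores an article in a fraction of a second, which is what makes ranking and scoring articles at platform scale plausible.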

Marching Forward in the Battle for Higher-Quality News

Algorithms can be biased and harmful to society, as we’ve discussed before and as the effects of the personalization and clickbait algorithms widely adopted by mainstream online news and social media sites show. But as the research projects above on alternative, more balanced news algorithms and on the identification of fake news demonstrate, algorithms can also be made to benefit society as a whole. And Webz.io is proud to play a role in that.

