Why Extracting Content From The Open Web Is Better than Surveys for Research

What’s the best way to find out how people feel about a given topic? Simply ask them, right? Well, at least that’s what we’ve been led to believe.

Standard polling practice tells us that if you put together some questions, pose them to a group of people and then “normalize” the data to account for outliers, you’ll end up with research that you can bank on. While it wouldn’t be fair to assert that this method doesn’t work at all, especially given that it’s gotten us this far, today there are better research methods to consider as well.

Over time, the evolution of technology has led to great leaps in how we conduct research and how accurate our raw data is. Today, the most effective way to know how people feel about the hard news issues of the day, UFOs, Rihanna, trends in digital marketing and even your own brand involves asking zero questions. Today it’s all about listening.

Give people enough breathing room, and they will voluntarily reveal their take. Picking up on these digital signals is hands-down the best way to know about people’s interests, opinions and behaviors.

Let’s take a look at some of the common biases that render survey data problematic, and then we’ll delve into some of the key reasons why content extracted from the open web gives today’s leading researchers a more accurate picture of how the masses truly feel.

Questions that Necessarily Skew the Answers

The second you formulate what you need to know as a question, you’re basically “leading the witness” to respond in a certain way. It’s that simple, and it’s 100% unavoidable. Sure, there are tactics you can employ to minimize “response bias,” as the phenomenon is called, but even the minimum is unacceptably high when compared to data collected without questions at all.

Who are the people you’ll be targeting with your survey? How and where do you intend to reach them? What kind of experiences do these people expect from this setting, and how is your survey influenced by those expectations? What kind of emotional baggage does this context summon?

In this sense, response bias is a flaw that’s baked into the entire research project. This has little to do with analysis methods. Response bias dooms the data starting with the design phase – before any questions are even asked. Indeed, how the researcher views the project overall unavoidably influences the questions asked and therefore the range of answers he or she will receive in response.

The Call of the Herd

Humans are also socially conscious. Survey respondents are known to answer questions in ways that they think will be socially desirable – even when their answers are directly antithetical to their actual feelings.

This is because people are swayed by what they think the next person is going to say. They wonder how the pollsters are going to react. They also consider whoever else might have access to the survey data.

This means that no matter how much you insist that your survey is anonymous, because of this “social bias,” your respondents are going to compromise the honesty of their responses. They’re going to focus more on providing herd-compatible answers than providing straight answers.

Samples that Don’t Represent the Whole

Let’s say that everyone who responds to your survey answers in a manner that truly reflects how they feel, which, as discussed above, is a long shot. There’s still a decent chance that the survey data won’t be accurate, simply because the people who responded don’t necessarily reflect the wider populace’s feelings. This is what’s known as “selection bias.”

Even if you do manage to take a truly random sample, your sample could consist of outliers – any time you operate under the belief that part of a group shares opinions with the whole group, you’re likely to run into accuracy problems.

Sure, the larger your sample, the more likely it will be representative, and depending on whether your study is quantitative or qualitative, your sample might or might not provide the insights you seek. Unfortunately, most researchers go for the statistically valid minimum sample size, or less, although it’s also important to remember that even larger samples can misrepresent the whole.

And there are plenty of situations where the format of the survey ends up causing a selection bias. For example, how many questions are in your survey? Respondents are less likely to answer if they feel like you are asking too much of them, which can throw off the extent to which the sample represents the whole. It’s not uncommon for respondents to jump ship as soon as they realize that your survey will take more than a few seconds of their time or that it will require them to share their feelings. If you try to compensate by introducing incentives, this simply becomes an additional sample selection bias factor.

Better Measurement via Data Extraction

Do you know where the audience whose opinions you want to assess is right now? That’s right – they’re online. The developed world spends hours on end every day sharing content on social media, commenting on articles, chiming in on forum discussions and even blogging themselves.

Interacting with and creating so much content leaves a digital footprint, one that savvy researchers can find and measure. Using a powerful web crawling platform that extracts the relevant data and structures all of these discussions, you can even receive new relevant content on an ongoing basis via a custom RSS feed.

With Webz.io, you can also download discussions as an Excel spreadsheet, so you are free to process them however you like for your own purposes: counting instances of keywords, assessing sentiment and correlating any number of factors with people’s opinions. This handy feature is not supported by most sentiment monitoring tools, as they aren’t made to be useful to serious researchers.
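
As a rough illustration of the keyword-counting step, here is a minimal Python sketch. It assumes the downloaded discussions have been saved as CSV with one row per post. The column names "source" and "text" and the sample rows are hypothetical stand-ins, not the actual layout of a Webz.io export.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical export: the column names "source" and "text" and the rows
# below are assumptions for illustration, not a real download.
SAMPLE_EXPORT = """source,text
forum,"I love the new phone, the battery is great"
blog,"The battery died after a day. Battery life is terrible."
news,"Reviewers praise the camera but not the price"
"""


def count_keywords(rows, keywords, text_field="text"):
    """Count whole-word, case-insensitive keyword occurrences across rows."""
    wanted = {k.lower() for k in keywords}
    # Start every keyword at zero so absent keywords still show up.
    counts = Counter({k: 0 for k in wanted})
    for row in rows:
        for token in re.findall(r"[a-z']+", row[text_field].lower()):
            if token in wanted:
                counts[token] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
keyword_counts = count_keywords(rows, ["battery", "camera", "price"])
for word, n in keyword_counts.most_common():
    print(word, n)
```

The same loop could feed a sentiment scorer or a correlation step. Plain counting is shown here only because it is the simplest of the three analyses mentioned above.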

Content extracted from the web helps you avoid the research biases associated with polls and surveys. When you can extract everything you need, there’s no sample involved, so you’ll never run into sample size issues or any other sampling biases. Moreover, content extracted from the web doesn’t ask people any questions, so response biases likewise don’t come into play.

Just listen – the crowd is revealing all, answering every question you can think of.

Carrying out research using online content works better because you can access all of the content posted by people who choose to share their feelings and opinions about your topic of interest. It’s akin to what researchers call “voluntary response,” because these people all choose to share their opinions – except in this case, they don’t even know you’re paying attention. In this sense, it’s the perfect “finger on the pulse” method.

Online Content for Research

For many generations, when it came to research methods, polls and surveys were all we had access to. There was plenty of awareness about response biases, selection biases and other factors that skewed results, but there wasn’t much researchers could do about these flaws, except to design studies in a manner that would minimize accuracy compromises.

Today, however, we have choices, and there is much to be said for the superiority of content extracted from the open web. No sample sizes, no questions asked – crawled data is simply a better way to reveal the truth of public opinion.

To get started experimenting with online web content for your research, simply sign up for a 10-day free trial or talk to an expert to learn more about Webz’s services.

