Ran Geva

Web Data

Tiny basic multi-threaded web crawler in Python

If you need a simple web crawler that will scour the web for a while and download content from random sites, this code is for you. Usage: run it with your seed site, for example https://cnn.com. It could...
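To give a feel for what such a crawler involves, here is a minimal sketch using only the Python standard library. It is not the code from the post: the names `crawl`, `fetch`, `extract_links`, `max_pages` and `num_threads` are all assumptions made for illustration, and link extraction is done with a deliberately naive regular expression rather than a real HTML parser.

```python
import queue
import re
import threading
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

# Naive href matcher; an assumption for brevity, not a robust HTML parser.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch(url, timeout=10):
    """Download a page and return its text, or '' on any failure."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def extract_links(base_url, html):
    """Return the set of absolute http(s) links found in html."""
    links = set()
    for href in LINK_RE.findall(html):
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.add(absolute)
    return links


def crawl(seed, max_pages=50, num_threads=4, fetch_page=fetch):
    """Crawl outward from seed with worker threads; return {url: html}."""
    frontier = queue.Queue()
    frontier.put(seed)
    seen = {seed}          # every URL ever queued, capped at max_pages
    pages = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = frontier.get()
            try:
                if url is None:      # sentinel: no more work
                    return
                html = fetch_page(url)
                links = extract_links(url, html)
                with lock:
                    pages[url] = html
                    for link in links:
                        if link not in seen and len(seen) < max_pages:
                            seen.add(link)
                            frontier.put(link)
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    frontier.join()                  # wait until every queued URL is handled
    for _ in threads:
        frontier.put(None)           # then tell each worker to exit
    for t in threads:
        t.join()
    return pages
```

Calling `crawl("https://cnn.com", max_pages=20)` would start from that seed and return a dictionary of at most 20 downloaded pages. The `fetch_page` parameter exists so the downloader can be swapped out, for instance with an in-memory stub when experimenting offline. A real crawler would also want politeness delays and robots.txt handling, which this sketch leaves out.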

Web Data

How we quadrupled the performance of Elasticsearch

Well, that’s a misleading title. We actually quadrupled the performance of our brand monitoring alert system that uses Elasticsearch’s Percolator, but that would have been a much longer title. Some background: Buzzilla...

Web Data

Building a Better Search Query

Many factors can affect the relevancy of streaming data. When the data you consume is ordered not by relevancy but by the time it was crawled, getting the relevant posts is essential. I would like...

Web Data

Webz.io Tips & Tricks: Search for Reviews

Are you looking to focus your data search specifically on consumer-generated reviews? Here are a couple of simple Webz.io tricks that might help. Limit your query to specific sites: You can...

Web Data

Crawling Horrors – Computer Vision Crawlers

So if RSS crawlers are bad and browser scraping isn’t efficient, what about computer vision web-page analyzers? This technology uses machine learning and computer vision to extract information from web pages by interpreting...

Web Data

Crawling Horrors – Browser Scraping

In my previous blog post, I wrote about RSS crawlers, and why they don’t really work. In this post I want to discuss the technique of using a headless browser to parse...
