Ran Geva

Web Data

AI Takeover? 4 Big Web Data Predictions for 2024

Explore the top big web data predictions for 2024 and the impact of LLMs on web content.

Web Data

4 Top Web Data Predictions for 2023 and Beyond

Wondering what’s in store for web data in 2023 and beyond? Read this blog post to find out what we expect to happen with web data soon. Hints: ChatGPT and annotations.

Web Data

Webz.io Image Recognition Helps Identify Illicit Content

How Webz.io Uses Image Analysis and Recognition to Identify Illicit Content on the Dark Web. Collecting data from the Dark Web is immensely more complex than it is on the open web....

Web Data

The Danger of Fake Reviews

How to Spot Fake Reviews in Time for the Holidays. Black Friday is here, and as the biggest shopping day of the year, it means a lot of people will be on...

Web Data

Calling all (almost) Kimono Labs Developers to Migrate to Webz.io

Kimono Labs announced today that it has been acquired by Palantir. Unfortunately, Kimono Labs users will only have two weeks to migrate to a different service because the team will...

Web Data

Article’s publication date extractor – an overview

A few days ago I released an open source Python module that provides you with a simple way to extract and normalize the publication date of any online blog or news post....

Web Data

To crawl or not to crawl, that is the question

In order to write an efficient crawler, you must be smart about the content you download. When your crawler downloads an HTML page it uses bandwidth, memory and CPU, not only its...


Web Data

Dead simple {for devs} python crawler (script) for extracting structured data from any website into CSV

In my previous post I described a very basic web crawler I wrote that can randomly scour the web and mirror/download websites. Today I want to share with you a very simple...

Web Data

Tiny basic multi-threaded web crawler in Python

If you need a simple web crawler that will scour the web for a while to download random sites’ content – this code is for you. Usage: [code snippet omitted] Where https://cnn.com is your seed site. It could...

Web Data

How we quadrupled the performance of Elasticsearch

Well, that’s a misleading title. We actually quadrupled the performance of our brand monitoring alert system that uses Elasticsearch’s Percolator, but that would have been a much longer title. Some background: Buzzilla...

Web Data

Building a Better Search Query

Many factors can affect streaming data relevancy. When the data you consume isn’t ordered by relevancy but by the time it was crawled, getting the relevant posts is essential. I would like...

Web Data

Webz.io Tips & Tricks: Content Marketing & SEO

I would like to share with you 2 simple tips about how to leverage Webz.io to promote your website, product or service organically.
