Tag: batching

Graph processing with Apache Flink

Graphs are everywhere. The Internet, maps, and social networks, to name just a few, are all examples of massive graphs that contain vast amounts of useful information. As these networks grow and processing them becomes more and more common, we need better tools to do the job.

In this article, I’ll describe how we can use the Flink Gelly library to process large graphs and provide a simple example of how to find the shortest path between two users in the Twitter graph.

Calculating movie ratings distribution with Apache Flink

If you’ve been following recent news in the Big Data world, you’ve probably heard about Apache Flink. This platform for batch and stream processing, built on a few significant technical innovations, could become a real game changer, and it is starting to compete with existing products like Apache Spark.

In this post, I would like to show how to implement a simple batch processing algorithm using Apache Flink. We will work with a dataset of movie ratings and produce a distribution of user ratings. In the process, I’ll show a few tricks that you can use to improve the performance of your Flink applications.
