If you’ve been following recent news in the Big Data world, you’ve probably heard about Apache Flink. This platform for batch and stream processing, which is built on a few significant technical innovations, can become a real game changer and it is starting to compete with existing products like Apache Spark.

In this post, I would like to show how to implement a simple batch processing algorithm using Apache Flink. We will work with a dataset of movie ratings and will produce a distribution of user ratings. In the process, I’ll show few tricks that you can use to improve the performance of your Flink applications.

Continue reading