
Month: February 2017

Distributed graph processing with Pregel

Graph processing is an important part of data analysis in many domains. However, graph processing can be tricky, since general-purpose distributed computing tools are not well suited to it.

It is not surprising that an important advance in distributed graph processing came from Google, which has to process one of the biggest graphs of all: the Web graph. Engineers at Google wrote a seminal paper describing a new system for distributed graph processing, which they called Pregel.

In this article, I will explain how Pregel works and demonstrate how to implement algorithms in the Pregel model using Apache Flink's API.
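To make the model concrete, here is a minimal single-machine sketch of Pregel's vertex-centric idea in plain Java. It is not Flink's API and not the code from the article. Computation proceeds in supersteps: each vertex that receives messages updates its value and sends messages along its out-edges, and the run halts once no messages are in flight. The example computes single-source shortest paths; the graph, the class name `PregelSketch`, and the min-combiner are illustrative assumptions, and for simplicity the first superstep delivers a single message to the source instead of activating every vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PregelSketch {

    // Single-source shortest paths, Pregel style. Each edge is {source, target, weight}.
    static long[] shortestPaths(int numVertices, int[][] edges, int source) {
        // Build out-edge lists so each vertex can message its neighbors.
        List<List<int[]>> outEdges = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) outEdges.add(new ArrayList<>());
        for (int[] e : edges) outEdges.get(e[0]).add(e);

        long[] distance = new long[numVertices];
        Arrays.fill(distance, Long.MAX_VALUE);   // "infinity" until a message arrives

        // Simplification: the first superstep only delivers distance 0 to the source.
        Map<Integer, Long> inbox = new HashMap<>();
        inbox.put(source, 0L);

        while (!inbox.isEmpty()) {               // halt when no messages are in flight
            Map<Integer, Long> outbox = new HashMap<>();
            for (Map.Entry<Integer, Long> message : inbox.entrySet()) {
                int v = message.getKey();
                long candidate = message.getValue();
                // Vertex program: adopt the candidate only if it improves the distance.
                if (candidate < distance[v]) {
                    distance[v] = candidate;
                    for (int[] e : outEdges.get(v)) {
                        // Combiner: keep only the smallest message per target vertex.
                        outbox.merge(e[1], candidate + e[2], Math::min);
                    }
                }
                // A vertex that did not improve sends nothing, i.e. it votes to halt.
            }
            inbox = outbox;                      // messages become visible in the next superstep
        }
        return distance;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1, 4}, {0, 2, 1}, {2, 1, 2}, {1, 3, 1}, {2, 3, 5}};
        long[] distance = shortestPaths(5, edges, 0);
        System.out.println(Arrays.toString(distance));
        // prints [0, 3, 1, 4, 9223372036854775807]; vertex 4 is unreachable
    }
}
```

Separating the inbox of the current superstep from the outbox of the next one mirrors Pregel's rule that messages sent in superstep S are only seen in superstep S + 1.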


Declutter Your POJOs with Lombok

I wrote this article for SitePoint’s Java channel, where you can find a lot of interesting articles about our programming language. Check it out!

I have a love/hate relationship with Java.
On one hand, it’s a mature programming language with a diverse set of frameworks and libraries that make development relatively easy.
On the other hand, it’s very verbose and requires writing massive amounts of boilerplate code for common tasks.
The situation got better with the introduction of lambdas and streams in Java 8, but it is still sub-par in some areas, like writing plain old Java objects (POJOs).
In this post, I’ll show you how to rewrite POJOs in only a few lines of code with Lombok.
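For illustration, this is the kind of hand-written POJO Lombok targets. The class `Person` and its fields are hypothetical. The comment at the top shows how Lombok's `@Data` annotation would generate roughly equivalent getters, setters, `equals`, `hashCode`, and `toString` at compile time; the hand-written `toString` imitates Lombok's `ClassName(field=value, ...)` format.

```java
import java.util.Objects;

// With Lombok on the classpath, roughly the same class could be declared as:
//   @Data
//   public class Person { private String name; private int age; }
// Person, name and age are hypothetical examples.
public class Person {
    private String name;
    private int age;

    // Accessors: one getter and one setter per field.
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    // Value equality over all fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // Must stay consistent with equals.
    @Override
    public int hashCode() { return Objects.hash(name, age); }

    @Override
    public String toString() { return "Person(name=" + name + ", age=" + age + ")"; }

    public static void main(String[] args) {
        Person p = new Person();
        p.setName("Alice");
        p.setAge(30);
        System.out.println(p); // prints Person(name=Alice, age=30)
    }
}
```

Every new field means touching the accessors, `equals`, `hashCode`, and `toString` by hand, which is exactly the maintenance burden an annotation-based generator removes.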


Graph processing with Apache Flink

Graphs are everywhere. The Internet, maps, and social networks, to name just a few, are all examples of massive graphs that contain vast amounts of useful information. Since these networks keep growing and processing them is becoming more and more common, we need better tools to do the job.

In this article, I’ll describe how to use Flink’s Gelly library to process large graphs and provide a simple example of how to find the shortest path between two users in the Twitter graph.
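Gelly spreads this kind of computation across a cluster. As a single-machine illustration of the underlying idea only, here is a breadth-first search that finds a shortest chain of "follows" edges between two users. It assumes an unweighted, directed follower graph held in a `Map`; the user names and the `ShortestFollowPath` class are hypothetical, and this is not the Gelly code from the article.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class ShortestFollowPath {

    // Breadth-first search over "follows" edges; returns the user chain, or an empty list.
    static List<String> shortestPath(Map<String, List<String>> follows, String from, String to) {
        Map<String, String> parent = new HashMap<>(); // user -> user it was first reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);

        while (!queue.isEmpty()) {
            String user = queue.poll();
            if (user.equals(to)) {
                // Follow parent links back to the start, prepending each user.
                LinkedList<String> path = new LinkedList<>();
                for (String u = to; u != null; u = parent.get(u)) {
                    path.addFirst(u);
                }
                return path;
            }
            for (String next : follows.getOrDefault(user, List.of())) {
                if (!parent.containsKey(next)) {   // first visit is along a shortest path
                    parent.put(next, user);
                    queue.add(next);
                }
            }
        }
        return List.of();                          // `to` is not reachable from `from`
    }

    public static void main(String[] args) {
        Map<String, List<String>> follows = Map.of(
                "alice", List.of("bob", "carol"),
                "bob", List.of("dave"),
                "carol", List.of("dave", "erin"),
                "dave", List.of("erin"));
        System.out.println(shortestPath(follows, "alice", "erin")); // prints [alice, carol, erin]
    }
}
```

Because BFS explores users in order of hop count, the first time it reaches the target it has found a path with the fewest hops.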


Implementing Flink batch data connector

Apache Flink has a versatile set of connectors for external data sources. It can read and write data from databases as well as local and distributed file systems. However, sometimes what Flink provides is not enough, and we need to read some uncommon data format.

In this article, I will show you how to implement a custom connector for reading a dataset in Flink.
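As a rough outline of what such a connector involves, the sketch below mimics the read cycle of Flink's `InputFormat` (open, check `reachedEnd()`, call `nextRecord()`, close) through a hypothetical, simplified `SimpleInputFormat` interface so that it runs without Flink. The real interface additionally deals with configuration and input splits, and its `nextRecord` takes a reuse object. The pipe-delimited "key|value" format and all names here are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ConnectorSketch {

    // Hypothetical, simplified stand-in for the read cycle of Flink's InputFormat.
    interface SimpleInputFormat<T> {
        void open() throws IOException;
        boolean reachedEnd() throws IOException;
        T nextRecord() throws IOException;
        void close() throws IOException;
    }

    // Reads a hypothetical "key|value" line format from an in-memory string.
    static final class PipeDelimitedFormat implements SimpleInputFormat<String[]> {
        private final String data;
        private BufferedReader reader;
        private String nextLine; // one line of look-ahead so reachedEnd() can answer

        PipeDelimitedFormat(String data) { this.data = data; }

        @Override public void open() throws IOException {
            reader = new BufferedReader(new StringReader(data));
            nextLine = reader.readLine();
        }

        @Override public boolean reachedEnd() { return nextLine == null; }

        @Override public String[] nextRecord() throws IOException {
            String[] record = nextLine.split("\\|", 2); // split into key and value
            nextLine = reader.readLine();
            return record;
        }

        @Override public void close() throws IOException { reader.close(); }
    }

    // Drives a format the way a batch runtime would: open, read until the end, close.
    static <T> List<T> readAll(SimpleInputFormat<T> format) throws IOException {
        List<T> records = new ArrayList<>();
        format.open();
        try {
            while (!format.reachedEnd()) {
                records.add(format.nextRecord());
            }
        } finally {
            format.close();
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = readAll(new PipeDelimitedFormat("a|1\nb|2"));
        for (String[] row : rows) {
            System.out.println(row[0] + " -> " + row[1]); // prints a -> 1, then b -> 2
        }
    }
}
```

Keeping one line of look-ahead lets `reachedEnd()` answer without consuming a record, which matches the "check, then read" loop the runtime drives.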
