Category: For Syndication

Only DZone is allowed to share posts obtained from this RSS feed: https://brewing.codes/category/for-syndication/feed

Distributed graphs processing with Pregel

Graph processing is an important part of data analysis in many domains. But it can be tricky, since general-purpose distributed computing tools are not well suited to graph processing.

It is not surprising that an important advance in distributed graph processing came from Google, which has to process one of the biggest graphs: the Web graph. Engineers at Google wrote a seminal paper describing a new system for distributed graph processing that they called Pregel.

In this article, I will explain how Pregel works and demonstrate how to implement Pregel algorithms using the API from Apache Flink.

Graphs processing with Apache Flink

Graphs are everywhere. The Internet, maps, and social networks, to name just a few, are all examples of massive graphs that contain vast amounts of useful information. Since these networks keep growing and processing them is becoming more and more common, we need better tools to do the job.

In this article, I’ll describe how we can use Flink’s Gelly library to process large graphs and provide a simple example of how to find the shortest path between two users in the Twitter graph.

Implementing Flink batch data connector

Apache Flink has a versatile set of connectors for external data sources. It can read data from and write data to databases and local and distributed file systems. However, sometimes what Flink provides is not enough, and we need to read some uncommon data format.

In this article, I will show you how to implement a custom connector for reading a dataset in Flink.

Using Apache Flink with Java 8

JDK 8 introduced a lot of long-anticipated features to the Java language. Among those, the most notable was the introduction of lambda functions. They made possible new frameworks such as Java 8 Streams, as well as new features in existing frameworks like JUnit 5.

Apache Flink also supports lambda functions, and in this post, I’ll show how to enable them and how to use them in your applications.

Calculating movies ratings distribution with Apache Flink

If you’ve been following recent news in the Big Data world, you’ve probably heard about Apache Flink. This platform for batch and stream processing, which is built on a few significant technical innovations, could become a real game changer, and it is starting to compete with existing products like Apache Spark.

In this post, I would like to show how to implement a simple batch processing algorithm using Apache Flink. We will work with a dataset of movie ratings and produce a distribution of user ratings. In the process, I’ll show a few tricks that you can use to improve the performance of your Flink applications.

Apache Flink: A New Landmark on the Big Data Landscape

There is no shortage of Big Data applications and frameworks nowadays, and sometimes it may even seem that all niches have already been filled. That’s not how the creators of Apache Flink see it, though. Even though their project is not yet as well known as Spark or Hadoop, it has brought enough innovations to become a real game-changer in the world of Big Data.

In this article, I would like to introduce Apache Flink, describe its main features, and explain why it is different from other available solutions. I’ll end the article with an example of a simple stream processing application using Flink.

Generators in Python

In previous articles I’ve written about how to create an iterator in Python by implementing the iterator protocol or using the yield keyword. In this article I’ll describe generators: a piece of Python syntax that can turn many iterators into one-liners.
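
As a rough illustration of the three approaches mentioned here, the sketch below produces the same sequence of squares with a class implementing the iterator protocol, with a function using yield, and with the one-line form that Python's documentation calls a generator expression. The names Squares, squares and squares_genexp are made up for this sketch and are not taken from the articles; the code assumes Python 3.

```python
class Squares:
    """Iterator written by hand: implements __iter__ and __next__."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of the sequence with StopIteration.
        if self.current >= self.limit:
            raise StopIteration
        value = self.current ** 2
        self.current += 1
        return value


def squares(limit):
    """Same sequence, written as a function that uses yield."""
    for n in range(limit):
        yield n ** 2


def squares_genexp(limit):
    """Same sequence again, as a one-line generator expression."""
    return (n ** 2 for n in range(limit))
```

All three can be consumed in the same way, for example with a for loop or by passing them to list().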

Anatomy of a Python Iterator

The iterator is a powerful pattern that was recognised at least as early as 1994, and since then it has been incorporated into the syntax of almost every modern programming language.

Python also implements this pattern, providing a concise syntax to iterate over lists, dictionaries and other data structures:

for i in [1, 2, 3, 4]:
    print(i)

In this article I will describe how an iterator is used in Python, how to implement your own iterator, and what types of iterators exist in Python.
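
To give a feel for what the for loop above does behind the scenes, here is a minimal sketch that walks an iterable by hand using the built-in iter() and next() functions. The helper name manual_for is hypothetical and exists only for this illustration; it approximates the behaviour of a for loop rather than reproducing the interpreter's actual implementation.

```python
def manual_for(iterable):
    """Collect items roughly the way a for loop visits them."""
    items = []
    iterator = iter(iterable)      # ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # fetch the next element
        except StopIteration:      # raised when the iterator is exhausted
            break
        items.append(item)
    return items
```

Calling manual_for([1, 2, 3, 4]) visits the same elements, in the same order, as the loop shown earlier.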
