What is Apache Kafka?

Apache Kafka is an open-source streaming platform used to manage the exchange of data between systems and to process data in real time.


Kafka was originally built at LinkedIn to handle the company’s enormous quantities of data in motion. Its creators later founded Confluent, which among other things offers Kafka as a cloud service and is responsible for the majority of the contributions to the Apache Kafka project. Itera is Confluent’s first partner in Norway.

Imagine that you or your organisation are required to report information under a new directive that applies to you. Your data is spread across countless different systems and needs to be combined, understood and presented in report form, and this needs to be done regularly. Today, this would mean involving IT and other departments to track down the data that needs analysing, and so on. The last time you did something similar, it took weeks of hard work.

 

From 16 weeks to a few seconds

This was the challenge a large Nordic bank was facing when it decided to adopt the Apache Kafka streaming platform. By combining data from numerous systems into a single platform, the bank was able to reduce the time it took to get the base data and report in place from 16 weeks to a few seconds. In addition, the bank now also has a better overview of its data and can use it in new ways to analyse and manage its activities – it has taken the first steps towards becoming a data-driven organisation!

We have set out below how the Apache Kafka streaming platform actually works and what makes it unique compared with other streaming platforms.

Today’s data systems are often complex and convoluted:

[Figure: Itera Apache Kafka model 01]

All this can be made much simpler with Apache Kafka:

[Figure: Itera Apache Kafka model 02]

Apache Kafka is of relevance to all organisations that want to make better use of their data

The example of the Nordic bank is one of many possible examples. Data is growing exponentially, and we are already starting to see the outlines of a society driven by everything becoming a source of data – from cars to our bodies. This also applies to organisations and businesses.

While this is creating a range of opportunities, it will also bring challenges if we continue to store and manage data as we do today.

For the majority of organisations, data is currently something stored in systems that do not talk to one another. What an organisation really needs is a single, shared data source: a reliable central nervous system that makes all the organisation’s data and records available to everyone. This would enable an organisation to react as it gathers more data, and it would make its currently inaccessible systems more manageable and available.

Most organisations are facing an explosion in terms of the amount of data generated by new applications, new business opportunities, the IoT and the like. For most people, the ideal architecture is a clean, optimised system that makes it possible for businesses to capitalise on all their data.

The traditional systems used to solve these problems were designed before the era of large distributed systems, and they lack the ability to scale to meet the needs of the modern data-driven organisation.

 

What actually is Apache Kafka?

Apache Kafka is a streaming platform designed to handle large quantities of data in a modern distributed architecture. It was originally developed as a fast, scalable distributed message queue, but it quickly grew into a complete streaming platform, not only for publishing and subscribing to data but also for storing and processing it in real time.

Apache Kafka is what is known as a distributed streaming platform, and it has three key capabilities:

  1. Publishing and subscribing to streams of data or records, similar to a message queue or messaging system.
  2. Storing streams of data/records durably and in a fault-tolerant way.
  3. Processing streams of records as they occur, in real time.
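The first two capabilities can be illustrated with a toy model. The sketch below is a minimal in-memory stand-in written for this article, not Kafka's API: the class `ToyLog`, its methods, and the topic and group names are all hypothetical. It assumes only the general idea of an append-only log in which each subscriber group tracks its own read position, so that reading a record does not remove it.

```python
from collections import defaultdict


class ToyLog:
    """In-memory, append-only log per topic (illustration only, not Kafka's API)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> ordered list of records
        self._offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record and return the offset it was stored at."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_records=10):
        """Return the next unread records for this group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_records]
        self._offsets[(group, topic)] = start + len(batch)
        return batch

    def seek(self, group, topic, offset):
        """Move a group's read position, e.g. back to 0 to replay stored records."""
        self._offsets[(group, topic)] = offset


log = ToyLog()
for amount in (120, 75, 300):
    log.publish("payments", {"amount": amount})

# Two independent subscriber groups each see every record,
# because reading from the log does not delete anything.
reporting = log.poll("reporting", "payments")
fraud = log.poll("fraud-check", "payments")

# Nothing new has been published, so a second poll returns an empty batch.
nothing_new = log.poll("reporting", "payments")

# Records are retained, so a group can rewind and replay from the start.
log.seek("reporting", "payments", 0)
replayed = log.poll("reporting", "payments")
```

A real deployment would add what this sketch leaves out: persistence to disk, replication across servers and partitioning for scale.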

Apache Kafka is generally used in two main areas:

  1. Building real-time streaming data pipelines that reliably move data between systems and applications.
  2. Building real-time streaming applications that transform or react to the streams of data.
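As a rough sketch of the second area, the example below uses plain Python generators to stand in for streams of records. The function names, the record layout and the threshold are assumptions made for illustration; a real application would read from and write to Kafka topics through a client library or a stream-processing framework.

```python
def running_total(records):
    """Transform a stream: emit the cumulative amount after each record."""
    total = 0
    for record in records:
        total += record["amount"]
        yield total


def large_payment_alerts(records, threshold):
    """React to a stream: emit an alert for each record at or above the threshold."""
    for record in records:
        if record["amount"] >= threshold:
            yield {"alert": "large payment", "amount": record["amount"]}


# Hypothetical incoming records; in practice these would arrive continuously.
payments = [{"amount": 120}, {"amount": 75}, {"amount": 300}]

totals = list(running_total(payments))
alerts = list(large_payment_alerts(payments, threshold=200))
```

Each record is handled as it arrives rather than in a nightly batch, which is what allows results such as totals or alerts to be available within seconds.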

The platform was developed at LinkedIn and open-sourced in early 2011, and it graduated from the Apache Incubator in 2012. Since then Apache Kafka has evolved from a simple message queue into a complete streaming platform that processes trillions of records a day for users across multiple industries. Thousands of companies have built their technology and data processing on Apache Kafka, including Netflix, LinkedIn and Airbnb.

 

Itera – Confluent’s first partner in Norway

In the spring of 2018 Itera entered into a partner agreement with Confluent. Confluent has its origins in the group that developed Kafka at LinkedIn, and it is today the largest contributor to the open-source project. Confluent has also built an ecosystem of solutions around Kafka that are offered as an enterprise package or as SaaS solutions on the cloud platforms of Amazon, Google and Microsoft (Azure).

Itera is Confluent’s first official partner in Norway and the preferred provider of the Apache Kafka streaming platform. Our partnership gives us access to resources and expertise that benefit our customers. We also share experience from our projects with Confluent and its other partners to build a strong international network of providers around Apache Kafka.

 

Contact

Interested in learning more about Apache Kafka? Our specialists are always available to tell you more about what we can do for you and your organisation.

We would be very pleased to hear from you!