Credits: https://www.pexels.com/photo/close-up-photo-of-programming-of-codes-546819/
Learn about Apache Kafka certifications, their importance, and how to effectively prepare for the CCOAK and CCDAK exams.
What is Apache Kafka?
Many people might be familiar with straightforward queuing systems like RabbitMQ, where a message is produced to the queue and, once consumed, is gone from the queue. Queuing systems are pretty useful, but they lack a fundamental property, which is consistency of data in case the consumer fails to do whatever it's supposed to.
Kafka is a little more complex than that, and its unique structure makes it reliable to store data and then have consumers process it. If you want, there is a very nice article from Conduktor that explains the basics of Kafka.
Kafka is a log-system, meaning that its main job is to store a piece of information into a logical container called a topic (you can think of a topic as a queue, or a database table). One or more producers publish a message (or log) into a Kafka topic, and Kafka ensures that the message is stored consistently and with low latency. One or more consumers subscribe to a Kafka topic and consume the messages. Messages are persisted for a certain amount of time, so a new consumer can re-process all the messages in the topic, or an old consumer can read a message again if they didn't complete their action. For this reason, Kafka is also known as a pub/sub system.
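To make the log idea concrete, here is a toy in-memory model in Python. It is only a sketch and not Kafka's API: the `TopicLog` class and its `append` and `read_from` methods are names invented for this illustration, and a real topic is partitioned, replicated, and subject to retention, none of which is modelled here. What it tries to show is that reading does not delete anything, so every consumer simply keeps track of its own position (offset) in the log.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy append-only log standing in for a single-partition topic."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Store a message and return its offset (its position in the log)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read_from(self, offset: int) -> list:
        """Return every message at or after the offset; nothing is removed."""
        return self.messages[offset:]


log = TopicLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# An existing consumer that has already processed the first two messages.
old_consumer_offset = 2
old_consumer_view = log.read_from(old_consumer_offset)

# A brand-new consumer starts from offset 0 and sees the whole history.
new_consumer_view = log.read_from(0)

# A consumer whose action failed can read from the same offset again.
retry_view = log.read_from(old_consumer_offset)
```

Contrast this with a classic queue, where the second read of the same message would come back empty.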
Kafka also shifts how you design your application: while many operations in common frameworks are synchronous (GET a specific page or API endpoint), using Kafka moves the needle toward an event-driven architecture. Producers will create messages when an event occurs, and consumers will decide when to parse those messages and perform their action (for example, commit a transaction value to your bank balance).
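As a rough illustration of that shift (plain Python with no Kafka involved, and every name here made up for the example): the producer side only records that something happened, while the consumer side decides later when to turn those events into a bank balance.

```python
# Stand-in for a topic: an append-only list of events (hypothetical, not Kafka).
events = []


def record_transaction(account: str, amount: int) -> None:
    """Producer side: note that something happened and return immediately."""
    events.append({"account": account, "amount": amount})


def apply_events(balances: dict, start_offset: int) -> int:
    """Consumer side: apply events from start_offset onward, return the next offset."""
    for event in events[start_offset:]:
        account = event["account"]
        balances[account] = balances.get(account, 0) + event["amount"]
    return len(events)


record_transaction("alice", 100)
record_transaction("alice", -30)

balances = {}
next_offset = apply_events(balances, 0)  # the consumer catches up when it chooses to

record_transaction("bob", 50)            # the producer keeps going independently
next_offset = apply_events(balances, next_offset)
```

The producer never waits for the balance to be updated, which is the main difference from a synchronous request/response call.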
Why Take a Certification
Kafka is becoming increasingly popular, not only for big internet companies to manage the huge amount of data (just imagine how many status updates Facebook might have per second), but also in other industries that use technology extensively: automotive, IoT, banking.
In a typical e-commerce site or website, you might come across Kafka in a simple way, producing and consuming messages from a handful of topics. However, getting a certification will allow you to build upon a strong foundation (aka: you will know what you are doing).
And this strong foundation will come in handy when you are facing a problem that is difficult to address with normal databases, and for which Kafka might be the right solution. Additionally, studying Kafka will give you some perspective on data management and what is possible nowadays.
Confluent, the best-known Kafka cloud provider, is a company built by the creators of Kafka, and they are heavily involved in the direction and functionality of new versions of Kafka (Apache Kafka is open source). They offer the possibility to become a "Certified Administrator" (be able to set up and use Kafka and all its ecosystem) and a "Certified Developer" (on top of the first certification, you will know how to produce and consume different types of messages and what to look for when you want to optimize performance).
Confluent Certifications are pretty cheap ($150 each), especially compared to, let's say, even the simplest Oracle MySQL certification that costs more than €200. In addition, you don't need to pay for expensive courses for Confluent; everything can be found online, for free or almost free. It's a pretty good bargain, and it's good for Confluent because a Kafka certified professional is more likely to use their cloud system (it's easy, relatively cheap, and it works out of the box). But nothing forbids you from installing and managing your own Kafka; it's just a matter of trade-offs and money.
Kafka Administrator Certification
Last year, I decided to take both certifications (Administrator and Developer), and my bet, which turned out to be correct, was to take them back to back.
As a good Nerd Developer, I came up with a plan of what to study and used different resources (books, articles, and videos) to get a wide range of information and be ready to take the two exams only a few days apart (Friday for Administrator and Monday for Developer). The studying itself took me around three weeks: mostly late afternoons and evenings for the first two weeks, while I was also working full-time, and then a last week off from work to boost my studying and be at my best for the exams.
And because studying is already hard, let me share with you the steps I took, the order in which I studied, and what helped me along the way.
A Strong Introduction
The first two resources to get an idea of what Kafka is and what it can do are from Confluent itself, and they come in the form of both videos (the YouTube playlist is linked for convenience) and articles:
- Kafka 101 - 18 Lessons, from the amazing Tim Berglund. It's a pretty wide introduction to the main aspects of Kafka: Topics, Consumers, Producers, Kafka Connect, Kafka Streams, ksqlDB.
- Kafka Internals - 15 Lessons, from Jun Rao, one of the creators of Kafka. These are probably the most useful lessons for the certifications.
With these two 'courses,' you will be able to work on Kafka in your job and know what Kafka can do. If you are here only to get an introduction, be sure to at least follow these two courses.
Kafka Book
The second resource I recommend is to read the book. Yes, there is a book (written by the people at Confluent), and you can download it for free:
-
Or you can buy it on Amazon for about $50 (or €).
The book is well written and explains concepts in a simple way (trust me, it's not easy to find a well-written book). It is particularly useful because many questions in the exam need you to have read the book (at least 35% of the questions). In fact, some of the most common questions are:
- Producer settings. The most important are probably which values to set up to make a Producer idempotent, or the acks/min.insync.replicas configuration (See: https://accu.org/journals/overload/28/159/kozlovski/).
- Consumer. It explains how consumers read data from Kafka topics, including different consumer configurations, group management, and offset management to ensure data is processed reliably and efficiently.
- Authentication systems. The book discusses various security features to protect Kafka, including:
  - SSL for encrypting data in transit
  - SASL for authentication
  - Access control lists (ACLs) for authorization
- Topics configuration parameters. How to set up a topic, what partitions and a partition leader are, and how data is processed by Kafka when produced and consumed.
- ZooKeeper and KRaft (See: https://strimzi.io/blog/2024/03/21/kraft-migration/). ZooKeeper's role in maintaining metadata about brokers, topics, and partitions.
- Schema compatibility. Kafka allows the use of JSON-Schema, Avro, and Protobuf to save the 'schema' of your data (think of it as a database table that both producers and consumers can interpret). Any change to the schema (add or remove a field) will inevitably force you to change your code and manage how both newer and older versions of your producers/consumers parse the messages. To avoid getting stuck, you will need to know more about backward and forward compatibility of a schema. Compatibility is not strictly in terms of time, but more in terms of what you are going to update first, your producers (forward compatibility) or your consumers (backward compatibility). For more details, see: https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html and https://www.innoq.com/en/blog/2023/11/schema-evolution-avro/.
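The backward/forward distinction in the last point can be sketched with a deliberately simplified, Avro-style rule: data can be read as long as every field the reader expects is either present in the writer's schema or has a default. This is an assumption-laden toy and not how Schema Registry is implemented. A schema here is just a dict mapping a field name to whether it has a default, types are ignored, and the function names are invented for the example.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded with `reader`.

    Simplified rule: each reader field missing from the writer needs a default.
    Extra writer fields are ignored by the reader.
    """
    return all(name in writer or has_default
               for name, has_default in reader.items())


def is_backward_compatible(new: dict, old: dict) -> bool:
    """Upgraded consumers (new schema) can still read data from old producers."""
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    """Old consumers can still read data from upgraded producers (new schema)."""
    return can_read(reader=old, writer=new)


# Field name -> does the field have a default value?
old_schema = {"id": False, "amount": False}

added_optional = {"id": False, "amount": False, "currency": True}
added_required = {"id": False, "amount": False, "currency": False}
removed_required = {"id": False}

results = {
    "added_optional": (is_backward_compatible(added_optional, old_schema),
                       is_forward_compatible(added_optional, old_schema)),
    "added_required": (is_backward_compatible(added_required, old_schema),
                       is_forward_compatible(added_required, old_schema)),
    "removed_required": (is_backward_compatible(removed_required, old_schema),
                         is_forward_compatible(removed_required, old_schema)),
}
```

Each entry in `results` is a `(backward, forward)` pair. In this toy model, adding a field with a default is safe in both directions, adding a field without a default is only forward compatible (update producers first), and removing a field without a default is only backward compatible (update consumers first).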
Schema Registry
After those main resources, Confluent provides a lot more. The good thing is that those resources also come as both videos and articles. The articles are well written and full of images and code examples, and the videos are available not only on Confluent but also on YouTube as playlists (which you can follow on the go, for example during your commute).
Managing a Cluster
- Installing Kafka
- Monitoring: Part 1 - Part 2
- Decommissioning Brokers
- Kafka Storage Internals
- Running Kafka in production
Kafka Performance
- 12 Lessons: https://redpanda.com/guides/kafka-performance
Kafka Security
Kafka Connect
Kafka Streams 101
ksqlDB Introduction
ksqlDB Architecture
Confluent Cloud Networking
Exam Preparation
After having studied all of that, you can consolidate what you learned by looking online for resources that will help you test your knowledge. Of the links I'm suggesting, the cue cards and the practice exams on Udemy are probably the most useful and the closest to the real exam.
- Apache Kafka Deep Dive
- CCDAK Exam Notes
- Cue Cards for CCDAK
- Practice Exams on Udemy Link 1 - Link 2
Kafka Developer Certification
As a separate module, you can study the following tutorials in addition to what you already studied for the Admin Certification and take the Developer Certification as well. The extra knowledge needed to pass this certification is not large compared with the Admin one, but you will need to spend a little time on it, especially on how to write a simple producer and consumer, how to think in events and event streams, and how to build data pipelines using Kafka connectors. Some of these examples require a running Kafka cluster; luckily, Confluent gives you some free credits along with the tutorials, so you can test your scripts without spending any money (🚨 Remember to turn off your Kafka cluster when you finish the exercises).
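For the "simple producer" part, much of what matters is the configuration. The sketch below builds a settings dict using the standard producer property names `bootstrap.servers`, `enable.idempotence`, and `acks`, which is the kind of dict a Python client such as confluent-kafka accepts. The block does not import any client so that it runs without a broker. The helper names and the `localhost:9092` address are placeholders I made up, and `write_is_accepted` is only a simplified model of how `acks` interacts with the topic setting `min.insync.replicas`.

```python
def durable_producer_config(bootstrap_servers: str) -> dict:
    """Producer settings favouring durability over raw throughput."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "enable.idempotence": True,  # retries do not create duplicate messages
        "acks": "all",               # wait for all in-sync replicas
    }


def write_is_accepted(acks: str, in_sync_replicas: int,
                      min_insync_replicas: int) -> bool:
    """Simplified model: would the broker accept this write?

    With acks=all the write is rejected when fewer replicas are in sync than
    min.insync.replicas requires. With acks=1 only the leader has to be up,
    so min.insync.replicas is not enforced.
    """
    if acks == "all":
        return in_sync_replicas >= min_insync_replicas
    if acks == "1":
        return in_sync_replicas >= 1
    raise ValueError(f"acks value not covered by this sketch: {acks!r}")


config = durable_producer_config("localhost:9092")  # placeholder address

# Topic with min.insync.replicas=2 in three different situations.
healthy = write_is_accepted("all", in_sync_replicas=3, min_insync_replicas=2)
degraded = write_is_accepted("all", in_sync_replicas=1, min_insync_replicas=2)
leader_only = write_is_accepted("1", in_sync_replicas=1, min_insync_replicas=2)
```

The `degraded` case is the one worth remembering: with `acks=all`, losing replicas makes the producer fail loudly instead of silently accepting data that only one broker holds.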
Apache Kafka for Python Developers
Kafka Producer and Consumer in NodeJS
- Article: https://www.sohamkamani.com/nodejs/working-with-kafka/
- Code: https://github.com/sohamkamani/nodejs-kafka-example
Event Driven vs State Based
Designing Events and Event Streams
Data Pipelines with Apache Kafka
Write a Source Connector
Additional Resources
How to take the Exam(s)
Taking the exam itself is probably the least enjoyable part, not only because it is an exam, but because the whole process feels really convoluted:
- You register on the Confluent Training Portal and choose which certification you want to take the exam for: https://training.confluent.io/content/certifications
- You pay for the exam and get a sort of voucher that you have some time to use. That voucher will be redeemed to book the exam.
- Once you redeem the voucher, you will be redirected to another website where you can actually book the exam itself.
- 24 hours before the exam, you will need to do a check of your system to see that your connection is stable, your camera works, and so on.
- The day of the exam, you will first do a small 5-minute training that teaches you how to select the answer and go through the exam.
- Once everything is ready, you will be first verified: you will have to show a document/passport and that you are in a quiet room. Also, they will ask you to show the table (no paper or pen) and maybe look around the room to be sure nobody is interfering with you.
- The exam is around 60-90 minutes, and it's not that difficult. But to be honest, not having even a piece of paper to write on felt a little too much.
- A piece of advice on how to answer: do a first pass and answer only the questions you are sure about. Maybe mark the difficult questions.
- Then do a second pass to take your time with the more difficult questions (no negative points for answering wrong).
- And then a third pass to re-read your answers (just in case - I caught a couple of errors doing this).
- The score to pass is something around 65-70%. As soon as you submit, you will be told if you passed or not, but it will take a while before you get the email that says so (so don't freak out).
- And then within 24 hours, you will get the Accredible link that you can share on LinkedIn or any other platform (you can also download the certificate as a PDF).
Conclusions
As I said at the beginning, Kafka is becoming a very popular technology, and for an IT professional, having one or both of the certifications is surely an added value. Studying is not that hard (most of the videos can be watched in your spare time), and even though I said it took me three weeks, it is also true that in the six months prior I had read and watched tutorials about Kafka, so I wasn't completely in the dark. And, to be honest, it takes time (and some practice) to get familiar with some of the Kafka concepts, so don't rush. In addition, the cost of preparing for this certification by yourself is not that high: about a hundred bucks for the printed book and the Udemy course, while all the rest is very cheap or even free.
I hope you will take your first steps into Kafka, learn something new, and maybe even find new job opportunities. In the meantime, I'm studying for both a TypeScript and an AI fundamentals course, and maybe I'll write about them soon. Stay tuned!
Published: Monday, Jun 10, 2024, 03:41 PM - Updated: Sunday, Oct 13, 2024, 07:04 PM