TrustRadius Insights for Apache Kafka are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Pros
Fault tolerance and high scalability: Users have consistently praised Apache Kafka for its fault tolerance and high scalability. Many reviewers have stated that Kafka excels in handling large volumes of data and is considered a workhorse in data streaming.
Ease of administration: Reviewers appreciate Kafka's ease of administration, noting that it offers an abundance of options for managing and maintaining queues. Multiple users have mentioned that the platform allows for easy expansion and configuration of cluster growth, making it straightforward to administer.
Real-time streaming capabilities: Kafka's real-time streaming capabilities are seen as a significant advantage by users. Several reviewers have highlighted the platform's ability to handle real-time data pipelines and its resistance to node failure within the cluster. This feature enables users to process asynchronous data efficiently and ensures continuous availability of the system.
Apache Kafka is really the bedrock of all things streaming and data processing. I cannot imagine any other product doing it better. My last 2 companies used it, and my current one does as well. If you want your data stream to be organized and sent, Apache Kafka has become the tool of choice. I have dabbled in Azure EventHubs as well, but if you are into open-source data streaming, Apache Kafka will take you where you need to be for data lakes and for the amount of data that is streamed in the cybersecurity industry that my company is in. Without Apache Kafka, there is no way that my company's products could handle the volume of data that we process for our customers.
Pros
Data streaming is really second to none.
Scaling: done right, Apache Kafka is a workhorse.
Ease of administration, although you cannot really compare it to Azure EventHubs; that is comparing apples and oranges.
Cons
The web UI has not really changed in years. The UX has been refreshed, but a more streamlined built-in UX, instead of many third-party web UX tools, would be most welcome.
Webhooks can still be tricky to troubleshoot at times.
CLI monitoring has a learning curve to get right.
Likelihood to Recommend
Apache Kafka is well-suited for most data-streaming use cases. Unless you have a specific use case for a cloud PaaS such as Amazon Kinesis or Azure EventHubs for your data lakes, Apache Kafka, once set up well, will take care of everything else in the background. Azure EventHubs is good for cross-cloud use cases. I have no real-world experience with Amazon Kinesis, but I believe it is the same.
We use Apache Kafka as a message broker between our two client-facing applications. We used ActiveMQ before, but it had shortfalls in high availability and clustering. Kafka solved it on both fronts and gives good business continuity.
Pros
High availability
Performance
Admin user interface
Cons
ZooKeeper logs could be better
Monitoring
Likelihood to Recommend
It is well suited if you want to use a message broker between two applications with high availability. It can also be used for streaming replication of data.
Verified User
Employee in Information Technology (Computer Software company, 51-200 employees)
We use Apache Kafka as an event bus for all our async activities and microservice communication, such as sending emails, SMS, and notifications between services and consumers, and for event and data processing.
Pros
Event driven architectures
Any use case which requires async data processing
Any use case that involves producing and consuming the same data to build business-specific processing
Cons
ZooKeeper services configuration can be simplified
Data logging needs to be secured
Restarting & overall management needs to be improved
Likelihood to Recommend
It's super fast
Has some learning curve, but once mastered it brings scale
All logic that needs a producer and consumer kind of implementation (bulk notification, etc.)
Event-driven architectures can be implemented with Apache Kafka
Apache Kafka is the most powerful and scalable streaming framework on the market. We have used Apache Kafka as a part of many real-time analytics solutions. It has great performance [and is] easy to integrate with big data technologies like Spark. Due to its distributed nature, Apache Kafka is capable of operating very quickly and can handle millions of messages every second.
Pros
Real time streaming
Performance
Scalability
Cons
Management tools
Likelihood to Recommend
I have used Apache Kafka for real-time analytics and streaming. It’s highly scalable and integrates well with big data technologies like Spark. I believe Apache Kafka is the best in the market.
Verified User
Engineer in Information Technology (Telecommunications company, 10,001+ employees)
We use Kafka as the queuing mechanism for records in an indexing pipeline. Before using Kafka, we were working with tables in SQL Server to handle a queue, a situation that SQL is not really designed for. Kafka provides a simple and efficient system that does the job it was intended for, queuing and maintaining records in a queue, and it works very well. We use Kafka for several processes in our organization that require records to be stored and processed by dedicated servers.
Pros
Queuing of records
Easy expansion of topic partitions
An abundance of options for managing and maintaining queues
Easy expansion of cluster for growth
Cons
A management interface would be nice
Built in logging tools
Likelihood to Recommend
Kafka is a queuing system, plain and simple, and it does its job efficiently and with little fuss. We utilize Splunk logging to keep track of records in queues and how items are being processed, and outside of that we generally do not have to mess with Kafka; it just does the job with little maintenance or problems. Any situation where records or information need to be placed in a queue to be accessed and processed by other systems is one where Kafka is well suited.
It is mainly used for our product. We have huge data pipelines running which depend on Apache Kafka. It has been in use for more than 5 years now, and we are really happy with the performance and the reliability Apache Kafka has to offer. The experience has been excellent.
Pros
Data Pipeline
Asynchronous processing
Data retention for reprocessing
Cons
Dashboards to monitor the performance
ZooKeeper-free operation
Connectors for more languages
Likelihood to Recommend
It works really well overall for maintaining data and then processing it whenever you want to, as it has really good retention options. Multiple consumers can be run and systems can be scaled.
Works well when scale is needed
Can work well on low hardware requirements
Where it can be limiting is while implementing priority queues as it has to be done at the producer level.
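One common way to read the priority-queue limitation above is that the producer has to pick a destination per priority level, and the consumer then checks the higher-priority destination first. The sketch below is a minimal in-memory illustration of that idea, not a Kafka client program: the topic names, the `produce` and `poll_next` helpers, and the sample messages are all assumptions made up for the example.

```python
from collections import deque

# Hypothetical topic names, one per priority level (not from the review).
PRIORITY_TOPICS = ["orders.high", "orders.normal", "orders.low"]

# In-memory stand-in for a broker: each topic is a simple FIFO queue.
topics = {name: deque() for name in PRIORITY_TOPICS}


def produce(message, priority):
    """Producer-side routing: the priority (0 = highest) picks the topic."""
    topics[PRIORITY_TOPICS[priority]].append(message)


def poll_next():
    """Consumer side: look at topics from highest to lowest priority."""
    for name in PRIORITY_TOPICS:
        if topics[name]:
            return topics[name].popleft()
    return None  # nothing left in any topic


produce("nightly report", 2)
produce("password reset", 0)
produce("newsletter", 1)

# Messages come back ordered by priority, not by arrival time.
drained = [poll_next() for _ in range(3)]
print(drained)
```

The ordering within one priority level stays first-in, first-out; only the choice between levels is driven by priority.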
Verified User
Vice-President in Engineering (Internet company, 201-500 employees)
Kafka is being used for sending log information in real time, and there[fore] we can monitor apps and send these events to feed other apps. It's the core for send[ing] and receiv[ing] messages due to the quantity of messages per second. It helps us to scale and manage the common errors in this type of problem.
Pros
Scalable
Fast
Performance
Open source
Cons
Performance security
Monitoring
Configuration
Likelihood to Recommend
Send a few events in a few time slots: Kafka is designed for high-volume event processing. If your application doesn't work with more [than] 25,000 messages, Kafka isn't the correct solution.
Send events with high size: don't try working with events of more [than] 1 MB; the performance is very poor.
Send events without compression: if you apply compression to messages, this will help performance in network traffic and pipeline speed.
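The size and compression advice above can be checked on a sample payload before anything is sent. The sketch below uses only Python's standard `gzip` module to compare the raw and compressed size of a batch of similar events. The event fields, the batch size, and the `is_too_large` helper with its 1,000,000-byte threshold (taken from the reviewer's rule of thumb) are assumptions for illustration; a real Kafka producer would normally handle compression through its own configuration rather than through code like this.

```python
import gzip
import json

# Hypothetical batch of similar log events (made-up fields).
events = [
    {"service": "checkout", "level": "INFO", "msg": "order placed", "seq": i}
    for i in range(1000)
]

# Serialize the batch as newline-delimited JSON.
raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")

# Compress the whole batch; repetitive events usually shrink well.
compressed = gzip.compress(raw)


def is_too_large(payload, limit=1_000_000):
    """Flag payloads above the reviewer's rough 1 MB guideline."""
    return len(payload) > limit


ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
print("too large:", is_too_large(raw))
```

How much a batch shrinks depends heavily on how repetitive the events are, so the ratio printed here says nothing about any particular production workload.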
Verified User
Engineer in Information Technology (Computer Software company, 1001-5000 employees)
We used it for event logging. It was used for application log collection, along with exception tracking and the core microservices of the web application. It helped us reduce cost and simplified operational monitoring.
Pros
It handles large amounts of data simultaneously. Makes the application scalable.
It is able to handle real-time data pipelines.
Resistant to node failure within the cluster.
Cons
Does not have a complete set of monitoring tools.
It does not support wildcard topic selection.
The broker and consumer pattern reduces performance.
Likelihood to Recommend
It works well as a replacement for a traditional message broker. Use it when you want to log and track multiple web activities simultaneously.
We use Kafka for two key features: (1) keeping a buffer of all the incoming records that need to be stored in our data infrastructure, and (2) having a way to replay messages in case our data infrastructure loses some data. The reason we need to buffer is that when our traffic spikes, we can have up to 1 million messages coming in that need to be processed in some form or fashion. To expect the back-end service to support that is crazy. Instead, we dump them into Kafka to give our data infrastructure time to ingest them. As for replaying events, sometimes the ingestion pipeline fails and drops some messages. I know - that's a huge mistake on our engineering team's part - but when it does happen Kafka has the ability to rewind and replay messages, resulting in delayed processing but no data loss.
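The buffer-and-replay behavior described above comes from records staying in an ordered log while each consumer tracks its own position. The sketch below models that with a plain Python list and an integer offset; it does not use a Kafka client. The `PartitionLog` and `Consumer` classes and the event names are hypothetical stand-ins, loosely mirroring the seek operation that real consumer clients provide.

```python
class PartitionLog:
    """In-memory stand-in for one partition: an append-only list of records."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written


class Consumer:
    """Reads from the log and remembers its own position (offset)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding only moves the position; the records were never removed.
        self.offset = offset


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

consumer = Consumer(log)
first_pass = consumer.poll()  # reads everything buffered so far

# Suppose the downstream pipeline lost everything from offset 2 onward.
consumer.seek(2)
replayed = consumer.poll()
print(replayed)
```

Because reading does not delete anything, a spike of incoming messages simply lengthens the log, and a failed ingestion run can be repeated from an earlier offset for as long as the records are retained.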
Pros
Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
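The ordering caveat in the scalability point above is usually handled by keying messages: records with the same key go to the same partition, so order is kept per key while different keys are processed in parallel. The sketch below illustrates this with in-memory lists. The partition count, the customer keys, and the use of `zlib.crc32` as the hash are assumptions for the example; Kafka clients use their own hash function.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition. crc32 is a stand-in hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# In-memory stand-in for the partitions of one topic.
partitions = [[] for _ in range(NUM_PARTITIONS)]

# (key, sequence number) pairs arriving interleaved across customers.
events = [
    ("customer-a", 1),
    ("customer-b", 1),
    ("customer-a", 2),
    ("customer-c", 1),
    ("customer-b", 2),
]

for key, seq in events:
    partitions[partition_for(key)].append((key, seq))

for index, records in enumerate(partitions):
    print(index, records)
```

Order across different customers is not preserved once they land on different partitions, which is the trade-off the reviewer points out.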
Cons
Doesn't work well with many small topics (on the order of thousands). There is a physical limit, due to file handle usage, on the number of topics Kafka can have before it grinds to a halt. This is not an issue for most people, but it became an issue for us, as we need to have many, many topics, and so we weren't able to fully migrate to Kafka except for a few of our big queues.
Lack of tenant isolation: if a partition on one node starts to lag on consume or publish, then all the partitions on that node will start to lag. That's what we've noticed and it's really frustrating to our customers that another customer's bad data affects them as well.
I don't have too much experience here, but I hear from other engineers on my team that the CLI admin tool is a real pain to use. For example, they say the arguments have no clear naming convention, so they are hard to memorize, and sometimes you have to pass in undocumented properties.
Likelihood to Recommend
Despite the disadvantages I list, I really believe that Kafka is the right choice whenever you need a queueing or message broker system. Kafka is way too battle-tested and scales too well to ever not consider it. The only exception is if your use case requires many, many small topics. Also, Kafka doesn't support delay queues out of the box and so you will need to "hack" it through special code on the consumer side.
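The consumer-side "hack" for delay queues mentioned above often amounts to attaching a delivery time to each message and holding it back until that time has passed. The sketch below shows one possible shape of that logic in memory. The `DelayingConsumer` class, the `deliver_at` field, and the sample payloads are hypothetical; a real consumer would also have to decide how held-back messages interact with offset commits, which this sketch ignores.

```python
import heapq


class DelayingConsumer:
    """Holds back messages until their deliver_at time has passed."""

    def __init__(self):
        self._pending = []  # min-heap ordered by deliver_at
        self._seq = 0       # tie-breaker that keeps arrival order stable

    def receive(self, payload, deliver_at):
        heapq.heappush(self._pending, (deliver_at, self._seq, payload))
        self._seq += 1

    def due(self, now):
        """Return every held message whose delivery time is <= now."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ready.append(heapq.heappop(self._pending)[2])
        return ready


consumer = DelayingConsumer()
consumer.receive("send reminder", deliver_at=100.0)
consumer.receive("send receipt", deliver_at=10.0)

early = consumer.due(now=50.0)   # only the receipt is due
late = consumer.due(now=150.0)   # the reminder becomes due later
print(early, late)
```

Passing `now` in explicitly keeps the example deterministic; in practice it would come from a clock such as `time.time()`.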
Verified User
Engineer in Engineering (Internet company, 201-500 employees)