TrustRadius Insights for IBM StreamSets are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Users have found StreamSets to be a versatile and user-friendly platform that solves a variety of data integration challenges. One key use case is the ability to develop on-premises and deploy to the cloud, which helps users control their cloud budget. The platform has also been praised for its integration with Apache Kafka and Apache NiFi, simplifying the process of connecting these tools with a data lake.
StreamSets has proven valuable for real-time data consumption, filtering, tagging, and monitoring of systems, as well as anomaly detection based on traffic patterns. Users have relied on the platform for data movement, migration, and ingestion, reducing downtime and simplifying the process. StreamSets has also been widely used for data extraction from various source systems, including IoT devices, enabling users to gain insights from previously inaccessible data sources.
Users appreciate the tool's ability to handle different data formats and the time it saves compared to hand-coded ETL. It has been used effectively for big data ETL problems, offering fast transfer, support for various sources and destinations, and prompt support. StreamSets has also been used in AI/ML tasks such as building transformations for knowledge graphs.
Overall, StreamSets has proven reliable and efficient in handling data ingestion from various sources, meeting the needs of users across industries and providing flexibility in designing pipelines with minimal coding.
IBM StreamSets Reviews
4 Reviews
Professional, Scientific, and Technical Services: Information Technology & Services (3), Research (1)
I use IBM StreamSets to continuously train AI models with real-time data streams, ensuring my models are always up to date with fresh, high-quality data. It helps me handle schema changes easily and scale data pipelines efficiently. This allows me to automate data ingestion and transformation across diverse sources.
Pros
Real-time fraud detection
Helps organisations address personalised customer demands
Risk management makes it easy for organisations to detect potential fraud risks
Cons
Lags when handling large volumes of data
Error logs aren't easy to understand
Support takes time to respond
Likelihood to Recommend
Our development team utilized IBM StreamSets to design data pipelines to hydrate/load on-prem data (from various RDBMS sources) to the cloud, i.e. Azure and GCP. The logging needs improvement, as the logging mechanism could be simplified (logs can be filtered with "ERROR", "DEBUG", "ALL", etc., but it still takes some time to become familiar with them).
Verified User
Engineer in Engineering (Information Technology & Services company, 11-50 employees)
In my organisation we mainly use IBM StreamSets to automate data flows between our CRM and analytics tools. Before it, we did this manually or with some other ineffective tool and spent hours moving and cleaning data, which was quite frustrating to be honest. Now we can set up pipelines that run smoothly and keep the reports accurate.
Pros
It makes building data pipelines super intuitive, even for non-coders.
It also handles real-time data ingestion effortlessly, so I always have up-to-date information for my reports.
It's great at monitoring data quality as well.
Cons
The error messages aren't always very descriptive, so troubleshooting can take longer.
More customisation options for scheduling would help; otherwise it works pretty well.
Likelihood to Recommend
I think it is really well suited for scenarios where I need to move and transform data between cloud applications quickly. It's also good for automating routine data cleaning tasks, which saves me a lot of manual effort. It also ensures that I have up-to-date information for the reports that are important to me, so that's a big advantage.
I mainly use IBM StreamSets to stream data from our on-prem systems to cloud applications, where it feeds real-time user applications with the latest information. For example, business reports that users create in systems such as client onboarding applications are streamed to advisor applications, where advisors build reports from this data and use it in their day-to-day work.
Pros
It helps stream the huge volume of data in our Teradata database seamlessly to various reporting applications that run in the cloud.
We also use IBM StreamSets to power a few BI dashboards that our product managers use on a regular basis to showcase various data to clients.
I think the data quality is way better compared to the Informatica tool.
Cons
IBM should make it easier for beginners to get started with IBM StreamSets. Most new joiners on my team find it difficult to debug our existing pipelines.
There are integration limitations: it integrates well with Java, but support for other frameworks such as Python and .NET is not as good.
The UI, while intuitive for simple pipelines, sometimes becomes cluttered and hard to navigate when managing complex pipelines involving more data streams.
Likelihood to Recommend
When you are dealing with a data warehouse and want to find an easy way to integrate applications and expose data in real-time, then IBM StreamSets is the best tool to go for. I'm using it for the same purpose in my applications.
This tool is well suited for someone with a proper technical background. Though the IBM StreamSets UI is mostly drag and drop, advanced configurations require technical expertise or support for the initial setup.
Verified User
Engineer in Information Technology (Information Technology & Services company, 10,001+ employees)
As part of one of the healthcare service provider accounts, our data engineering team utilized StreamSets to design data pipelines to hydrate/load on-prem data (from various RDBMS sources) to the cloud, i.e. Azure and GCP. These datasets are further used by data scientists and analysts to generate patterns and insights for the healthcare benefits of customers.
We use StreamSets heavily, not only for our batch use cases but also for real-time use cases such as consuming from a Kafka topic and streaming data to Azure Event Hub.
Pros
An easy-to-use canvas for creating data engineering pipelines.
A wide range of available stages, i.e. Sources, Processors, Executors, and Destinations.
Supports both Batch and Streaming Pipelines.
Scheduling is way easier than cron.
Integration with Key-Vaults for Secrets Fetching.
Cons
Monitoring/visualization could be improved and enhanced a lot (e.g. to monitor a job and see what happened with a data transfer 7 days back).
The logging mechanism could be simplified (logs can be filtered with "ERROR", "DEBUG", "ALL", etc., but it still takes some time to become familiar with them).
Auto-scaling for heavy load transfers is needed (transferring more than 5 million records from JDBC to an ADLS destination as Avro files takes a long time).
The concept of global variables is missing and should be added.
Likelihood to Recommend
We design StreamSets pipelines for nearly all of our batch and streaming scenarios. A few of the best-suited use cases we have tried out:
1. JDBC to ADLS data transfer based on source refresh frequency.
2. Kafka to GCS.
3. Kafka to Azure Event Hub.
4. HDFS to ADLS data transfer.
5. Schema generation to generate Avro.
The easy-to-design canvas, job scheduling, fragment creation and reuse, and the wide range of built-in stages make it an even more favorable tool for me to design data engineering pipelines.
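To give a rough idea of what the "schema generation to generate Avro" use case above involves, here is a minimal Python sketch that infers an Avro record schema from one flat sample record. It is only an illustration, not how StreamSets implements schema generation: the function name `infer_avro_schema`, the type mapping, and the sample record are all assumptions made for this example.

```python
import json

# Assumed mapping from Python types to Avro primitive type names.
_AVRO_PRIMITIVES = {
    bool: "boolean",
    int: "long",
    float: "double",
    str: "string",
    bytes: "bytes",
}


def infer_avro_schema(record, name="Record"):
    """Infer an Avro record schema (as a dict) from one flat sample record.

    Hypothetical helper: nested structures are not handled, and a None
    value is mapped to the Avro "null" type.
    """
    fields = []
    for key, value in record.items():
        if value is None:
            avro_type = "null"
        elif type(value) in _AVRO_PRIMITIVES:
            avro_type = _AVRO_PRIMITIVES[type(value)]
        else:
            raise TypeError(f"unsupported type for field {key!r}: {type(value).__name__}")
        fields.append({"name": key, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    # Made-up sample row, standing in for a record read from a JDBC source.
    sample = {"id": 1, "name": "a", "score": 1.5, "active": True}
    print(json.dumps(infer_avro_schema(sample, name="Customer"), indent=2))
```

Inferring from a single record keeps the sketch short; a real pipeline would typically look at more records or at source metadata before settling on field types.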