TrustRadius Insights for Apache Hadoop are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions. Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.
Loading Reviews List....
Hadoop Reviews
10 Reviews
Engineering
Search is temporarily unavailable. Filters are still applied.
We are using the Apache Hadoop to handle the data which is continuously coming from different devices in real time from different geographical location across the globe and then run spark jobs and notebook to ingest the data and process it and then load it other external systems for further processing.
Pros
It’s ability to handle magnitude of data is what makes Hadoop a go to open source product
It’s open source nature makes if quite configurable
Its community support is superb.
Cons
It’s set up is quite complex which requires good knowledge of it
It’s fine tuning in terms of configuration requires in depth knowledge of the product
It’s logging can be improved
Likelihood to Recommend
When you have real time data which amounts to massive volumes close to terabytes daily, it’s become quite imperative that we should have a system which can handle it and ingest without losing it. Having Hadoop in place makes our product more robust, its stability comes handy.
The only challenge in running huge clusters is it require huge amount of space and memory for efficient working.
Apache Hadoop is an open-source software library that is designed for the collection, storage, and analysis of large amounts of data sets. Apache Hadoop’s architecture comprises components that include a distributed file system. This is mostly used for massive data collection, analytics, and storage. Also, having consistent data can be integrated across other platforms and have one single source of truth.
Pros
Apache Hadoop has made managing large amounts of data quite easy.
The system contains a file system known as HDFS (Hadoop Distributed File System) which processes components and programs.
The parallel processing tool of this software is also a good aspect of Apache Hadoop.
It keeps interesting and reliable features and functions.
Apache Hadoop also has a store of very big data files in machines with high levels of availability.
Cons
I personally feel that Apache Hadoop is slower as compared to other interactive querying platforms. Queries can take up to hours sometimes which can be frustrating and discouraging sometimes.
Also, there are so many modules of Apache Hadoop so it takes so much more time to learn all of them. Other than that, optimization is somewhat a challenge in Apache Hadoop.
Likelihood to Recommend
Altogether, I want to say that Apache Hadoop is well-suited to a larger and unstructured data flow like an aggregation of web traffic or even advertising. I think Apache Hadoop is great when you literally have petabytes of data that need to be stored and processed on an ongoing basis. Also, I would recommend that the software should be supplemented with a faster and interactive database for a better querying service. Lastly, it's very cost-effective so it is good to give it a shot before coming to any conclusion.
Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when various clusters fail. The Hadoop Distributed File System (HDFS) also goes a long way in helping in storing data. MapReduce and Tez, with the help of Hive of course, processes large amounts of data in a lesser time frame than expected. This helps our data warehouse to be updated with lesser resources rather than reading, processing and updating data in a relational data base.
Pros
It is cost effective.
It is highly scalable.
Failure tolerant.
Cons
Hadoop does not fit all needs.
Converting data into a single format takes time.
Need to take additional security measures to secure data.
Likelihood to Recommend
When we have data coming in from various sources, using hadoop is a good call. Its a good central station to take a good look at your data and see what needs to be done. Hadoop should not be used directly for Real time Analytics. HDFS should be used to store data and we could use Hive to query the files. Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.
Hadoop has been an amazing development in the world of Big Data. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate datapoints.
Pros
Hadoop can take loads of data quickly and performs well under load.
Hadoop is customizable so that nearly any business objective can be justified with the right combination of data and reports.
Hadoop has a lot of great resources, both informal like the community and formal like the supported modules and training.
Cons
Hadoop is not a relational database, but it has the ability to add modules to run sql-like queries like Impala and Hive.
Hadoop is open source and has many modules. It can be difficult without context to know which modules to leverage.
Likelihood to Recommend
Hadoop is well suited for organizations with a lot of data, trying to justify business decisions with data-driven KPIs and milestones. This tool is best utilized by engineers with data modeling experience and a high-level understanding of how the different data points can be used and correlated. It will be challenging for people with limited knowledge of the business and how data points are created.
My present company uses Hadoop and associated technology to create a data pipeline using open source tools. Apart from that we also consult for projects which could potentially use Hadoop. Apart from that, I also work as a consultant for HDP. We actively help in installation and setup of hadoop clusters.
Pros
Hadoop is open source and with a wide community already present, the usage is much easy for individuals, startups and MNCs alike.
Hadoop works well for commodity hardware and that makes it easier to avoid pricey clusters.
Hadoop takes parallel programming to next level and helps processing of multi terabytes (even petabytes) of data easier.
Cons
While Hadoop MR parallelizes jobs involving Big Data, it is slow for smaller data sets
OLAP (analytics)is easier, however, OLTP (transactions) is a problem in most cases.
People using Hadoop have to keep in mind that small proof of concepts may not scale as expected.
Likelihood to Recommend
Hadoop is well suited only if you have large datasets to work upon. Jumping to Hadoop with small data sets won't be as useful.
My organization uses Apache Hadoop for log analysis/data mining of data fetched from different practices in the US, Canada and India. It uses this data for showing analytical graphs and the progress of our software in those regions. Data from the practices is optimized and consumed by the customer applications. It provides faster performance and ease for data usage.
Pros
Hadoop is a very cost effective storage solution for businesses’ exploding data sets.
Hadoop can store and distribute very large data sets across hundreds of servers that operate, therefore it is a highly scalable storage platform.
Hadoop can process terabytes of data in minutes and faster as compared to other data processors.
Hadoop File System can store all types of data, structured and unstructured, in nodes across many servers
Cons
For now, Hadoop is doing great and is very productive.
Likelihood to Recommend
Hadoop is well suited for healthcare organizations that deal with huge amounts of data and optimizing data.
I have used Hadoop for building business feeds for a telecom client. The major purpose for using Hadoop was to tackle the problem of gaining insights into the ever growing number of business data. We leveraged the map reduce programming model to churn more than 30 gigabytes of data per day into actionable and aggregated data which was further leveraged by campaign teams to design and shape marketing and by product teams to envision new customer experiences.
Pros
Hadoop is an excellent framework for building distributed, fault tolerant data processing systems which leverage HDFS which is optimized for low latency storage and high throughput performance.
Hadoop Map reduce is a powerful programming model and can be leveraged directly either via use of Java programming language or by data flow languages like Apache Pig.
Hadoop has a reach eco system of companion tools which enable easy integration for ingesting large amounts of data efficiently from various sources. For example Apache Flume can act as data bus which can use HDFS as a sink and integrates effectively with disparate data sources.
Hadoop can also be leveraged to build complex data processing and machine learning workflows, due to availability of Apache Mahout, which uses the map reduce model of Hadoop to run complex algorithms.
Cons
Hadoop is a batch oriented processing framework, it lacks real time or stream processing.
Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
Hadoop cannot be used for running interactive jobs or analytics.
Likelihood to Recommend
1. How large are your data sets? If your answer is few gigabytes, Hadoop may be overkill for your needs. 2. Do you require real-time analytical processing? If yes, Hadoop's map reduce may not be a great asset in that scenario. 3. Do you want to want to process data in a batch processing fashion and scale for TeraBytes size clusters? Hadoop is definitely a great fit for your use case.
I used Hadoop for my academic projects for processing high volume data of my data mining project.
Pros
It was able to map our data with clear distinction based on the key.
We were able to write simple map reduce code which ran simultaneously on multiple nodes.
The auto heal system was really helpful in case of multiple failures.
Cons
I think Hadoop should not have single point of failure in terms of name node.
It should have good public facing API's for easy integration.
Internals of Hadoop are very abstract.
Protoco Buffers is a really good concept but I am not sure if we have checked other options as well.
Likelihood to Recommend
I think Hadoop has multiple flavors which people can customize to use as per their requirement. But I would choose hadoop based on following factors: 1. Number of nodes decision based on parallelism we want. 2. The module we want to run should be able to run parallely on all machine.
I have being using Hadoop for the last 12 months and really find it effective while dealing with large amounts of data. I have used Hadoop jointly with Apache Mahout for building a recommendation system and got amazing results. It was fast, reliable and easy to manage.
Pros
Fast. Prior to working with Hadoop I had many performance based issues where our system was very slow and took time. But after using Hadoop the performance was significantly increased.
Fault tolerant. The HDFS (Hadoop distributed file system) is good platform for working with large data sets and makes the system fault tolerant.
Scalable. As Hadoop can deal with structured and unstructured data it makes the system scalable.
Cons
Security. As it has to deal with a large data set it can be vulnerable to malicious data.
Less performance with smaller data. Doesn't provide effective results if the data is very small.
Requires a skilled person to handle the system.
Likelihood to Recommend
I would recommend Hadoop when a system is dealing with huge amount of data.
My company's new cloud based architecture is Hadoop based . It is being used across several organizations in our company . Using Hadoop our company has been able to solve many big data problems faster with very high performance.
Pros
Cost Effective
Distributed and Fault Tolerant
Easily Scalable
Cons
Cluster management and debugging is kind of not user friendly ( Doesn't has many tools )
More focus should be given to Hadoop Security
Single Master Node
More user adoption ( Even though it is increasing by each day )
Likelihood to Recommend
Hadoop is best suited for processing and analyzing unstructured and huge volumes of data . So ask yourself if the problem you are trying to solve involves unstructured data and also the volume .