TrustRadius Insights for Apache Hadoop are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Hadoop has been widely adopted by organizations for a range of use cases. A key one is storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data, and it has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Hadoop also enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within an organization to collect, store, and analyze data.
Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Hadoop is also used for ETL processing, data streaming, transformation, and querying data with Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who previously relied on MySQL data warehouses; they have observed faster query performance with Hadoop than with those traditional solutions.
Another significant use case is secure storage at low cost: Hadoop stores and processes large amounts of data efficiently without breaking the bank. It also enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency.
Hadoop's versatility extends beyond commercial applications: it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, Systems and IT departments rely on Hadoop to create data pipelines and to consult on potential Hadoop projects. Overall, Hadoop's use cases span industries and departments, providing valuable solutions for data collection, storage, and analysis.
Hadoop Reviews
10 Reviews
Professional, Scientific, and Technical Services: Information Technology & Services (8), Market Research (1), Research (1)
We use Apache Hadoop to handle data that arrives continuously, in real time, from devices in different geographical locations across the globe. We then run Spark jobs and notebooks to ingest and process the data and load it into other external systems for further processing.
Pros
Its ability to handle large magnitudes of data is what makes Hadoop a go-to open source product
Its open source nature makes it quite configurable
Its community support is superb.
Cons
Its setup is quite complex and requires good knowledge of the product
Fine-tuning its configuration requires in-depth knowledge of the product
Its logging can be improved
Likelihood to Recommend
When you have real-time data amounting to massive volumes, close to terabytes daily, it becomes imperative to have a system that can handle and ingest it without losing any. Having Hadoop in place makes our product more robust, and its stability comes in handy.
The only challenge in running huge clusters is that they require a huge amount of space and memory to work efficiently.
Verified User
Engineer in Engineering (Information Technology & Services company, 10,001+ employees)
Apache Hadoop is a cost-effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when various clusters fail. The Hadoop Distributed File System (HDFS) also goes a long way toward helping store data. MapReduce and Tez, with the help of Hive of course, process large amounts of data in less time than expected. This lets our data warehouse be updated with fewer resources than reading, processing, and updating data in a relational database.
Pros
It is cost effective.
It is highly scalable.
Failure tolerant.
Cons
Hadoop does not fit all needs.
Converting data into a single format takes time.
Need to take additional security measures to secure data.
Likelihood to Recommend
When we have data coming in from various sources, using Hadoop is a good call. It's a good central station for taking a good look at your data and seeing what needs to be done. Hadoop should not be used directly for real-time analytics. HDFS should be used to store data, and Hive can be used to query the files. Hadoop needs to be understood thoroughly before attempting to use it for data warehousing needs, so you may need to take stock of what Hadoop provides and read up on its accompanying tools to see what fits your needs.
Verified User
Engineer in Engineering (Information Technology and Services company, 201-500 employees)
Hadoop is not used as a norm in my organization. I just use it personally to complete my job faster. It is implemented in the research computing cluster to be used by faculty and students. It completes jobs faster by parallelizing the tasks using the MapReduce framework. This gives me considerable speed in the tasks I perform.
Pros
Provides a reliable distributed storage to store and retrieve data. I am able to store data without having to worry that a node failing might cause the loss of data.
Parallelizes the task with MapReduce and helps complete the task faster. The ease of use of MapReduce makes it possible to write code in a simple way to make it run on different slaves in the cluster.
With the massive user base, it is not hard to find documentation or help relating to any problem in the area. Therefore, I rarely had any instances where I had to look for a solution for a really long time.
Cons
I would have hoped for a simpler interface if possible, so that the initial effort required would have been much less. I often see that others who are starting to use Hadoop find it hard to learn.
I'm not sure if it is a problem with the organization and the modules they provide, but sometimes I wish there were more modules available to be used.
Likelihood to Recommend
If the user is trying to complete a task quickly and efficiently, then Hadoop is the best option for them. However, the submission deadline may be close and the user may have little or no knowledge of Hadoop. In that case, it is easier not to use Hadoop, since it takes time to learn.
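The MapReduce model this reviewer describes can be illustrated with a minimal word-count sketch. It is plain Python and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool stands in for cluster nodes as an assumption made purely for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines, workers=4):
    # Each line is mapped independently of the others. That independence is
    # what lets a real cluster spread the work across nodes; threads are only
    # a stand-in for nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

The author of a job writes only the map and reduce functions; splitting the input, moving data between phases, and retrying failed tasks are handled by the framework, which is likely the "ease of use" the reviewer refers to.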
My present company uses Hadoop and associated technologies to create a data pipeline using open source tools. We also consult on projects that could potentially use Hadoop, and I work as a consultant for HDP. We actively help with the installation and setup of Hadoop clusters.
Pros
Hadoop is open source, and with a wide community already present, it is much easier to use for individuals, startups, and MNCs alike.
Hadoop works well on commodity hardware, which makes it easier to avoid pricey clusters.
Hadoop takes parallel programming to the next level and makes processing multiple terabytes (even petabytes) of data easier.
Cons
While Hadoop MR parallelizes jobs involving Big Data, it is slow for smaller data sets
OLAP (analytics) is easier; however, OLTP (transactions) is a problem in most cases.
People using Hadoop have to keep in mind that small proof of concepts may not scale as expected.
Likelihood to Recommend
Hadoop is well suited only if you have large datasets to work upon. Jumping to Hadoop with small data sets won't be as useful.
We utilize Hadoop primarily as a large data staging area for disparate corporate data. Select data is aggregated and moved downstream to a more formal data warehouse. Some data analytics is also performed directly against the Hadoop stored data. The direct analytics is done primarily with Apache Spark utilizing Scala and Python.
Pros
No requirement for schema on write.
Ability to scale to massive amounts of data.
Open platform provides multiple options and customizations to fit your exact needs.
Cons
The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.
Likelihood to Recommend
A big data problem doesn't always mean huge volumes of data. The other V's of big data (velocity and variety) are also important factors that may lead to selecting Hadoop as a platform.
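The "no requirement for schema on write" point above is often called schema on read: raw records are stored as they arrive, and each consumer applies its own structure when reading. A minimal sketch in plain Python follows. The sample records, field names, and the `read_with_schema` helper are all hypothetical, and no Hadoop or Hive API is used.

```python
import json

# Raw records are landed exactly as received; nothing validates them on write.
# These sample events and their fields are made up for illustration.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5}',
    '{"user": "c3", "note": "no amount recorded"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field name -> type constructor. Fields missing from a
    record come back as None; fields not named in the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Two consumers read the same raw data with different schemas.
    print(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
    print(read_with_schema(RAW_EVENTS, {"user": str, "channel": str}))
```

The trade-off is that malformed or inconsistent records surface at query time rather than at load time, so each reader has to decide how to handle missing or badly typed fields.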
My company's new cloud-based architecture is Hadoop-based. It is being used across several organizations in our company. Using Hadoop, our company has been able to solve many big data problems faster and with very high performance.
Pros
Cost Effective
Distributed and Fault Tolerant
Easily Scalable
Cons
Cluster management and debugging are not very user friendly (there aren't many tools)
More focus should be given to Hadoop security
Single master node
More user adoption (even though it is increasing each day)
Likelihood to Recommend
Hadoop is best suited for processing and analyzing huge volumes of unstructured data. So ask yourself whether the problem you are trying to solve involves unstructured data, and consider the volume as well.
Verified User
Engineer in Engineering (Information Technology and Services company, 10,001+ employees)
We are using it for retail data ETL processing. It is going to be used across the whole organization. It allows terabytes of data to be processed faster and with scalability.
Pros
Processes big volumes of data faster using parallelism.
No schema required. Hadoop can process any type of data.
Hadoop is horizontally scalable.
Hadoop is free.
Cons
Development tools are not that friendly.
Hard to find Hadoop resources.
Likelihood to Recommend
Hadoop is not a replacement for a transactional system such as an RDBMS. It is suitable for batch processing.
Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It's also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money for hosting historical data, and we plan to move that to Hadoop, reducing our storage costs. We also got much better performance out of our Hadoop cluster when processing a large amount of financial data. So, in that sense, Hadoop addressed multiple business problems for us.
Pros
Hadoop stores and processes unstructured data such as web access logs or logs of data processing very well
Hadoop can be effectively used for archiving; providing a very economic, fast, flexible, scalable and reliable way to store data
Hadoop can be used to store and process a very large amount of data very fast
Cons
Security is a piece that's missing from Hadoop - you have to supplement security using Kerberos etc.
Hadoop is not easy to learn - there are various modules with little or no documentation
Because Hadoop is open source, testing, quality control, and version control are very difficult
Likelihood to Recommend
Hadoop is best suited for warehouse or OLAP processing. It's not suitable for OLTP or small transaction processing
We use Hadoop for our ETL and analytic functions. We stream data and land it on HDFS, then massage and transform it. We then use the Hive interface to query this data. Using Sqoop, we export and import data in and out of the Hadoop ecosystem. We store the data on HDFS in Avro and Parquet file formats.
Pros
Streaming data and loading to HDFS
Load jobs using Oozie and Sqoop for exporting data.
Analytic queries using MapReduce, Spark and Hive
Cons
Speed is one of the improvements we are looking for. We see Spark as an option and we are excited.
Likelihood to Recommend
OLTP is a scenario where I think it is less appropriate, but the future will certainly be different.