Hadoop Professional, Scientific, and Technical Services Reviews & Insights

Score: 7.9 out of 10

268 Reviews and Ratings

Community insights

TrustRadius Insights for Apache Hadoop are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Organizations have adopted Hadoop for a wide range of use cases. Key examples include storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users report that Hadoop's distributed processing allows efficient, cost-effective storage and analysis of large amounts of data, and that it has been particularly helpful in reducing storage costs and improving performance on massive data sets. Hadoop also enables a consistent data store that can be integrated across platforms, making it easier for different departments to collect, store, and analyze data. Users have leveraged it to gain insight into business data, analyze patterns, and solve big data modeling problems. Hadoop has also proved accessible to users who are not necessarily experts in big data technologies.

Hadoop is also used for ETL processing, data streaming, transformation, and querying data with Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who previously relied on MySQL data warehouses, and they have observed faster query performance than with traditional solutions. Another significant use case is secure storage at low cost: Hadoop stores and processes large amounts of data efficiently without breaking the bank. It also enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency.

Hadoop's versatility extends beyond commercial applications: it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, Systems and IT departments rely on Hadoop to create data pipelines and consult on potential Hadoop projects. Overall, Hadoop's use cases span industries and departments, providing solutions for data collection, storage, and analysis.

Hadoop Reviews

10 Reviews

Professional, Scientific, and Technical Services: Information Technology & Services (8), Market Research (1), Research (1)

Open source Hadoop: smart choice smart price

Rating: 8 out of 10

Incentivized

January 14, 2025

Use Cases and Deployment Scope

We use Apache Hadoop to handle data that arrives continuously, in real time, from devices in different geographical locations across the globe. We then run Spark jobs and notebooks to ingest and process the data, and load it into other external systems for further processing.

Pros

Its ability to handle a large magnitude of data is what makes Hadoop a go-to open source product.
Its open source nature makes it quite configurable.
Its community support is superb.

Cons

Its setup is quite complex and requires good knowledge of the product.
Fine-tuning its configuration requires in-depth knowledge of the product.
Its logging can be improved.

Likelihood to Recommend

When you have real-time data amounting to massive volumes, close to terabytes daily, it becomes imperative to have a system that can handle and ingest it without losing any of it. Having Hadoop in place makes our product more robust, and its stability comes in handy.

The only challenge in running huge clusters is that they require a huge amount of space and memory to work efficiently.

Verified User

Engineer in Engineering (Information Technology & Services company, 10,001+ employees)

Vetted Review

4 years of experience

Hadoop is pretty Badass

Rating: 9 out of 10

Incentivized

January 4, 2018

Use Cases and Deployment Scope

Apache Hadoop is a cost-effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when various clusters fail. The Hadoop Distributed File System (HDFS) also goes a long way in helping to store data. MapReduce and Tez, with the help of Hive of course, process large amounts of data in less time than expected. This lets us update our data warehouse with fewer resources than reading, processing and updating data in a relational database would require.

Pros

It is cost-effective.
It is highly scalable.
Failure tolerant.

Cons

Hadoop does not fit all needs.
Converting data into a single format takes time.
Additional security measures are needed to secure data.

Likelihood to Recommend

When we have data coming in from various sources, using Hadoop is a good call. It's a good central station to take a good look at your data and see what needs to be done.
Hadoop should not be used directly for real-time analytics. HDFS should be used to store the data, and Hive can be used to query the files.
Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.

Verified User

Engineer in Engineering (Information Technology and Services company, 201-500 employees)

Vetted Review

4 years of experience

Experience with Hadoop by a novice user.

Rating: 7 out of 10

Incentivized

May 26, 2016

Use Cases and Deployment Scope

Hadoop is not used as a norm in my organization. I just use it personally to complete my job faster. It is implemented in the research computing cluster to be used by faculty and students. It completes jobs faster by parallelizing the tasks using MapReduce framework. This gives me considerable speed in the tasks I perform.

Pros

Provides a reliable distributed storage to store and retrieve data. I am able to store data without having to worry that a node failing might cause the loss of data.
Parallelizes the task with MapReduce and helps complete the task faster. The ease of use of MapReduce makes it possible to write code in a simple way to make it run on different slaves in the cluster.
With the massive user base, it is not hard to find documentation or help relating to any problem in the area. Therefore, I rarely had any instances where I had to look for a solution for a really long time.
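To illustrate the MapReduce model mentioned in the pros above, here is a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration. On a real cluster, the framework would run the map and reduce steps on different nodes and handle the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can in principle be spread across many machines, which is where the speed-up described above comes from.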

Cons

I would have hoped for a simpler interface, if possible, so that much less initial effort would have been needed. I often see that others who are starting to use Hadoop find it hard to learn.
I'm not sure if it is a problem with the organization and the modules they provide, but sometimes I wish there were more modules available to be used.

Likelihood to Recommend

If the user is trying to complete a task quickly and efficiently, then Hadoop is the best option for them. However, it may happen that the submission deadline is close and the user has little or no knowledge of Hadoop. In this case, it is easier not to use Hadoop, since it takes time to learn.

Muhammad Fazalul Rahman

Research Assistant in Research & Development at Rochester Institute of Technology (Research, 1-10 employees)

Vetted Review

1 year of experience

Apache Hadoop is the best open source product I used.

Rating: 9 out of 10

Incentivized

February 16, 2016

Use Cases and Deployment Scope

My present company uses Hadoop and associated technologies to create a data pipeline using open source tools. We also consult on projects that could potentially use Hadoop, and I work as a consultant for HDP. We actively help with the installation and setup of Hadoop clusters.

Pros

Hadoop is open source and, with a wide community already present, it is much easier to use for individuals, startups and MNCs alike.
Hadoop works well on commodity hardware, which makes it easier to avoid pricey clusters.
Hadoop takes parallel programming to the next level and makes processing multiple terabytes (even petabytes) of data easier.

Cons

While Hadoop MR parallelizes jobs involving Big Data, it is slow for smaller data sets.
OLAP (analytics) is easier; however, OLTP (transactions) is a problem in most cases.
People using Hadoop have to keep in mind that small proof of concepts may not scale as expected.

Likelihood to Recommend

Hadoop is well suited only if you have large datasets to work upon. Jumping to Hadoop with small data sets won't be as useful.

Piyush Routray

Senior Software Developer in Engineering at Nvent (Information Technology and Services, 11-50 employees)

Vetted Review

2 years of experience

A very generic Hadoop review

Rating: 10 out of 10

Incentivized

December 4, 2015

Use Cases and Deployment Scope

We utilize Hadoop primarily as a large data staging area for disparate corporate data. Select data is aggregated and moved downstream to a more formal data warehouse. Some data analytics is also performed directly against the Hadoop stored data. The direct analytics is done primarily with Apache Spark utilizing Scala and Python.

Pros

No requirement for schema on write.
Ability to scale to massive amounts of data.
Open platform provides multiple options and customizations to fit your exact needs.
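The first pro above, no schema on write, can be sketched in plain Python. This is a hedged illustration only: it uses no Hadoop, Hive or Spark API, and the sample records, the `read_with_schema` helper and the two "views" are hypothetical names invented for the example. The idea is that raw records are stored as they arrive and each consumer applies its own schema when reading.

```python
import json

# Raw records are stored exactly as they arrived; no schema was enforced on write.
RAW_RECORDS = [
    '{"user": "a1", "amount": "12.50", "channel": "web"}',
    '{"user": "b2", "amount": "7"}',
    '{"user": "c3", "amount": "3.25", "coupon": "SPRING"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field names to cast functions. Fields missing from a
    record become None; fields not named in the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two consumers read the same raw data through different schemas.
billing_view = list(read_with_schema(RAW_RECORDS, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_RECORDS, {"user": str, "channel": str}))

if __name__ == "__main__":
    print(billing_view)
    print(marketing_view)
```

The trade-off is that malformed or missing fields surface only when the data is read, so each reader has to decide how to handle them.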

Cons

The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.

Likelihood to Recommend

A big data problem doesn't always mean huge volumes of data. The other V's of big data (velocity and variety) are also important factors that may lead to selecting Hadoop as a platform.

Pierre LaFromboise

Senior Principal Consultant, BI Service Line Lead in Information Technology at Oakwood Systems Group (Information Technology and Services, 51-200 employees)

Vetted Review

2 years of experience

Hadoop >>>> Traditional proprietary Systems

Rating: 10 out of 10

January 19, 2015

Use Cases and Deployment Scope

My company's new cloud-based architecture is Hadoop-based. It is being used across several organizations in our company. Using Hadoop, our company has been able to solve many big data problems faster and with very high performance.

Pros

Cost Effective
Distributed and Fault Tolerant
Easily Scalable

Cons

Cluster management and debugging are not very user-friendly (there aren't many tools).
More focus should be given to Hadoop security.
Single master node.
More user adoption is needed (even though it is increasing each day).

Likelihood to Recommend

Hadoop is best suited for processing and analyzing huge volumes of unstructured data. So ask yourself whether the problem you are trying to solve involves unstructured data, and consider the volume as well.

Verified User

Engineer in Engineering (Information Technology and Services company, 10,001+ employees)

Vetted Review

3 years of experience

Advantage Hadoop

Rating: 10 out of 10

May 9, 2014

Use Cases and Deployment Scope

We are using it for retail data ETL processing. It is going to be used across the whole organization. It allows terabytes of data to be processed faster and at scale.

Pros

Processes big volumes of data faster using parallelism.
No schema required. Hadoop can process any type of data.
Hadoop is horizontally scalable.
Hadoop is free.

Cons

Development tools are not that friendly.
Hard to find Hadoop resources.

Likelihood to Recommend

Hadoop is not a replacement for a transactional system such as an RDBMS. It is suitable for batch processing.

Ajay Jha

Hadoop Architect/Lead in Research & Development at SPINS (Market Research, 51-200 employees)

Vetted Review

5 years of experience

Hadoop for better economy and efficiency

Rating: 7 out of 10

May 9, 2014

Use Cases and Deployment Scope

Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It is also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money to host historical data, and we plan to move that to Hadoop to reduce our storage costs. We also got much better performance out of our Hadoop cluster when processing a large amount of financial data. So, in that sense, Hadoop addressed multiple business problems for us.

Pros

Hadoop stores and processes unstructured data, such as web access logs or data processing logs, very well.
Hadoop can be used effectively for archiving, providing a very economical, fast, flexible, scalable and reliable way to store data.
Hadoop can be used to store and process a very large amount of data very fast.

Cons

Security is a piece that's missing from Hadoop: you have to supplement security using Kerberos etc.
Hadoop is not easy to learn: there are various modules with little or no documentation.
Because Hadoop is open source, testing, quality control and version control are very difficult.

Likelihood to Recommend

Hadoop is best suited for warehouse or OLAP processing. It is not suitable for OLTP or small transaction processing.

Bhushan Lakhe

Senior Vice President in Information Technology at Ipsos (Information Technology and Services, 10,001+ employees)

Vetted Review

3 years of experience

Hadoop usage and relevance for large data indexing and searching (getting Trends of Data)

Rating: 9 out of 10

May 3, 2014

Use Cases and Deployment Scope

We used it to understand the Hadoop stack for data indexing and search using the Pentaho plugin.

Pros

Set up a data cluster on commodity servers.
Distribute the data indexing and search process.
Faster access to distributed indexed data using the Pentaho plugin, without needing to write the whole set of analytics scripts that Pentaho handles for us.

Cons

Uniformity of installation

Likelihood to Recommend

For data indexing (unstructured content) and search without any costly server... there is no real alternative option?

Chandra Gupta

Director research in Professional Services at Perficient (Information Technology and Services, 51-200 employees)

Vetted Review

Reseller

1 year of experience

Hadoop review

Rating: 8 out of 10

Incentivized

May 1, 2014

Use Cases and Deployment Scope

We use Hadoop for our ETL and analytic functions. We stream data and land it on HDFS, and then massage and transform the data. We then use the Hive interface to query this data. Using Sqoop, we export and import data in and out of the Hadoop ecosystem. We store the data on HDFS in Avro and Parquet file formats.

Pros

Streaming data and loading to HDFS
Load jobs using Oozie and Sqoop for exporting data.
Analytic queries using MapReduce, Spark and Hive

Cons

Speed is one of the improvements we are looking for. We see Spark as an option and we are excited.

Likelihood to Recommend

OLTP is a scenario for which I think it is less appropriate, but the future will certainly be different.

Pramod Deshmukh

Big Data / Hadoop Consultant in Information Technology at Consultant - Humedica (Information Technology and Services, 51-200 employees)

Vetted Review

3 years of experience
