Apache Hive Professional, Scientific, and Technical Services Reviews & Insights

Score: 8 out of 10

95 Reviews and Ratings

Community insights

TrustRadius Insights for Apache Hive are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Apache Hive is versatile software that has been widely used across various departments and organizations for different use cases. It has proven particularly helpful for handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.

Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been used for building reports, analyzing data stored in the Hadoop file system, and processing events gathered in HDFS and converting them into Parquet files for fast querying.

Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.

Apache Hive Reviews

7 Reviews

Professional, Scientific, and Technical Services: Legal Services (1), Information Technology & Services (4), Marketing & Advertising (2)

This system makes active data of value.

Rating: 8 out of 10

Incentivized

April 9, 2022

Use Cases and Deployment Scope

We have used the system to migrate data, either for new versions or because we are moving to another operating program. The software helps us synchronize programs between different operating systems, keep a consistent history of information, and send already-transformed information to third parties.

Pros

Migration to the cloud is modern and very secure.

Cons

Extraction works best when it is scheduled at set times and in set quantities.
For normal daily use, the system needs ongoing maintenance management to work effectively.

Likelihood to Recommend

The software executes work on a large scale, so it is a good fit for new projects or organizational changes. Data lineage mapping has always been dubious, but this tool has given good results. You can store and synchronize data from different departments; the storage process can be manual, but it is best automated.

Camilo Palacios

IT Administrator in Marketing at Logitech (Marketing & Advertising, 51-200 employees)

Vetted Review

2 years of experience

It is an advance to the ease of the processes

Rating: 8 out of 10

Incentivized

April 8, 2022

Use Cases and Deployment Scope

The software is intuitive from the first steps. One of the first features we noticed is that the software does not allow duplicate files to be stored. It is advanced software in which the system constantly learns and develops through data. The first phase is very effective: the information is analyzed and checked in detail.

Pros

The unification of the data will help to establish the commercial criteria.
We are sure that the data is protected.

Cons

If you try to extract an excessive amount of data, the system becomes slow.
There is a risk that the system collapses under the amount of data.

Likelihood to Recommend

The information is quickly accessible through the established security protocols, but it has not helped us as users to maintain a fairly comfortable data processing flow; it is more profitable to process the data in batches. We have been able to unify data from different sources.

Pablo Gonzalez

Internet Marketing Manager in Marketing at MKTi México (Marketing & Advertising, 51-200 employees)

Vetted Review

2 years of experience

Excellent big data warehouse solution

Rating: 9 out of 10

Incentivized

April 7, 2022

Use Cases and Deployment Scope

Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps to analyze a very large amount of data.
Our use case/scope is a large data analytics project where the data frequency and velocity are very high. Apache Hive is very useful for processing both unstructured and structured data seamlessly. It reduces the need to write complex queries because it is oriented toward SQL, and our engineering team is very proficient at writing SQL queries with Apache Hive to process big data.
We have identified no business issues using the solution.

Pros

Apache Hive supports external data tables.
Supports data partitioning to improve overall performance.
Apache Hive is a reliable and scalable solution.
Apache Hive supports writing ad-hoc queries as well.
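
As a rough illustration of why the partitioning mentioned above can help performance, here is a minimal Python sketch (not Hive code, and not from the reviewer). It relies on the fact that Hive lays out each partition as a directory named `column=value`, so a filter on the partition column only needs the matching directories. The `prune_partitions` helper, the `events` table, and all paths are hypothetical.

```python
# Illustrative only: simulates how a partitioned table lets a query skip data.
# Hive stores each partition as a directory named <column>=<value>;
# the table name, paths, and file names below are hypothetical.

def prune_partitions(files, column, value):
    """Return only the files that live under the partition directory column=value."""
    wanted = f"{column}={value}"
    return [path for path in files if wanted in path.split("/")]

files = [
    "warehouse/events/dt=2022-04-06/part-00000",
    "warehouse/events/dt=2022-04-07/part-00000",
    "warehouse/events/dt=2022-04-07/part-00001",
]

# A filter on the partition column only has to read the matching directory.
selected = prune_partitions(files, "dt", "2022-04-07")
print(selected)
```

In this sketch, the file under `dt=2022-04-06` is never touched, which is the effect partitioning aims for on large tables.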

Cons

Apache Hive is not best suited for OLTP-based jobs.
We sometimes observed high latency while querying data.
Row-level data updates are limited.
Training materials need improvement.

Likelihood to Recommend

Apache Hive is a data warehouse/ETL solution used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations.
The Metastore stores metadata for each table and its schema. The Driver operates as a controller for the execution of statements. Other components, such as the Optimizer, CLI, and Thrift Server, enable the processing of big data transformations.

Verified User

Program Manager in Information Technology (Information Technology & Services company, 201-500 employees)

Vetted Review

2 years of experience

Big Data the SQL way

Rating: 8 out of 10

Incentivized

September 23, 2020

Use Cases and Deployment Scope

I work as a Research Assistant and have to process tons of data to produce appropriate findings. Our NLP lab used it for all its big data processing, for example removing URLs and finding counts of specific words. Mainly, it assisted in all the processing and cleaning of the big datasets we collected for our research.
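
As a rough illustration of the cleaning steps mentioned above (this is not the reviewer's code), here is a minimal Python sketch of removing URLs and counting specific words. In Hive the same logic would typically be written in HiveQL or a user-defined function; the function names and sample text here are hypothetical.

```python
import re
from collections import Counter

# Simple pattern for http(s) URLs; real-world data may need a stricter one.
URL_PATTERN = re.compile(r"https?://\S+")

def remove_urls(text):
    """Strip http(s) URLs and collapse the whitespace left behind."""
    return " ".join(URL_PATTERN.sub(" ", text).split())

def count_words(texts, targets):
    """Count case-insensitive occurrences of each target word across all texts."""
    wanted = {t.lower() for t in targets}
    counts = Counter()
    for text in texts:
        # URLs are removed first so words inside links are not counted.
        for word in re.findall(r"[a-z']+", remove_urls(text).lower()):
            if word in wanted:
                counts[word] += 1
    return counts

sample = [
    "Hive is great, see https://example.com/hive for docs",
    "Spark and Hive both run on Hadoop http://example.org",
]
cleaned = [remove_urls(s) for s in sample]
word_counts = count_words(sample, ["hive", "hadoop"])
print(cleaned)
print(word_counts)
```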

Pros

The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
I particularly liked the UDF functionality where the user could define functions to produce particular output.

Cons

Transactions are not supported.
The lack of subqueries meant some tasks could only be done by completing one query and then running the next.
It is not as fast as Spark.

Likelihood to Recommend

Apache Hive is very well suited for those who are familiar with SQL query syntax. Thanks to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used to analyze big datasets.

On the other hand, it's definitely slower than some alternatives such as Spark. It's also not recommended for processing small datasets; pandas and other ordinary data loading libraries are a better fit for those.

Verified User

Engineer in Research & Development (Information Technology & Services company, 11-50 employees)

Vetted Review

1 year of experience

Apache Hive: SQL, open-source querying tool

Rating: 7 out of 10

Incentivized

September 18, 2020

Use Cases and Deployment Scope

Our company primarily uses Apache Hive to manage our data warehouse, since it lets us query multiple databases. We partition our tables and monitor query performance on very custom data queries using Hive. Hive is used only by our data analysts and an overseas data warehouse team, with only a few shared licenses on our virtual machines.

Pros

Monitor query performance
Manage tables in the data warehouse
Uses standard SQL

Cons

UI is quite dated and not intuitive
Open-source, so does not have consistent updates or support
Not the most optimal for ETL processes

Likelihood to Recommend

Apache Hive is well suited for organizations looking for an initial tool to begin their process of managing their data warehouse as it is open-source and relatively easy to set up. This works well with some legacy systems and many consoles support this. While Hive used to be quite revolutionary, it has fallen behind many other tools that are more performant or specialized for managing DBs, writing queries, and partitioning tables.

Verified User

Analyst in Professional Services (Legal Services company, 201-500 employees)

Vetted Review

1 year of experience

Hive is a solid data analytics tool

Rating: 9 out of 10

Incentivized

June 7, 2018

Use Cases and Deployment Scope

Hive is currently used in our company's Data Warehouse. It helps us give more structure to our data, and because Hive sits on top of Hadoop and its MapReduce (MR) engine, it is a big plus when you want to run a complex query and get faster results. This lets our Business Intelligence team use Hive as a self-querying tool.

Pros

It's Fast!
You can store different kinds of data structures here, beyond the standard ones.
Good scalability
Good redundancy too

Cons

It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
This is not the tool to go for online data processing.
It does not support sub-queries.
It can't process data in real time.

Likelihood to Recommend

This is best suited for data analysts and scientists; it's not a programmer's tool. You may still need an RDBMS to read data from, as updates and deletes can get a bit complicated. You can run batch jobs, but this has to be facilitated by additional tools.
It's good for fast query processing and for storing large amounts of data.

Verified User

Engineer in Engineering (Information Technology and Services company, 201-500 employees)

Vetted Review

4 years of experience

One of the first SQL on Hadoop tools. Perhaps not the best.

Rating: 7 out of 10

Incentivized

October 19, 2016

Use Cases and Deployment Scope

Hive allows us to run SQL queries against data sitting in Hadoop.

Pros

One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
Hive Live Long and Process (LLAP) has recently made significant improvements to long-running queries.
Allows BI tools to run analysis over Hadoop data.
Allows various relational databases for its metastore. These include MySQL, Postgres, Derby, or Oracle.

Cons

Needs to keep up with execution engine improvements. Hive on Spark or Tez, and then LLAP, are good starts.
Overall speed of ad-hoc querying could be improved.

Likelihood to Recommend

Hive is well-suited for providing an SQL engine on Hadoop, but there are alternative SQL on Hadoop projects that claim to have improvements over Hive.

Jordan Moore

Staff Consultant in Information Technology at Avalon Consulting, LLC (Information Technology and Services, 51-200 employees)

Vetted Review

2 years of experience
