TrustRadius Insights for Apache Hive are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.
Business Problems Solved
Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has proven to be particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.
Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been utilized for building reports, analyzing data stored in the Hadoop file system, processing events gathered in HDFS, and converting them into parquet files for fast querying.
Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.
We have used the system to migrate data, either for new versions or because we are moving to another operating program. The software helps us synchronize programs between different operating systems, keep a consistent history of information, and send already-transformed information to third parties.
Pros
Migration to the cloud is modern and very secure.
Cons
Extraction works best when it is scheduled at set times and in set quantities.
For normal daily use, the system needs ongoing maintenance management to work effectively.
Likelihood to Recommend
The software executes work on a large scale, and it is good for new projects or organizational changes. Data lineage mapping has always been dubious, but this tool has produced good results. You can store and synchronize data from different departments; the storage process can be manual, but it is best automated.
The software is intuitive from the first steps. One of the first features we noticed is that it does not allow duplicate files to be stored. It is advanced software that constantly learns and develops through data. The first phase is very effective: the information is analyzed and checked in detail.
Pros
The unification of the data will help to establish the commercial criteria.
We are sure that the data is protected.
Cons
If you try to extract an excessive amount of data, the system becomes slow.
There is a danger that the system collapses under the amount of data.
Likelihood to Recommend
The information is quickly accessible through the established security protocols. However, it has not helped us as users maintain a fairly comfortable data processing flow; it is more profitable to process the data in batches. We have been able to unify data from different sources.
Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps analyze very large amounts of data. Our use case is a large data analytics project where data frequency and velocity are very high. Apache Hive is very useful for processing both unstructured and structured data in a seamless way. It reduces the need to write complex queries because it is built around SQL, and we have an engineering team that is very proficient in writing SQL queries with Apache Hive to process big data. We have identified no business issues using the solution.
Pros
Apache Hive supports external data tables.
Supports data partitioning to improve overall performance.
Apache Hive is a reliable and scalable solution.
Apache Hive supports writing ad-hoc queries as well.
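To make the external table and partitioning points above concrete, here is a minimal Python sketch that assembles the kind of HiveQL statements involved. It only builds strings and does not connect to Hive. The table name `events`, its columns, the partition column `dt`, and the path `/data/events` are all hypothetical, and the helper names are invented for this illustration.

```python
def external_table_ddl(table, columns, partition_cols, location):
    """Build a HiveQL CREATE EXTERNAL TABLE statement as a string.

    Illustration only: values are interpolated directly, so this is not
    safe for untrusted input.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_cols)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"STORED AS PARQUET "
        f"LOCATION '{location}'"
    )


def partition_count_query(table, partition_col, value):
    """Build an ad-hoc query that filters on the partition column,
    so only the matching partition needs to be scanned."""
    return f"SELECT COUNT(*) FROM {table} WHERE {partition_col} = '{value}'"


# Hypothetical table layout, used only for this example.
ddl = external_table_ddl(
    table="events",
    columns=[("user_id", "BIGINT"), ("url", "STRING")],
    partition_cols=[("dt", "STRING")],
    location="/data/events",
)
query = partition_count_query("events", "dt", "2024-01-01")

print(ddl)
print(query)
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files at the given location in place. Filtering on the partition column is what lets partitioning improve performance.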
Cons
Apache Hive is not well suited for OLTP-based jobs.
We sometimes observed high latency while querying data.
Row-level data updates are limited.
Training materials need improvement.
Likelihood to Recommend
Apache Hive is a data warehouse/ETL solution used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations. The Metastore stores the metadata for each table and its schema. The Driver operates as a controller for the execution of statements. Other components, such as the Optimizer, CLI, and Thrift Server, also enable the processing and transformation of big data.
Verified User
Program Manager in Information Technology (Information Technology & Services company, 201-500 employees)
I work as a Research Assistant and have to process tons of data to produce appropriate findings. Our NLP lab used it for all its big data processing, for example removing URLs, finding counts of specific words, etc. Mainly it assisted in all the processing and cleaning of the big datasets we collected for our research.
Pros
The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
I particularly liked the UDF functionality where the user could define functions to produce particular output.
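The reviewer does not say how their user-defined functions were written. One route Hive offers for custom logic is streaming rows through an external script with the `TRANSFORM ... USING` clause, which by default passes rows as tab-separated lines. The sketch below shows the kind of cleaning mentioned in this review (removing URLs, counting specific words) under that assumption. The two-column input layout (`doc_id`, `text`) and all function names are hypothetical.

```python
import re
from collections import Counter

# Simple pattern for http(s) URLs; an assumption, not an exhaustive URL matcher.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Remove http(s) URLs and collapse the leftover whitespace."""
    return " ".join(URL_PATTERN.sub(" ", text).split())


def count_words(text, targets):
    """Count case-insensitive occurrences of each target word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word.lower()] for word in targets}


def transform_line(line, targets):
    """Turn one tab-separated row (doc_id, text) into an output row:
    doc_id, cleaned text, then one count per target word."""
    doc_id, text = line.rstrip("\n").split("\t", 1)
    cleaned = strip_urls(text)
    counts = count_words(cleaned, targets)
    return "\t".join([doc_id, cleaned] + [str(counts[w]) for w in targets])


# Hypothetical sample row, standing in for a line Hive would stream in.
sample = "42\tHive docs at https://example.org/hive explain Hive tables\n"
result = transform_line(sample, ["hive", "tables"])
print(result)
```

In an actual streaming script, the same `transform_line` function would be applied to each line read from standard input, with each result printed to standard output for Hive to read back as columns.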
Cons
Transactions are not supported.
Lack of subqueries made some tasks achievable only by completing one query and then running the subsequent one.
It is not as fast as Spark.
Likelihood to Recommend
Apache Hive is very well suited for those who are very familiar with SQL query syntax. Due to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used for analysis of big datasets.
On the other hand, it's definitely slower than some alternatives such as Spark. Also, it's not recommended for processing small datasets. Pandas and other standard data loading libraries are useful for dealing with small datasets.
Verified User
Engineer in Research & Development (Information Technology & Services company, 11-50 employees)
Our company primarily uses Apache Hive to manage our data warehouse, since it lets us query multiple databases. We partition our tables and monitor query performance on highly customized data queries using Hive. Hive is used only by our data analysts and an overseas data warehouse team, with only a few shared licenses on our virtual machines.
Pros
Monitor query performance
Manage tables in the data warehouse
Uses standard SQL
Cons
UI is quite dated and not intuitive
Open-source, so does not have consistent updates or support
Not the most optimal for ETL processes
Likelihood to Recommend
Apache Hive is well suited for organizations looking for an initial tool to begin their process of managing their data warehouse as it is open-source and relatively easy to set up. This works well with some legacy systems and many consoles support this. While Hive used to be quite revolutionary, it has fallen behind many other tools that are more performant or specialized for managing DBs, writing queries, and partitioning tables.
Verified User
Analyst in Professional Services (Legal Services company, 201-500 employees)
Hive is currently used in our company's Data Warehouse. It helps us give more structure to our data, and because Hive sits on top of Hadoop and its MapReduce (MR) engine, it is a big plus when you want to run a complex query and get faster results. This lets the Business Intelligence team use Hive as a self-service querying tool.
Pros
It's Fast!
You can store kinds of data structures here other than the standard ones.
Good scalability
Good redundancy too
Cons
It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
This is not the tool to go for online data processing.
It does not support sub-queries.
It can't process data in real time.
Likelihood to Recommend
This is best suited for data analysts and scientists; it's not a programmer's tool. You may still need an RDBMS to read data from, as updates and deletes can get a bit more complicated. You can run batch jobs, but this will have to be facilitated by additional tools. It's good for fast query processing and for storing large amounts of data.
Verified User
Engineer in Engineering (Information Technology and Services company, 201-500 employees)