Apache Spark Manufacturing Reviews & Insights

Score: 9.2 out of 10

161 Reviews and Ratings

Community insights

TrustRadius Insights for Apache Spark are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.

Pros

Great Computing Engine: Apache Spark is praised by many users for its capabilities in handling complex transformative logic and sophisticated data processing tasks. Several reviewers have mentioned that it is a great computing engine, indicating its effectiveness in solving intricate problems.

Valuable Insights and Analysis: Many reviewers find Apache Spark useful for understanding data and performing data analysis. They appreciate the insights and analysis capabilities the software provides, suggesting that it helps them gain a deeper understanding of their data.

Extensive Set of Libraries and APIs: The extensive set of libraries and APIs offered by Apache Spark has been highly appreciated by users. It provides a wide range of tools and functionalities to solve various day-to-day problems, making it a versatile choice for different data processing needs.

Apache Spark Reviews

3 Reviews

Manufacturing: Electrical & Electronic Manufacturing (1), Pharmaceuticals (1), Consumer Goods (1)

Great open source tool for data processing

Rating: 9 out of 10

Incentivized

December 13, 2019

Use Cases and Deployment Scope

We use Apache Spark for cluster computing in our ETL environment, for data and analytics, and for machine learning. It is mainly used by our data engineering team to support the entire Data Lake foundation. Because we have huge amounts of information coming from multiple sources, we needed an effective cluster management system to handle capacity and deliver the performance and throughput we required.

Pros

Cluster management for ETL.
Data processing engine for our data lake.

Cons

You still need Hive, HDFS, or another storage layer to store information.
Security lags behind MapReduce.

Likelihood to Recommend

Spark is a one-size-fits-all data processing platform. You can run batch and in-motion streams, and you can use it for ETL, machine learning, or even graphs. You do not need multiple tools, which makes your TCO and management tasks much easier. Like every new platform, it has room to grow: storage and security are the main opportunities we found.

Verified User

Executive in Information Technology (Consumer Goods company, 10,001+ employees)

Vetted Review

My Apache Spark Review

Rating: 9 out of 10

Incentivized

June 7, 2018

Use Cases and Deployment Scope

My company uses Apache Spark in various ways including machine learning, analytics and batch processing. [We] Grab the data from other sources and put it into a Hadoop environment. [We] Build data lakes. SparkSQL is also used for analysis of data and to develop reports. We have deployed the clusters in Cloudera. Because of Apache Spark, it has become very easy to apply data science in a big data field.

Pros

Easy ELT Process
Easy clustering on cloud
Amazing speed
Batch & real time processing

Cons

Debugging is difficult as it is new for most people
There are fewer learning resources

Likelihood to Recommend

When the data is very big and you cannot afford long computation times, such as in a real-time environment, it is advisable to use Apache Spark. There are alternatives to Apache Spark, but it is the most common and robust tool to work with. It is great at batch processing.

Kartik Chavan

Data Analyst in Information Technology at The University of Texas at Arlington (Electrical/Electronic Manufacturing, 1001-5000 employees)

Vetted Review

Sparkling Spark

Rating: 10 out of 10

Incentivized

June 26, 2017

Use Cases and Deployment Scope

It is replacing our traditional ETL tool, and we are also using Apache Spark for data science solutions.

Pros

It makes the ETL process very simple compared to SQL Server and MySQL ETL tools.
It's very fast and has many machine learning algorithms which can be used for data science problems.
It is easily implemented on a cloud cluster.

Cons

The initialization and SparkContext procedures.
Running applications on a cluster is not well documented anywhere, and some applications are hard to debug.
Debugging and testing are sometimes time-consuming.
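
For readers unfamiliar with the initialization step mentioned in this list, the following is a minimal PySpark sketch, not the reviewer's own setup. It assumes PySpark and a Java runtime are installed. The helper names `build_settings` and `create_session`, the application name, the `local[2]` master, and the partition count are all illustrative assumptions.

```python
def build_settings(app_name, master="local[2]", shuffle_partitions=8):
    """Collect the settings handed to the session builder (values are illustrative)."""
    return {
        "spark.app.name": app_name,
        "spark.master": master,
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def create_session(settings):
    """Create (or reuse) a SparkSession; requires PySpark and a Java runtime."""
    # Imported here so the settings helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

A session could then be created with `spark = create_session(build_settings("etl-job"))`. The underlying SparkContext is available as `spark.sparkContext`, and `spark.stop()` releases the resources when the job is done.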

Likelihood to Recommend

It's well suited for ETL, data integration, and data science problems involving large data sets. It's not at all suitable for small data sets, which can be handled on desktops and laptops using Python tools.
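
To illustrate the small-data case described above, here is a sketch of a simple aggregation done on a single machine with only the Python standard library, with no cluster involved. The sample data, the column names `plant` and `units`, and the function name are hypothetical and not taken from the review.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for a small local file.
SAMPLE_CSV = """plant,units
A,120
B,75
A,30
"""


def total_units_by_plant(csv_text):
    """Sum the 'units' column per 'plant' using only the standard library."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["plant"]] += int(row["units"])
    return dict(totals)


totals = total_units_by_plant(SAMPLE_CSV)
print(totals)
```

For data of this size, starting a Spark session would likely cost more time than the computation itself.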

Sunil Dhage

Big Data Analyst in Information Technology at PSL Group (Pharmaceuticals, 51-200 employees)

Vetted Review
