TrustRadius: an HG Insights company

Amazon EMR

Score 8.2 out of 10

61 Reviews and Ratings

What is Amazon EMR?

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis.

Categories & Use Cases

Cost reduction with Amazon EMR on EKS

Use Cases and Deployment Scope

Amazon EMR (Elastic MapReduce) is heavily used at my organization for most, if not all, data pipeline computations. We started by using EC2 instances, then moved to EMR Serverless, and we are currently completing the transition to EMR on EKS. In general, we use it for long-running analyses (SQL queries with many JOINs) and for batch processing overall. From what I've seen, we use it with Spark under the hood.

Pros

  • EMR on EKS is really flexible and cost-saving
  • Flexibility on how to run the jobs (and different implementations to choose from)
  • Online support is available, and the product is regularly updated

Cons

  • EMR on EKS could be better documented, especially for the "magic" it does under the hood when using Spark
  • UI can be improved (especially for EMR on EKS)

Return on Investment

  • Switching most of our EMR on EC2 jobs to EMR on EKS has produced a 4% reduction in overall costs (while maintaining the same level of data freshness)
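
To put the 4% figure in concrete terms, here is a minimal arithmetic sketch. The baseline amount is purely hypothetical (the review does not state actual spend), and `cost_after_reduction` is an illustrative helper, not part of any AWS tooling.

```python
def cost_after_reduction(baseline_cost: float, reduction_pct: float) -> float:
    """Return the cost that remains after a percentage reduction."""
    return baseline_cost * (1 - reduction_pct / 100)


# Hypothetical monthly baseline; the review gives only the 4% figure.
baseline = 10_000.0
new_cost = cost_after_reduction(baseline, 4.0)
savings = baseline - new_cost

print(f"new cost: {new_cost:.2f}, savings: {savings:.2f}")
```

On any baseline, the same function shows how a small percentage adds up over a large recurring bill.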

Other Software Used

Apache Spark, Apache Airflow, Amazon S3 (Simple Storage Service), dbt

EMR: Great Services for Analytics

Use Cases and Deployment Scope

On-demand transient clusters for big data processing. I like its availability, and the different cost tiers make it very flexible for customers of different scales. It can be pre-installed with big data tools such as Hive, Spark, Pig, etc. Detailed cluster monitoring helps track several metrics, which in turn helps reduce cost.

Pros

  • Big data processing.
  • The resizing feature is good.
  • Ease of use and creating new clusters.

Cons

  • The user interface could use a facelift.
  • Overhead delay in starting clusters.
  • Big learning curve for someone who hasn't used a program like this before.

Most Important Features

  • EMR can execute code using Spark or other cluster types such as Hadoop.
  • Execution time comes down to a few minutes, as opposed to a few hours when running on EC2 or other compute servers.
  • Easy to choose between Hadoop-based and Spark-based EMR clusters.

Return on Investment

  • Reduced times of processing.
  • The platform is very useful for processing and storing big data.
  • No need to handle the complex configuration of a big data platform.

My chosen scalable cloud platform

Use Cases and Deployment Scope

I use Amazon EMR (Elastic MapReduce) as a scalable platform on which to deploy my client solutions. It lets us scale our solution elastically in the cloud and deal with any data size, volume, or complexity. It is very easy to configure and scale, and it is my preferred deployment platform.

Pros

  • Scalability
  • Costings
  • Flexibility

Cons

  • Costs
  • Auto-scale

Most Important Features

  • Scaling
  • Costs
  • Flexibility

Return on Investment

  • Costs can spiral out of control if not careful
  • Customers put off by costings
  • Competition from GCP

Other Software Used

Microsoft Azure, Google BigQuery, Snowflake

AWS has it all!

Use Cases and Deployment Scope

To keep my review simple: it is very convenient that AWS has a MapReduce tool, as it was easy to deploy and test with our cloud setup. Also, with AWS being well known, it is easy to find staff who can set up and use a system and scale our solutions. Definitely an industry leader.

Pros

  • Scalable
  • Flexible
  • Good documentation
  • Cost effective

Cons

  • Integration with ERP for SMEs.
  • Connecting to non-cloud solutions and replicating data for backup.
  • Better performance metrics for business people, such as cost benefits.

Most Important Features

  • Elasticity
  • Reliability
  • Security
  • Flexibility

Return on Investment

  • ROI is slower for small businesses
  • Fewer in-house resources to manage
  • We can focus more on the business

Alternatives Considered

Apache Hadoop

Other Software Used

SAP Business One, Microsoft SQL Server, Microsoft 365 (formerly Office 365)

Amazon EMR - Truly elastic

Use Cases and Deployment Scope

Used as a Spark cluster to enable big data ETL processes. Analysts and data scientists use clusters for ad hoc querying. Raw data is ingested from RDBMS systems, APIs, file systems, etc. We used the elastic feature with different node types to optimize cost. The scope of the use case is a company-wide big data platform.
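
As a rough illustration of why mixing node types and scaling elastically can lower cost, here is a small sketch. All group names, node counts, prices, and hours are hypothetical assumptions chosen for the arithmetic; they are not EMR pricing and do not come from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    count: int
    hourly_price: float  # hypothetical USD per node-hour


def cluster_hourly_cost(groups: list[NodeGroup]) -> float:
    """Sum the hourly cost of every node in every group."""
    return sum(g.count * g.hourly_price for g in groups)


# Hypothetical layout: a small always-on group plus cheaper nodes added at peak.
always_on = [NodeGroup("core-on-demand", 4, 0.40)]
peak = always_on + [NodeGroup("task-spot", 8, 0.12)]

# Option A: keep the peak-sized cluster running for all 24 hours.
fixed_cost = cluster_hourly_cost(peak) * 24

# Option B: run the small cluster for 18 hours and scale out for 6 peak hours.
elastic_cost = cluster_hourly_cost(always_on) * 18 + cluster_hourly_cost(peak) * 6

print(f"fixed: {fixed_cost:.2f} USD/day, elastic: {elastic_cost:.2f} USD/day")
```

Under these assumed numbers the elastic schedule costs less per day. The real difference depends on actual instance prices and on how long the peak lasts.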

Pros

  • Big data ETL
  • Data ingestion
  • Ad hoc query support

Cons

  • Library management
  • Storing historical steps
  • Downloading EMR job logs could be easier

Most Important Features

  • Steps
  • Bootstrap
  • Elastic

Return on Investment

  • Quick data load
  • Ingestion quality
  • Stability

Other Software Used

AWS Glue, Matillion