TrustRadius: an HG Insights company

Amazon Redshift

Score8.9 out of 10

213 Reviews and Ratings

What is Amazon Redshift?

Amazon Redshift is a hosted data warehouse solution, from Amazon Web Services.

High Performance Data Warehouse

Use Cases and Deployment Scope

We are using Amazon Redshift as a warehousing solution, where we are doing multiple ETL sync from clickstream events as well as transaction DBs.

We are doing analytics on the top this data and utilise this data to build and train data-science models.

We are in gaming industry we are solving business problem such as increasing the number of user gameplay, increasing the revenue, increasing the registration as well as the acquisition.

Pros

  • Fast data retrieval from the table with complex joins via columnar storage and advanced query optimization techniques like parallel execution
  • Great reliable integration with AWS MSK using Amazon Redshift Streaming a low-latency streaming ingestion, AWS Glue and S3
  • Concurrency scaling and work load management - helps in segregating the load distribution based on roles
  • Decoupled storage and compute using RA3 instance type
  • Distribute cluster using Amazon Redshift data sharing i.e centralised write cluster with multiple readonly cluster

Cons

  • Data governance can be better
  • Data catalog and data discovery
  • Data lineage

Return on Investment

  • Positive: Multiple usecase of business analysis and fairplay analysis are serving and given great ROI
  • Positive: Multiple integration with tools such as tableau, redash, redshift spectrum
  • Negative: Data discovery and data lineage

Usability

Alternatives Considered

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Other Software Used

Amazon Managed Streaming for Apache Kafka (Amazon MSK), Apache Spark, Apache Airflow

Very good, but requires engg tuning

Use Cases and Deployment Scope

I use it as the data warehouse of our clients. I use it to build data transformations of user activity logs to ML features. I use the sql workbench to explore datasets and understand data schemas. Post that, I generally connect to the warehouse either through dbt or from jupyter notebooks.

Pros

  • Seamlessly integrates with the data in s3
  • Workbench provides useful way to query the tables within aws console
  • Postgres flavor of sql gives powerful capabilities such as window functions

Cons

  • Json support in sql is very limited.
  • Array type columns are missing. They are by default converted to strings
  • Sql workbench often goes unresponsive. I have to reload for the queries to run
  • A search option in the sql workbench would be great, which let's users search the whole db for a match on columns, tables etc

Most Important Features

  • Central data store
  • Gels well with aws ecosystem
  • Data connectors

Return on Investment

  • Provides as the compute engine for the data transformations, helping the analytics stack
  • Helps ML Products

Alternatives Considered

Snowflake, Google BigQuery, Azure Databricks and Apache Spark

Other Software Used

Amazon SageMaker, Amazon EC2 Auto Scaling, Snowflake

efficient, performant data store

Pros

  • [Amazon] Redshift has Distribution Keys. If you correctly define them on your tables, it improves Query performance. For instance, we can define Mapping/Meta-data tables with Distribution-All Key, so that it gets replicated across all the nodes, for fast joins and fast query results.
  • [Amazon] Redshift has Sort Keys. If you correctly define them on your tables along with above Distribution Keys, it further improves your Query performance. It also has Composite Sort Keys and Interleaved Sort Keys, to support various use cases
  • [Amazon] Redshift is forked out of PostgreSQL DB, and then AWS added "MPP" (Massively Parallel Processing) and "Column Oriented" concepts to it, to make it a powerful data store.
  • [Amazon] Redshift has "Analyze" operation that could be performed on tables, which will update the stats of the table in leader node. This is sort of a ledger about which data is stored in which node and which partition with in a node. Up to date stats improves Query performance.

Cons

  • Amazon Redshift is a Managed Service. But it is Not a 100% managed service. We still need to configure it with WLM (Work Load Management) settings, and add Query Queues to make sure it's resources aren't wasted and it is performant at it's best state, all the time
  • [Amazon] Redshift has a concept of "Vacuum", which is an operation to claim the disk space back from deleted data/tables. They recently started doing automated vacuuming. Prior to that we had to do that at regular intervals, to claim the data back.

Most Important Features

  • MPP (Massively Parallel processing)
  • Column Oriented data store
  • Good Customer Support

Return on Investment

  • Greater ROI, as it is 1/10th the cost of traditional data stores and data warehouses.
  • it is connected to Tableau and Looker dashboards, and various reporting used by Sales, Marketing, Publishers, Operations, BI, Analytics, DataScience, Finance

Alternatives Considered

Google BigQuery and Amazon EMR (Elastic MapReduce)

Other Software Used

Google BigQuery, Snowflake, Databricks Lakehouse Platform (Unified Analytics Platform), Amazon EMR (Elastic MapReduce)

Powerful Data Management Tool

Pros

  • Ease of setting up ETL
  • Uploading data into Redshift via AWS
  • Querying is quick

Cons

  • Missing option to restrict duplicate records
  • Lacks complex data sets like udf
  • Does not offer UI based querying & visualisation option like Looker

Return on Investment

  • Helped in achieving data consolidation that enables business to make data based decisions
  • Saved engineering cost with easy to set ETL processes

Usability

Amazon Redshift Review

Pros

  • Data retrieval experience really gets improved.
  • In terms of database management, it is really a no management at all in AWS. There is no even an OS to take care or worry about.
  • Auto or on-demand scaling is nice.
  • Integrates quite well with other products within the AWS ecosystem.

Cons

  • The number of connections is too small, I think at around 50 are allowed in parallel. With some ETL and apps connecting all the time, this brings an undesired possibility to some users or tools being unable to connect.
  • Needs some tuning.
  • The logging part is almost nonexistent.
  • Can be quite costly in the long run as opposed to just RDS or on-prem/dedicated solutions.

Return on Investment

  • Redshift as a solution is viable in case an enterprise wants a relatively uncomplicated relational data storage engine that is elastic and autonomous.
  • The costs are reasonable compared to other storage systems provided Redshift is capable to grow with your business and reduce administrative costs.
  • Redshift integrates well with many of the offerings within AWS and outside it. The programmability story is also good.

Alternatives Considered

Amazon Aurora, Snowflake and Azure SQL Database

Other Software Used

Amazon Aurora, Azure SQL Database