TrustRadius: an HG Insights company

Azure Databricks

Score: 8.7 out of 10

33 Reviews and Ratings

What is Azure Databricks?

Azure Databricks is a service available on Microsoft's Azure platform and suite of products. It provides the latest versions of Apache Spark so users can integrate with open source libraries, or spin up clusters and build in a fully managed Apache Spark environment with the global scale and availability of Azure. Clusters are set up, configured, and fine-tuned to ensure reliability and performance without the need for monitoring. The solution includes autoscaling and auto-termination to improve total cost of ownership (TCO).

Top Performing Features

  • Automatic Data Format Detection

    Automatic detection of data formats and schemas

    Category average: 9.2

  • Data Encryption

    Data encryption to ensure data privacy

    Category average: 8.4

  • Security, Governance, and Cost Controls

    Built-in controls to mitigate compliance and audit risk with user activity tracking

    Category average: 8.6

Areas for Improvement

  • Interactive Data Analysis

    Ability to analyze data interactively using Python or R Notebooks

    Category average: 8.8

  • Connect to Multiple Data Sources

    Ability to connect to a wide variety of data sources including data lakes or data warehouses for data ingestion

    Category average: 8.8

  • Visualization

    The product’s support and tooling for analysis and visualization of data.

    Category average: 8.3

Azure Databricks: A Data Consultant's Dream

Use Cases and Deployment Scope

As a Big Data Consultant, Azure Databricks is my favorite tool in the house!

The biggest problem with data consulting is the plethora of programming languages it deals in: SQL, Scala, R, Python, Java, etc.

That is exactly where Azure Databricks excels! It supports all languages in a single notebook with equivalent performance for all! Combine that with a visually pleasing UI, features that integrate the entire data lifecycle, and an architecture that gets the best of Spark, and you have one of the best data tools in your hands!

Pros

  • Data Processing and Transformations based on Spark
  • Delta Lakehouse when combined with external cloud storage
  • Governance using Unity Catalog to unify IAM
  • Delta Live Tables, although relatively new, has great potential thanks to its visual representation of a pipeline.

Cons

  • The new UI is a bit clunky compared to the old UI. It also adds new elements to the sidebar that are not relevant to the workspace. This could be improved.
  • Delta Live Tables, although powerful, has a lot of room for improvement, including error debugging and support for newer features.
  • Handling of concurrent requests on Delta Lake tables needs more optimization.

Return on Investment

  • The support team is amazing, they help you at every stage of the projects, from sales to delivery.
  • On a framework level, it has had an amazing impact and has reduced the clients' overall data platform costs by a staggering 65%.
  • There has been a 40% reduction in manual work, on average, for clients when they move to the Azure Databricks data platform.

Alternatives Considered

Jupyter Notebook, Azure Synapse Analytics and Cloudera Data Platform

Other Software Used

Azure Data Factory, Cloudera Data Platform, Apache Iceberg

Azure Databricks! Best of cloud and big data

Use Cases and Deployment Scope

We leverage Databricks in a variety of use cases. For instance, we designed a tailor-made change data capture process that tracks users' account details and keeps them updated in Delta Lake. We have also designed numerous ETL processes that are scheduled to provide data for data analytics on strict delivery timelines. Moreover, the workspaces are integrated with other Azure services such as Azure Synapse Analytics, Azure Data Lake, and Azure Data Factory. Some of our Databricks jobs are triggered by Azure Data Factory.
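
As a rough illustration of the change data capture pattern mentioned above, the sketch below applies a batch of change events to an in-memory table keyed by account ID. It is plain Python, not Spark or Delta Lake code, and the names `apply_changes`, `account_id`, `op`, `seq`, and `data` are hypothetical, chosen only for this example. In an actual workspace this kind of upsert would more likely be written as a Delta Lake merge.

```python
from typing import Any


def apply_changes(
    table: dict[str, dict[str, Any]],
    changes: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Return a copy of `table` with the change events applied.

    Hypothetical event shape (an assumption for this sketch):
    {"seq": int, "op": "insert" | "update" | "delete",
     "account_id": str, "data": {...}}
    """
    # Copy each row so the input table is left untouched.
    result = {key: dict(row) for key, row in table.items()}

    # Apply events in sequence order so the latest change per account wins.
    for event in sorted(changes, key=lambda e: e["seq"]):
        key = event["account_id"]
        if event["op"] == "delete":
            result.pop(key, None)
        else:
            # Inserts and updates are both treated as an upsert.
            result.setdefault(key, {}).update(event["data"])
    return result


accounts = {"a1": {"email": "old@example.com", "plan": "free"}}
changes = [
    {"seq": 2, "op": "update", "account_id": "a1", "data": {"plan": "pro"}},
    {"seq": 1, "op": "update", "account_id": "a1", "data": {"email": "new@example.com"}},
    {"seq": 3, "op": "insert", "account_id": "a2", "data": {"email": "b@example.com"}},
    {"seq": 4, "op": "delete", "account_id": "a2"},
]

updated = apply_changes(accounts, changes)
print(updated)
```

Sorting by a sequence number before applying events is one simple way to make the result independent of arrival order; a production pipeline would also need to handle duplicates and late-arriving events.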

Pros

  • Consistently great performance when dealing with huge-scale data, with the help of the Spark architecture
  • Magic commands for Spark SQL, PySpark, and Scala. These come in really handy in day-to-day work
  • Integration with other Azure services is super smooth and robust

Cons

  • Their pipeline workflow orchestration is pretty primitive. It lacks some common features
  • Workspace UI and navigation have a steep learning curve
  • Personally, I am not fond of their autosave feature. It's dangerous for production-level notebook scripts

Other Software Used

Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage

Our new go-to tool for managing large databases and tables!

Use Cases and Deployment Scope

We use Databricks to pull performance metrics for our content hosted on the company website. Having one tool to view and analyze the data has been a game changer for us, saving the many hours we used to spend collecting the data from various sources.

Pros

  • SQL
  • Data management
  • Data access

Cons

  • Intuitive interface
  • Ease of use
  • Providing FAQ or QRGs

Return on Investment

  • Helped reduce time for collecting data
  • Reduced cost in maintaining multiple data sources
  • Access for multiple users and management of users/data in a single platform

Other Software Used

Tableau Server, Tableau Cloud, Microsoft Excel