TrustRadius Insights for Databricks Data Intelligence Platform are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.
Pros
User-Friendly SQL: Users have found the SQL in Databricks to be user-friendly, allowing them to easily write and execute queries. Several reviewers have praised the intuitive nature of the SQL interface, making it accessible for users of different skill levels.
Enhanced Collaboration: Many users see the improved collaboration between data science and data engineering teams as a strength. They appreciate how Databricks facilitates communication and knowledge sharing among team members, which improves productivity and efficiency.
Versatile Integration: Users highly value the integration with multiple Git providers and the merge assistant. This feature supports smooth version control and simplifies collaborative development, letting developers manage their codebase, track changes, resolve conflicts, and keep a streamlined workflow.
Databricks is the primary data platform where we land, standardize, clean, and transform our data sources. We use the Workflows feature to automate recurring tasks and have built internal applications around the reusable workflows. We use the dashboard feature internally so that customer success teams and business analysts can keep tabs on the performance and outputs of our products. The workloads are orchestrated in Databricks but executed within our own AWS accounts, which lets us stay compliant with our stringent security requirements.
Pros
Thoughtful application of AI assistants during the coding and analysis steps.
Intuitive UI for users of varying skill sets.
Frequently updated documentation.
Cons
Greater support for non-Spark workloads.
Ability to host JAR files on serverless endpoints.
Likelihood to Recommend
Medium- to large-throughput data shops will benefit the most from Databricks Spark processing. Smaller shops may find the barrier to entry a bit too high for casual use cases. The overhead of starting a Spark compute job can actually make some workloads take longer, but past a certain scale the performance returns cannot be beaten.
It is currently used by our Data and Product teams to perform deep-dive analyses of how our current metrics (KPIs, OKRs) are performing and to develop tools for metric predictions based on data models written in languages such as SQL and Python, mixing the two. Shared workspaces give the entire company visibility of the results through graphs.
Pros
Cross-company shared workspaces for a unified understanding of the data
Combining different languages, such as SQL and Python, in a single space for data analysis
Quick execution of highly complex queries
Cons
Creating graphs requires a certain level of expertise in the platform; it could be more intuitive and user-friendly
More guidance on the basics, since some new users come from other platforms and expect a similar UI
An option to show all tables with their respective fields when a DB is selected for a query
Likelihood to Recommend
I reckon it is an amazing platform for users with a certain level of expertise who design experiments and deliver deep-dive analyses that require executing highly complex queries. It is also very useful for cross-company shared workspaces that give a unified understanding of the data.
It is less appropriate for users who don't fully know the tables they are going to query and need more support with the data, since the platform doesn't offer a way to see the fields in a table before querying it.
Verified User
Manager in Product Management (Financial Services company, 201-500 employees)
Data from APIs is streamed into our One Lake environment, which is S3 on AWS. Once this raw data is on S3, we use Databricks to write Spark SQL queries and PySpark code to process it into relational tables and views.
Our data scientists and modelers then use those views to generate business value in many places, such as creating new models, creating new audit files, and producing exports.
Pros
Process raw data in One Lake (S3) env to relational tables and views
Share notebooks with our business analysts so that they can use the queries and generate value out of the data
Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
Modern-day ETL operations made easy with Databricks. Provides an access mechanism for different sets of customers
Cons
Databricks should come with a fine-grained access control mechanism. If I have created tables or views, the access mechanism should be able to restrict access to certain tables or columns based on the logged-in user
Improved graphing and dashboarding should be provided within Databricks
Better integration with AWS, through better DevOps pipelines, could help me code jobs in Databricks and run them in AWS EMR more easily
Likelihood to Recommend
Databricks has helped my teams write PySpark and Spark SQL jobs and test them out before formally integrating them into Spark jobs. Through Databricks we can create Parquet and JSON output files. Data modelers and scientists who are not very good at coding can gain good insight into the data using notebooks developed by the engineers.
Verified User
Team Lead in Engineering (Financial Services company, 10,001+ employees)
We leverage Databricks (DB) to run Big Data workloads. Primarily, we build a JAR and attach it to DB. We do not use notebooks except for prototyping.
Pros
Extremely Flexible in Data Scenarios
Fantastic Performance
DB is always updating the system so we can have latest features.
Cons
Better Localized Testing
When they were primarily OSS Spark, it was easier to test and manage releases than it is with the newer DB Runtime. I wish the Runtime offered more configuration rather than just picking a version.
Graphing support became non-existent, even though it was one of the most compelling features of their general engine.
Likelihood to Recommend
DB generally fits 95% of what you need to do,
primarily the ability to transform data and/or do ad-hoc data science work.
Verified User
Director in Information Technology (Financial Services company, 201-500 employees)
[It's] used by self-service analysts to perform analysis quickly.
Pros
Very simplified infrastructure initialization
Seamless and automated optimization of job execution
Simple tool to get used to
Cons
Visualization: a great area for improvement
Integration with Git
Cost
Likelihood to Recommend
When you have analysts who are not cloud-savvy, this tool helps them run code quickly without being overwhelmed by infrastructure and optimization. [It's] less appropriate for production deployments.
Verified User
Director in Engineering (Financial Services company, 10,001+ employees)