Cloud data simplified: a great all-in-one tool for all your data analytics needs
Use Cases and Deployment Scope
Our data warehouse was growing at a rate of 1 TB per year, and we needed a solution that would be both cheap and effective.
Previously we were using Azure SQL Database with its JSON capabilities and various Azure serverless services to manage our data, but at that growth rate, time and cost were becoming limiting factors.
Pros
- Build, schedule and monitor complex data pipelines (Azure Data Factory component)
- Access your data lake using the familiar T-SQL syntax and TDS-enabled tools (SSMS, ADS, ...). This is especially useful for business people who are used to a specific workflow.
- Support a wide range of data transformation tools, from low-code (DataFlows) to full-code (Spark), all integrated in a single central orchestrator (Azure Data Factory-like)
- Provide all these services as a single, very convenient package, without needing to know all the underlying configuration beforehand
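To give an idea of what "familiar T-SQL syntax" over a data lake looks like, here is a minimal sketch that builds the kind of OPENROWSET query a Synapse serverless SQL pool accepts for Parquet files. The helper name, the storage URL and the row limit are made up for illustration; the resulting text would be pasted into SSMS or ADS (or sent through any TDS client) rather than run by this script.

```python
def build_parquet_query(storage_url: str, top: int = 10) -> str:
    """Return a T-SQL query that samples Parquet files in a data lake.

    Hypothetical helper: it only builds the query text and does not
    connect to any Synapse workspace.
    """
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    safe_url = storage_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Assumed, non-existent storage account and path, for illustration only.
EXAMPLE_URL = "https://examplelake.dfs.core.windows.net/sales/2023/*.parquet"

if __name__ == "__main__":
    print(build_parquet_query(EXAMPLE_URL, top=5))
```

The point is that the files are queried in place, with no table to create or load first, which is what keeps the existing SQL-based workflow intact.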
Cons
- There's no support for Synapse Serverless objects (e.g., views) in SSDT - the VCS-friendly approach to schema deployments from Microsoft. SSDT is available for almost all other SQL Server and Azure SQL products, including Synapse Dedicated SQL Pools.
- There are lots of ways to accomplish the same task, and short of trial and error it's not very clear which one is best suited for a given scenario. Also, some scenarios (e.g., efficient management of late arrivals) don't have a clear solution path.
- I think it would be cool to have a tighter integration of the product with the Azure Data Studio client, not only for connecting to SQL Serverless or Dedicated Pools. For example, PySpark development and debugging would be much easier if done from ADS.
Likelihood to Recommend
It's well suited for large, fast-growing, and frequently changing data warehouses (e.g., in startups). It's also suited for companies that want a single, relatively easy-to-use, centralized cloud service for all their data needs. Larger, more structured organizations could still benefit from this service by using Synapse Dedicated SQL Pools, knowing that costs will be much higher than with other solutions.
I think this product is not suited for smaller, simpler workloads (where an Azure SQL Database and a Data Factory could be enough) or very large scenarios, where it may be better to build custom infrastructure.
