Other Miscellaneous Differences
There are several other differences between Databricks and Synapse that have minor impacts on performance and compatibility, such as slight syntax variations, configuration adjustments, and functional differences. Other trivial feature differences between the two platforms that are irrelevant to this project are outside the scope of this documentation.
Spark SQL Differences
Databricks employs a proprietary version of Spark, which includes additional features and functions compared to the open-source version of Spark used by Synapse. Some of these differences are outlined below.
Support For Nested SQL Queries
Synapse does not support nested SQL queries within Spark functions such as sqlContext. However, this limitation can be addressed by refactoring the query to use table joins and temporary views.
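As a sketch of that refactoring, the example below rewrites an IN subquery as a join against a temporary view. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs anywhere; the tables customers and orders and the view uk_customers are hypothetical. In a Synapse notebook the same idea would be expressed by registering the inner query's DataFrame with createOrReplaceTempView and then passing the join query to spark.sql.

```python
import sqlite3

# Hypothetical sample tables standing in for Spark tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'UK'), (2, 'US'), (3, 'UK');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.0), (13, 2, 5.0);
""")

# Original form: a nested subquery inside the WHERE clause.
nested_sql = """
SELECT order_id, amount FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'UK')
ORDER BY order_id
"""

# Refactored form: materialise the inner query as a temporary view,
# then join against it instead of nesting it.
conn.execute(
    "CREATE TEMP VIEW uk_customers AS SELECT id FROM customers WHERE country = 'UK'"
)
joined_sql = """
SELECT o.order_id, o.amount
FROM orders AS o
JOIN uk_customers AS c ON o.customer_id = c.id
ORDER BY o.order_id
"""

nested_rows = conn.execute(nested_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()
print(nested_rows)
print(joined_rows)
```

The join returns the same rows as the IN subquery only when the joined key (here customers.id) is unique; if it is not, duplicates would need to be removed, for example with SELECT DISTINCT inside the view.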
Support For Passing Variables Across Languages
Databricks supports the use of widget functions such as getArgument within SQL cells to pass variables directly from Python or Scala code to SQL. This functionality is not available in Synapse, but it can be worked around by storing the value in the Spark session configuration and referencing it from SQL, as shown below.
# Sets Spark Configuration
myTable = "<table-name>"
spark.conf.set("myApp.myTable", myTable)
The variables defined in the Spark configuration can be accessed in SQL as follows.
-- Uses Spark Variable --
SELECT * FROM ${myApp.myTable}
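The ${...} placeholder works by plain text substitution, which Spark SQL applies before parsing when spark.sql.variable.substitute is enabled (the default). The hypothetical helper below mimics that behaviour in plain Python to show what the SQL engine ends up parsing; it is an illustration, not Spark's implementation, and the table name sales_2023 is an assumption. It also shows why the placeholder should not be wrapped in single quotes when it stands for a table name: the quotes survive substitution and turn the name into a string literal.

```python
import re

# Hypothetical stand-in for the Spark session configuration.
conf = {"myApp.myTable": "sales_2023"}


def substitute(sql, conf):
    """Replace each ${key} with conf[key]; unknown keys are left as-is.

    Illustrative only: mimics the textual substitution Spark SQL performs,
    it is not Spark's own code.
    """
    def replace(match):
        key = match.group(1)
        return conf.get(key, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", replace, sql)


# Unquoted placeholder: the value becomes a table identifier.
print(substitute("SELECT * FROM ${myApp.myTable}", conf))
# Quoted placeholder: the value becomes a string literal, not a table name.
print(substitute("SELECT * FROM '${myApp.myTable}'", conf))
```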
Basic Syntax Differences
Magic Commands
Databricks and Synapse both support magic commands, which enable various functionalities and streamline workflows within notebooks. They allow users to perform tasks that are not directly related to the primary programming language of the notebook.
Databricks language magics start with a single percent sign, such as %python, %scala and %sql. In contrast, Synapse language magics usually start with a double percent sign, such as %%pyspark (Python), %%spark (Scala) and %%sql.
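When migrating notebooks, the language magic at the top of each Databricks cell has to be rewritten. The hypothetical helper below sketches that step for Python, Scala and SQL cells, assuming Synapse's %%pyspark, %%spark and %%sql cell magics as the targets; any other first line (for example %run, %md or an existing %% magic) is deliberately left untouched.

```python
# Hypothetical mapping from Databricks language magics to their
# assumed Synapse equivalents.
DATABRICKS_TO_SYNAPSE = {
    "%python": "%%pyspark",
    "%scala": "%%spark",
    "%sql": "%%sql",
}


def convert_cell_magic(cell):
    """Rewrite a leading Databricks language magic to its Synapse form.

    Cells whose first line is not a recognised magic are returned unchanged.
    """
    first, sep, rest = cell.partition("\n")
    magic = first.strip()
    if magic in DATABRICKS_TO_SYNAPSE:
        return DATABRICKS_TO_SYNAPSE[magic] + sep + rest
    return cell


print(convert_cell_magic("%sql\nSELECT 1"))
```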
Utility Libraries
Each platform has its own utility library: Databricks Utilities and Microsoft Spark Utilities (for Synapse). These libraries provide a range of tools and functions to simplify tasks within notebooks and jobs, such as file operations and secret management.
Databricks utility functions are accessed through the dbutils object, while Synapse utility functions are accessed through mssparkutils. Most functionality is similar across both platforms, though the syntax differs for certain functions, such as those related to widgets and key vaults.
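To illustrate the key vault difference, the hypothetical wrapper below reads a secret on either platform by delegating to whichever utility object is passed in. It assumes the Databricks call dbutils.secrets.get(scope=..., key=...), which reads from a secret scope (possibly backed by Azure Key Vault), and the Synapse call mssparkutils.credentials.getSecret(vault_name, secret_name), which reads from an Azure Key Vault by name; the function name and parameters are illustrative.

```python
def get_secret(vault_or_scope, name, *, dbutils=None, mssparkutils=None):
    """Hypothetical wrapper that reads a secret on either platform.

    Pass whichever utility object exists in the current notebook:
    - Databricks: reads from a secret scope via dbutils.secrets.get.
    - Synapse: reads from an Azure Key Vault via
      mssparkutils.credentials.getSecret.
    """
    if dbutils is not None:
        return dbutils.secrets.get(scope=vault_or_scope, key=name)
    if mssparkutils is not None:
        return mssparkutils.credentials.getSecret(vault_or_scope, name)
    raise RuntimeError("Neither dbutils nor mssparkutils was provided")
```

In a Databricks notebook it would be called as get_secret("my-scope", "db-password", dbutils=dbutils), and in Synapse as get_secret("my-key-vault", "db-password", mssparkutils=mssparkutils); the scope, vault and secret names are placeholders. Passing the utility object explicitly, rather than probing for global names, keeps the function testable outside either platform.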
Support For JAR Files
Databricks can build JAR files from a specified list of dependencies. In contrast, Synapse does not support building from dependencies and does not recognise JAR files that have been pre-built by Databricks. However, this limitation can be worked around by using alternative Python packages.