Other Miscellaneous Differences

There are several other differences between Databricks and Synapse that have minor impacts on performance and compatibility, such as slight syntax variations, configuration adjustments, and functional differences. Other trivial feature differences between the two platforms that are irrelevant to this project are outside the scope of this documentation.


Spark SQL Differences

Databricks runs its own proprietary Spark runtime, which includes additional features and functions compared with the open-source Apache Spark distribution used by Synapse. Some of these differences are outlined below.

Support For Nested SQL Queries

Synapse does not support nested SQL queries passed to Spark SQL calls such as sqlContext.sql(). However, this limitation can be addressed by refactoring the query to use table joins and temporary views.

This limitation applies only to Synapse Spark. Nested queries function normally within SQL blocks and blocks beginning with the magic command %%sql.
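The sketch below illustrates the refactoring approach; the tables sales and top_regions and the column region are hypothetical names used purely for illustration.

# Nested query that fails when passed to a Synapse Spark SQL call:
#   SELECT * FROM sales WHERE region IN (SELECT region FROM top_regions)

# Workaround: materialise the inner query as a temporary view,
# then rewrite the outer query as a join
spark.sql("SELECT DISTINCT region FROM top_regions") \
    .createOrReplaceTempView("vw_top_regions")

result = spark.sql("""
    SELECT s.*
    FROM sales s
    INNER JOIN vw_top_regions r
        ON s.region = r.region
""")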

Support For Passing Variables Across Languages

Databricks supports the use of widget functions such as getArgument within SQL blocks to pass variables directly from Python/Scala code to SQL. This functionality is not available in Synapse, but the limitation can be worked around by creating a global variable in the Spark configuration, as shown below.

# Stores the value in the Spark configuration so it can be read from SQL
myTable = "<table-name>"
spark.conf.set("myApp.myTable", myTable)

The variables defined in the Spark configuration can be accessed in SQL as follows.

-- Reads the variable from the Spark configuration --
SELECT * FROM ${myApp.myTable}
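This approach relies on Spark's built-in variable substitution, which is enabled by default; if substitution does not take effect, the standard Spark setting below can be checked or re-enabled.

# Variable substitution in Spark SQL is controlled by this setting (default: true)
spark.conf.set("spark.sql.variable.substitute", "true")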

Basic Syntax Differences

Magic Commands

Databricks and Synapse both support magic commands, which enable various functionalities and streamline workflows within notebooks. They allow users to perform tasks that are not directly related to the primary programming language of the notebook.

Databricks commands typically start with a single percentage sign, such as %python, %scala and %sql. In contrast, Synapse commands usually start with a double percentage sign, such as %%pyspark, %%spark and %%sql.

Note that there are a few exceptions to this rule, such as %run, which remains consistent across both platforms.
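As a purely illustrative comparison, the same SQL cell would begin as follows on each platform (the table name is a placeholder).

%sql
SELECT COUNT(*) FROM <table-name>   -- Databricks: single percent sign

%%sql
SELECT COUNT(*) FROM <table-name>   -- Synapse: double percent sign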

Utility Libraries

Both platforms have their respective utility libraries - Databricks Utilities and Microsoft Spark Utilities (for Synapse). These libraries provide a range of tools and functions to simplify tasks within notebooks and jobs, such as file operations and secret management.

Databricks utility functions are invoked through dbutils, while Synapse functions are invoked through mssparkutils. Most functionalities are similar across both platforms, though there are differences in syntax for certain functions, such as those related to widgets and key vaults.
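As a hedged illustration of the syntax difference, equivalent file-listing and secret-retrieval calls are shown below; the storage path, secret scope and key vault names are placeholders.

# Databricks Utilities: list files and retrieve a secret
files = dbutils.fs.ls("abfss://<container>@<account>.dfs.core.windows.net/data/")
secret = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")

# Microsoft Spark Utilities (Synapse): equivalent calls
from notebookutils import mssparkutils
files = mssparkutils.fs.ls("abfss://<container>@<account>.dfs.core.windows.net/data/")
secret = mssparkutils.credentials.getSecret("<key-vault-name>", "<secret-name>")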


Support For JAR Files

Databricks can build JAR files from a specified list of dependencies. In contrast, Synapse does not support building from dependencies and does not recognise JAR files that have been pre-built by Databricks. However, this limitation can be worked around by using alternative Python packages.

This limitation has only been tested with the SQL Spark connector package; JAR files might function correctly in other environments.
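For illustration only, one possible alternative to the connector JAR is Spark's built-in JDBC data source, assuming the SQL Server JDBC driver is available on the pool; the server, database, table and credential values below are placeholders.

# Reads a SQL Server / Azure SQL table through Spark's built-in JDBC source
jdbc_url = (
    "jdbc:sqlserver://<server-name>.database.windows.net:1433;"
    "database=<database-name>"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "<schema>.<table-name>")
    .option("user", "<username>")
    .option("password", "<password>")
    .load()
)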

