Mounting Cloud Object Storage

Mounting integrates cloud object storage, such as blob containers and data lakes, into the notebook environment so that it appears as part of the local file system. Users can then read and manage cloud-hosted files through ordinary file paths, without repeated downloads and uploads.


Mounting In Databricks

Creating A Mount Point

To create a mount point in Databricks, the user calls the mount utility function from the dbutils library (dbutils.fs.mount), which handles authentication and sets up the mount point.

In Databricks, an ABFS mount of Azure storage is authenticated through a Microsoft Entra ID (formerly Azure Active Directory) application service principal, using OAuth client credentials.

# OAuth settings for the service principal; the client secret is read from a Databricks secret scope.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}


# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs
)
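
Calling dbutils.fs.mount on a mount point that already exists raises an error, so notebooks that may run more than once often guard the call. The following is a minimal sketch of such a guard, not part of dbutils itself: the helper name mount_if_absent is hypothetical, and it assumes the notebook's dbutils object is passed in explicitly.

```python
def mount_if_absent(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Hypothetical helper: `dbutils` is the object Databricks provides in a notebook.
    Returns True if a new mount was created, False if it was already present.
    """
    # dbutils.fs.mounts() lists the current mounts; each entry exposes `mountPoint`.
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs,
    )
    return True
```

In a notebook this would be called as mount_if_absent(dbutils, "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/", "/mnt/<mount-name>", configs).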

Accessing The Mounted Storage

To access the folders and files under the mount point, the user can list its contents with the %fs magic command or with the corresponding utility function from the dbutils library (dbutils.fs.ls).

# Appropriate For FDL
%fs ls /mnt/<mount-name>/<folder-name>


# Appropriate For EDL
dbutils.fs.ls("/mnt/<mount-name>/<folder-name>")
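
dbutils.fs.ls lists a single directory level. Where a full listing of a mounted folder is needed, the call can be wrapped in a small recursive walk. This is an illustrative sketch only: list_files_recursively is a hypothetical helper, and it assumes each entry returned by dbutils.fs.ls exposes a path attribute and an isDir() method.

```python
def list_files_recursively(dbutils, path):
    """Return the paths of all files under `path`, descending into subfolders.

    Hypothetical helper built on dbutils.fs.ls; `dbutils` is passed in explicitly.
    """
    files = []
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            # Recurse into the subfolder using the path reported by the listing.
            files.extend(list_files_recursively(dbutils, entry.path))
        else:
            files.append(entry.path)
    return files
```

In a notebook, list_files_recursively(dbutils, "/mnt/<mount-name>/<folder-name>") would return a flat list of file paths. For very large folders a recursive walk like this can be slow, since it issues one listing call per directory.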

Unmounting A Mount Point

To remove a mount point, the user calls the corresponding utility function from the dbutils library (dbutils.fs.unmount).

dbutils.fs.unmount("/mnt/<mount-name>")
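
Unmounting a path that is not currently mounted fails, so cleanup code may want to check first. A minimal sketch under the same assumptions as the mount call above: unmount_if_present is a hypothetical helper name, and dbutils is passed in explicitly.

```python
def unmount_if_present(dbutils, mount_point):
    """Unmount `mount_point` only if it is currently mounted.

    Hypothetical helper: returns True if an unmount was performed, False otherwise.
    """
    # Compare against the mount points reported by dbutils.fs.mounts().
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)
        return True
    return False
```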

Mounting In Synapse

Creating A Linked Service

In Synapse, connecting to a data source requires a linked service, which authenticates the workspace with the cloud storage. The linked service supplies the credentials for each data source used in the notebook.

Linked services can authenticate to cloud storage using any of the following methods:

  • Account Key

  • SAS URI

  • Service Principal

  • System Assigned Managed Identity

  • User Assigned Managed Identity

Creating A Mount Point

To create a mount point in Synapse, the user calls the mount utility function from the mssparkutils library (mssparkutils.fs.mount), which authenticates through an existing linked service and creates the mount point.

mssparkutils.fs.mount(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net",
    "/mnt/<mount-name>",
    {"linkedService": "<linked-service-name>"}
)
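
When several containers are mounted, the source URI and the linked-service option can be assembled by a small wrapper. The sketch below is illustrative only: abfss_uri and mount_with_linked_service are hypothetical helper names, and the mssparkutils object is passed in explicitly so the functions carry no hidden dependencies.

```python
def abfss_uri(container, account, directory=""):
    """Build an abfss:// source URI, optionally pointing at a directory in the container."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    directory = directory.strip("/")
    return f"{base}/{directory}" if directory else base


def mount_with_linked_service(mssparkutils, container, account, mount_point, linked_service):
    """Mount a container in Synapse, authenticating through the named linked service.

    Hypothetical wrapper around mssparkutils.fs.mount(source, mount_point, extra_configs).
    """
    return mssparkutils.fs.mount(
        abfss_uri(container, account),
        mount_point,
        {"linkedService": linked_service},
    )
```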

Accessing The Mounted Storage

To access the folders and files under the mount point, the user first retrieves the session-specific mount path with mssparkutils.fs.getMountPath and then lists the contents with mssparkutils.fs.ls.

mountPath = f'file:{mssparkutils.fs.getMountPath("/mnt/<mount-name>")}'
mssparkutils.fs.ls(f'{mountPath}/<folder-name>')

Unmounting A Mount Point

To remove a mount point, the user calls the corresponding utility function from the mssparkutils library (mssparkutils.fs.unmount).

mssparkutils.fs.unmount("/mnt/<mount-name>")

Key Differences

Absence Of A Fixed Mount Path In Synapse

In Databricks, the data source is mounted to the backend cluster, whereas in Synapse it is mounted to the current Spark session and keyed by the Livy session ID. As a result, the base mount path differs between the two platforms.

# Mount Path In Databricks
file:/dbfs/mnt/<mount-name>/<folder-name>


# Mount Path In Synapse
file:/synfs/<livy-session-id>/mnt/<mount-name>/<folder-name>

Because the Synapse path includes the session ID, there is no universal mount path in Synapse that can be accessed from any notebook. Each notebook session must therefore create its own mount point before reading from it.
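
The two path layouts can be compared side by side with a pair of string-building helpers. This is a sketch for illustration: both function names are hypothetical, and in a real Synapse notebook the session-specific prefix would come from mssparkutils.fs.getMountPath rather than being assembled by hand.

```python
def databricks_mount_path(mount_name, folder=""):
    """Local file path of a Databricks mount; identical across notebooks on the workspace."""
    parts = ["file:/dbfs/mnt", mount_name, folder.strip("/")]
    return "/".join(p for p in parts if p)


def synapse_mount_path(session_id, mount_name, folder=""):
    """Local file path of a Synapse mount; changes with every Livy session ID."""
    parts = [f"file:/synfs/{session_id}/mnt", mount_name, folder.strip("/")]
    return "/".join(p for p in parts if p)


# The Databricks path depends only on the mount name and folder.
print(databricks_mount_path("sales", "raw"))
# The Synapse path differs between two sessions for the same mount.
print(synapse_mount_path("41", "sales", "raw"))
print(synapse_mount_path("42", "sales", "raw"))
```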

