Mounting Cloud Object Storage
Mounting integrates cloud object storage, such as blob containers and data lakes, into the notebook environment so that it appears as part of the local file system. This streamlines access to and management of files stored in the cloud, allowing users to work with them through ordinary file paths and removing the need for repeated downloads and uploads.
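For example, once a container is mounted (as shown in the sections below), its files can be read with ordinary file APIs rather than a cloud SDK. A minimal sketch using Databricks conventions, assuming a mount named <mount-name> already exists and contains a hypothetical file data.csv:
# Mounts surface on the driver's local file system under /dbfs.
# data.csv is a hypothetical file used purely for illustration.
with open("/dbfs/mnt/<mount-name>/data.csv") as f:
    header = f.readline()
print(header)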
Mounting In Databricks
Creating A Mount Point
To establish a mount path in Databricks, the user needs to execute a utility function from the dbutils library, which handles authentication and sets up a mount point.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs
)
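Once the call succeeds, the new mount can be verified by listing the active mount points with dbutils.fs.mounts(), which returns one entry per mount:
# Confirm the mount point exists and show what it resolves to.
for mount in dbutils.fs.mounts():
    if mount.mountPoint == "/mnt/<mount-name>":
        print(f"Mounted: {mount.mountPoint} -> {mount.source}")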
Accessing The Mounted Storage
To access the folders and files within the mount path, the user can list the contents using the %fs magic command or the relevant utility function from the dbutils library.
# List contents with the %fs magic command (notebook cells only)
%fs ls /mnt/<mount-name>/<folder-name>
# List contents with the dbutils utility function (any Python code)
dbutils.fs.ls("/mnt/<mount-name>/<folder-name>")
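Once mounted, the storage behaves like any other path. A short sketch, assuming a hypothetical folder of CSV files under the mount; non-Spark libraries go through the local /dbfs prefix instead:
# Read the mounted data with Spark.
df = spark.read.csv("/mnt/<mount-name>/<folder-name>", header=True)

# Non-Spark code reads the same data through the local /dbfs prefix.
import pandas as pd
pdf = pd.read_csv("/dbfs/mnt/<mount-name>/<folder-name>/<file-name>.csv")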
Unmounting A Mount Point
To unmount a path, the user needs to execute the corresponding utility function from the dbutils library.
dbutils.fs.unmount("/mnt/<mount-name>")
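Unmounting a path that is not currently mounted raises an error, so a guard like the following sketch makes the call safe to re-run:
# Unmount only if the mount point is currently active.
if any(m.mountPoint == "/mnt/<mount-name>" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/<mount-name>")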
Mounting In Synapse
Creating A Linked Service
In Synapse, establishing a connection to any data source requires creating a linked service, which authenticates the workspace with the cloud storage. A linked service stores the connection details and credentials for a data source; each data source used in a notebook needs a corresponding linked service.
Creating A Mount Point
To set up a mount path in Synapse, the user needs to utilize a utility function from the mssparkutils library, which manages authentication via an established linked service and creates a mount point.
mssparkutils.fs.mount(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net",
    "/mnt/<mount-name>",
    {"linkedService": "<linked-service-name>"}
)
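The configuration map can also carry other authentication options besides a linked service, such as a storage account key. A hedged sketch fetching the key from Azure Key Vault, where the vault and secret names are placeholders:
# Mount with an account key instead of a linked service.
# The key is retrieved from Key Vault rather than hard-coded.
account_key = mssparkutils.credentials.getSecret("<key-vault-name>", "<secret-name>")
mssparkutils.fs.mount(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net",
    "/mnt/<mount-name>",
    {"accountKey": account_key}
)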
Accessing The Mounted Storage
To access the folders and files within the mount path, the user should first retrieve the session mount path and then list the contents using utility functions from the mssparkutils library.
mountPath = f'file:{mssparkutils.fs.getMountPath("/mnt/<mount-name>")}'
mssparkutils.fs.ls(f'{mountPath}/<folder-name>')
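For Spark reads, the mounted data can also be addressed through the session-scoped synfs scheme, qualified by the current job ID; a sketch, with placeholders as above:
# Read from the mount with Spark via the session-scoped synfs scheme.
job_id = mssparkutils.env.getJobId()
df = spark.read.csv(f"synfs:/{job_id}/mnt/<mount-name>/<folder-name>", header=True)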
Unmounting A Mount Point
To unmount a mount path, the user needs to execute the corresponding utility function from the mssparkutils library.
mssparkutils.fs.unmount("/mnt/<mount-name>")
Key Differences
Absence Of A Fixed Mount Path In Synapse
In Databricks, the data source is mounted to the backend cluster, whereas in Synapse it is mounted to the current Spark session and scoped by the Livy session ID. As a result, the base mount path differs between the two platforms.
# Mount Path In Databricks
file:/dbfs/mnt/<mount-name>/<folder-name>
# Mount Path In Synapse
file:/synfs/<livy-session-id>/mnt/<mount-name>/<folder-name>
Consequently, there is no universal mount path in Synapse that can be accessed from any notebook; each notebook session must create its own local mount path.
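One way to paper over this difference is a small helper that resolves the session-local base path for a mount in either environment. A sketch, assuming it runs in a notebook where the Synapse utilities are importable only on Synapse:
def local_mount_path(mount_name):
    """Return the local file path of a mount in Synapse or Databricks."""
    try:
        from notebookutils import mssparkutils  # Available on Synapse only
        return f'file:{mssparkutils.fs.getMountPath(f"/mnt/{mount_name}")}'
    except ImportError:
        # Databricks exposes mounts under the fixed /dbfs prefix.
        return f'file:/dbfs/mnt/{mount_name}'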