Databricks SDK#
This page covers details of working with the Databricks SDK. Check the Databricks SDK for Python documentation for the official reference.
from databricks.sdk import WorkspaceClient
Workspace client#
The most popular way to communicate with a Databricks workspace is through a `databricks.sdk.WorkspaceClient`. To create it, you must set up Databricks authentication:
- Through setting the `~/.databrickscfg` file.
- Through defining environment variables. The most popular are:
  - `DATABRICKS_HOST`: your Databricks host.
  - `DATABRICKS_TOKEN`: your access token.
The default .databrickscfg file may look like this:
[DEFAULT]
host = https://dbc-<some unique for workspace>.cloud.databricks.com
token = <here is your token>
- The profile name `DEFAULT` is important. You can specify a different name, but `DEFAULT` is the one used by default.
- The `host` can be copied from the browser URL line (just the host, without the path).
- The `token` can be generated through the Databricks UI: Settings -> Developer -> Access tokens -> Manage.
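If you do not want to rely on the configuration file, the credentials can also be passed to the client directly. A minimal sketch, where the host and token values are placeholders you replace with your own:

from databricks.sdk import WorkspaceClient

# Explicit credentials instead of the DEFAULT profile (placeholder values).
w = WorkspaceClient(
    host="https://dbc-<some unique for workspace>.cloud.databricks.com",
    token="<here is your token>",
)
# Or pick a specific profile from ~/.databrickscfg:
# w = WorkspaceClient(profile="DEFAULT")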
Note: If you have problems with authentication, check the environment variables. Some tools, such as the VSCode Databricks extension, may define default values for variables starting with DATABRICKS_.... Also, check `~/.ipython/profile_default/startup` for startup scripts that can invisibly change the behavior of IPython.
If everything is configured correctly, the following cell should run without any issues:
w = WorkspaceClient()
VSCode#
Some examples can only be run using the Databricks extension for VSCode. Its configuration can be extremely tricky.
For now (October 2025) it requires:
- Python 3.11, or any other version that supports `distutils`.
- The `readline` module, which can only be added to the Python distribution if it is built with the `readline` package installed on the system.
- A separate virtual environment: the extension provides its own `spark` object, which cannot be created if the regular `pyspark` distribution is installed on the system.
Note: The extension adds an initialization file to IPython (`/home/user/.ipython/profile_default/startup`), so your IPython can change its behavior. To work around this:
- Create an IPython profile: `ipython profile create databricks`.
- Copy the startup file created by the extension into the new profile's startup directory. It is usually named something like `00-databricks-init-....py`. By default, IPython stores its profiles in the `~/.ipython` folder.
- Create a separate kernel specification that runs `python3 -m ipykernel_launcher` with the `--profile=databricks` argument, as shown in the sketch below. Kernels are usually stored in the `share/jupyter/kernels` directory of the Python virtual environment.
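A minimal sketch of the last step, assuming the kernel should live inside the active virtual environment and that the `databricks` IPython profile already exists:

import json
import sys
from pathlib import Path

# Assumed location: share/jupyter/kernels inside the active virtual environment.
kernel_dir = Path(sys.prefix) / "share" / "jupyter" / "kernels" / "databricks"
kernel_dir.mkdir(parents=True, exist_ok=True)

# Kernel spec that launches IPython with the "databricks" profile.
spec = {
    "argv": [
        sys.executable, "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
        "--profile=databricks",
    ],
    "display_name": "Python (databricks)",
    "language": "python",
}
(kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=2))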
If the setup is correct, you will have access to the spark and dbutils objects from the notebook without direct assignment.
spark
<pyspark.sql.connect.session.SparkSession at 0x73f9de975a50>
dbutils
<databricks.sdk.dbutils.RemoteDbUtils at 0x73f9dc5b51d0>
Spark session#
You can get a Spark session that has access to your Databricks workspace by using the `databricks.connect.DatabricksSession.builder.remote(...).getOrCreate()` method chain.
You cannot create a `DatabricksSession` if you have regular `pyspark` installed on your system. You must run this code from a different Python environment.
The following cell creates a Spark session attached to a Databricks environment running in the “serverless” mode.
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()
The following cell displays the list of tables available in my Databricks workspace.
spark.sql("SHOW TABLES").show()
+--------+--------------------+-----------+
|database| tableName|isTemporary|
+--------+--------------------+-----------+
| default| telco_churn_bronze| false|
| default|telco_churn_features| false|
+--------+--------------------+-----------+
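The listed tables can be loaded as regular Spark DataFrames. A minimal sketch, assuming the `default.telco_churn_bronze` table from the output above:

# Load one of the tables shown above and preview it.
df = spark.table("default.telco_churn_bronze")
df.limit(5).show()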
Feature engineering#
The `databricks.feature_engineering` module allows you to manipulate the feature store in Databricks.
The `databricks.feature_engineering.FeatureEngineeringClient` object provides the following methods:
| Method | Description |
|---|---|
| `create_table` | Creates a feature table in Unity Catalog, defining its primary keys, schema, timestamp column, and metadata. |
| `create_training_set` | Joins features (via `FeatureLookup` objects) to a training DataFrame so a model can be trained on features from the feature store. |
| `log_model` | Logs an MLflow model together with feature metadata so the required features can be fetched automatically at inference. |
| `score_batch` | Runs batch inference: given a model URI and a DataFrame, automatically fetches missing features, joins them, and returns predictions. |
| `create_feature_spec` | Defines a feature spec (collection of `FeatureLookup` and `FeatureFunction` objects) that can be used for feature serving. |
| `create_feature_serving_endpoint` | Creates an endpoint for real-time / online feature serving. |
| `get_feature_serving_endpoint` / `delete_feature_serving_endpoint` | Manage (retrieve or delete) feature serving endpoints. |
| `publish_table` | Publishes an offline feature table to an online store for low-latency feature access. |
| `read_table` | Reads the contents of a feature table into a Spark DataFrame. |
| `write_table` | Inserts or upserts data into a feature table; supports streaming DataFrames. |
| `set_feature_table_tag` / `delete_feature_table_tag` | Manage tags (set or delete) on feature tables for governance and organization. |
| `drop_online_table` | Removes a published feature table from an online store. |
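A minimal sketch of the client in action, assuming a Unity Catalog table name `main.default.knowledge_features` and a Spark DataFrame `features_df` with an `id` column (both names are assumptions):

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create a feature table from an existing DataFrame (assumed names).
fe.create_table(
    name="main.default.knowledge_features",
    primary_keys=["id"],
    df=features_df,
    description="Example feature table",
)

# Read the feature table back as a Spark DataFrame.
features = fe.read_table(name="main.default.knowledge_features")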
For more details and examples check the Feature engineering page.
Serving#
Databricks offers many possibilities for serving machine learning models and managing served models. Many of these functions can be accessed through the Python SDK. The following table lists some of them:
| Possibility | Description | Python SDK Context / Method Category |
|---|---|---|
| Create Endpoint | Programmatically create a new model serving endpoint, including configuration for custom models, Foundation Model APIs, or external models. | `w.serving_endpoints.create` |
| Get/List Endpoints | Retrieve the status, configuration, and metadata for a specific serving endpoint or list all serving endpoints in the workspace. | `w.serving_endpoints.get` / `w.serving_endpoints.list` |
| Update Endpoint Configuration | Modify an existing serving endpoint’s configuration, such as changing the served model version, adjusting traffic split, or changing the workload size. | `w.serving_endpoints.update_config` |
| Delete Endpoint | Remove a serving endpoint. | `w.serving_endpoints.delete` |
| Query Endpoint (Scoring) | Send real-time inference requests to a deployed model serving endpoint, often using an OpenAI-compatible client configured via the SDK for LLMs, or standard methods for custom models. | Handled via the Databricks-configured OpenAI client or other scoring methods (e.g., `w.serving_endpoints.query`). |
| Manage Provisioned Throughput (PT) Endpoints | Create and manage endpoints specifically configured for Foundation Models with guaranteed performance (Provisioned Throughput). | `w.serving_endpoints.create` / `w.serving_endpoints.update_config` with a provisioned throughput configuration |
| Retrieve Build Logs | Get the build logs for a served model on a serving endpoint, useful for debugging deployment issues. | `w.serving_endpoints.build_logs` |
| Configure AI Gateway | Set configurations related to AI Gateway features like fallbacks, guardrails, inference tables, and usage tracking for the serving endpoint. | Used within the endpoint create/update configuration (AI Gateway settings). |
| Configure Rate Limits | Apply rate limits to the serving endpoint (though documentation suggests using AI Gateway for newer rate limit management). | Used within the endpoint create/update configuration (rate limit settings). |
| Route Optimization | Enable configuration for route optimization on the serving endpoint for low-latency workloads. | Used within the endpoint create configuration (route optimization flag). |
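A minimal sketch that inspects the serving endpoints available in the workspace; the commented-out endpoint name is an assumption:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List all serving endpoints in the workspace.
for endpoint in w.serving_endpoints.list():
    print(endpoint.name, endpoint.state)

# Details of a particular endpoint (assumed name):
# details = w.serving_endpoints.get(name="my-endpoint")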
Check more in the Serving page.
Workspace#
Manage your workspace. You can access this API through `WorkspaceClient().workspace`. The following table lists the methods and their descriptions:
| Name | Description |
|---|---|
| `delete` | Deletes an object or directory (optionally recursively). |
| `download` | Downloads a notebook or file from the workspace. |
| `export` | Exports an object or directory (e.g. notebook or folder). |
| `get_permission_levels` | Gets the permission levels a user can have on an object. |
| `get_permissions` | Gets the current permissions on a workspace object. |
| `get_status` | Gets the status (metadata / existence) of an object or directory. |
| `import_` | Imports an object (notebook / file) or directory into the workspace. |
| `list` | Lists workspace objects under a path (optionally recursive). |
| `mkdirs` | Creates a directory (and any necessary parent dirs). |
| `set_permissions` | Sets permissions on an object (replacing existing ones). |
| `update_permissions` | Updates permissions on an object (modifies existing ones). |
| `upload` | Uploads a notebook / file or directory content to the workspace. |
The following cell demonstrates how a file can be created in your Databricks workspace.
import base64
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import workspace
w = WorkspaceClient()
notebook_path = f"/Workspace/Users/{w.current_user.me().user_name}/knowledge/some_file"
w.workspace.import_(
content=base64.b64encode(("CREATE LIVE TABLE dlt_sample AS SELECT 1").encode()).decode(),
format=workspace.ImportFormat.SOURCE,
language=workspace.Language.SQL,
overwrite=True,
path=notebook_path
)
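A small follow-up sketch that checks the created object and removes it when it is no longer needed:

# Verify that the object was created.
print(w.workspace.get_status(path=notebook_path).object_type)

# Clean up.
w.workspace.delete(path=notebook_path)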
Jobs API#
The Databricks SDK contains a set of wrappers for managing Databricks jobs. Check the `w.jobs` page of the official documentation.
The following cell uses the `w.jobs.create` method to create a job.
w = WorkspaceClient()
w.jobs.create(name="knowledge_job")
CreateResponse(job_id=714431377314811)
You can access jobs using the w.jobs.list method. It returns a generator containing all available jobs.
my_jobs = w.jobs.list()
my_jobs
<generator object JobsExt.list at 0x7ae387747220>
The next cell shows the properties of the created job.
my_job = list(my_jobs)[0]
my_job.job_id, my_job.settings.name
(714431377314811, 'knowledge_job')
The following cell deletes the job.
w.jobs.delete(job_id=my_job.job_id)
There are no more jobs in Databricks now.
list(w.jobs.list())
[]
dbutils#
The `dbutils` object is pre-initialized in Databricks notebooks. You should only use it in a configured Databricks environment. The following table lists the main modules of `dbutils` and their descriptions:
| Module | Description |
|---|---|
| credentials | Utilities for interacting with credentials within notebooks |
| data | Utilities for understanding and interacting with datasets (EXPERIMENTAL) |
| fs | Utilities for accessing the Databricks file system (DBFS) |
| jobs | Utilities for leveraging job features |
| library | Deprecated. Utilities for managing session-scoped libraries |
| meta | Utilities to hook into the compiler (EXPERIMENTAL) |
| notebook | Utilities for managing the control flow of notebooks (EXPERIMENTAL) |
| preview | Utilities in preview |
| secrets | Utilities for leveraging secrets within notebooks |
| widgets | Utilities for parameterizing notebooks |
| api | Utilities for managing application builds |
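For example, the fs module gives access to the Databricks file system. A minimal sketch, to be run in a configured Databricks environment:

# List the contents of the DBFS root.
for file_info in dbutils.fs.ls("/"):
    print(file_info.path)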
Check more in Databricks Utilities (dbutils) reference page.
Jobs#
The `dbutils.jobs` module allows you to manipulate the behavior of Databricks jobs. To pass messages between tasks, use:
- `dbutils.jobs.taskValues.set` to set a message that can be picked up by some other task.
- `dbutils.jobs.taskValues.get` to pick up the message generated by another task.
The following cell sets the “hello” message under the “test” key.
dbutils.jobs.taskValues.set(key="test", value="hello")
The following cell picks up the test message left by the task `some_task`. For debugging purposes, `debugValue` is set.
dbutils.jobs.taskValues.get(taskKey="some_task", key="test", debugValue="not found")
'not found'
Since the code is not running in the job environment, the debugValue is returned.