Databricks SDK#
This page covers details of working with the Databricks SDK. Check the Databricks SDK for Python documentation for the official reference.
from databricks.sdk import WorkspaceClient
Workspace client#
The most popular way to communicate with a Databricks workspace is through a `databricks.sdk.WorkspaceClient`. To create it, you must set up Databricks authentication:
- Through setting the `~/.databrickscfg` file.
- Through defining environment variables. The most popular are:
  - `DATABRICKS_HOST`: your Databricks host.
  - `DATABRICKS_TOKEN`: your access token.
The default .databrickscfg file may look like this:
[DEFAULT]
host = https://dbc-<some unique for workspace>.cloud.databricks.com
token = <here is your token>
- The profile name `DEFAULT` is important. You can specify a different name, but `DEFAULT` is the one used by default.
- The `host` can be copied from the browser URL line (just the host, without the path).
- The `token` can be generated through the Databricks UI: Settings -> Developer -> Access tokens -> Manage.
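If you do not want to rely on the configuration file, the credentials can also be passed to the client directly. A minimal sketch, where the host and token values are placeholders you replace with your own:

from databricks.sdk import WorkspaceClient

# Explicit credentials instead of the DEFAULT profile (placeholder values).
w = WorkspaceClient(
    host="https://dbc-<some unique for workspace>.cloud.databricks.com",
    token="<here is your token>",
)
# Or pick a specific profile from ~/.databrickscfg:
# w = WorkspaceClient(profile="DEFAULT")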
Note: If you have problems with authentication, check the environment variables. Some tools, such as the VSCode Databricks extension, may define default values for variables starting with DATABRICKS_.... Also, check `~/.ipython/profile_default/startup` for startup scripts that can invisibly change the behavior of IPython.
If everything is configured correctly, the following cell should run without any issues:
w = WorkspaceClient()
VSCode#
Some examples can only be run using the Databricks extension for VSCode. Its configuration can be extremely tricky.
For now (October 2025) it requires:
- Python 3.11, or any other version that supports `distutils`.
- The `readline` module, which can only be added to the Python distribution if it is built with the `readline` package installed on the system.
- A separate virtual environment: the extension provides its own `spark` object, which cannot be created if the regular `pyspark` distribution is installed on the system.
Note: The extension adds an initialization file to IPython (`/home/user/.ipython/profile_default/startup`), so your IPython can change its behavior. To work around this:
- Create an IPython profile: `ipython profile create databricks`.
- Copy the startup file created by the extension into the new profile's startup directory. It is usually named something like `00-databricks-init-....py`. By default, IPython stores its profiles in the `~/.ipython` folder.
- Create a separate kernel specification that runs `python3 -m ipykernel_launcher` with the `--profile=databricks` argument, as shown in the sketch below. Kernels are usually stored in the `share/jupyter/kernels` directory of the Python virtual environment.
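A minimal sketch of the last step, assuming the kernel should live inside the active virtual environment and that the `databricks` IPython profile already exists:

import json
import sys
from pathlib import Path

# Assumed location: share/jupyter/kernels inside the active virtual environment.
kernel_dir = Path(sys.prefix) / "share" / "jupyter" / "kernels" / "databricks"
kernel_dir.mkdir(parents=True, exist_ok=True)

# Kernel spec that launches IPython with the "databricks" profile.
spec = {
    "argv": [
        sys.executable, "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
        "--profile=databricks",
    ],
    "display_name": "Python (databricks)",
    "language": "python",
}
(kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=2))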
If the setup is correct, you will have access to the spark and dbutils objects from the notebook without direct assignment.
spark
<pyspark.sql.connect.session.SparkSession at 0x73f9de975a50>
dbutils
<databricks.sdk.dbutils.RemoteDbUtils at 0x73f9dc5b51d0>
Spark session#
You can get a Spark session that has access to your Databricks workspace by using the `databricks.connect.DatabricksSession.builder.remote(...).getOrCreate()` method chain.
You cannot create a `DatabricksSession` if you have regular `pyspark` installed on your system. You must run this code from a different Python environment.
The following cell creates a Spark session attached to a Databricks environment running in the “serverless” mode.
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()
The following cell displays the list of tables available in my Databricks workspace.
spark.sql("SHOW TABLES").show()
+--------+--------------------+-----------+
|database| tableName|isTemporary|
+--------+--------------------+-----------+
| default| telco_churn_bronze| false|
| default|telco_churn_features| false|
+--------+--------------------+-----------+
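The listed tables can be loaded as regular Spark DataFrames. A minimal sketch, assuming the `default.telco_churn_bronze` table from the output above:

# Load one of the tables shown above and preview it.
df = spark.table("default.telco_churn_bronze")
df.limit(5).show()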
Feature engineering#
The `databricks.feature_engineering` module allows you to manipulate the feature store in Databricks.
The `databricks.feature_engineering.FeatureEngineeringClient` object provides the following methods:
| Method | Description |
|---|---|
| `create_table` | Creates a feature table in Unity Catalog, defining its primary keys, schema, timestamp column, and metadata. |
| `create_training_set` | Joins features (via `FeatureLookup` objects) to a training DataFrame so a model can be trained on features from the feature store. |
| `log_model` | Logs an MLflow model together with feature metadata so the required features can be fetched automatically at inference. |
| `score_batch` | Runs batch inference: given a model URI and a DataFrame, automatically fetches missing features, joins them, and returns predictions. |
| `create_feature_spec` | Defines a feature spec (collection of `FeatureLookup` and `FeatureFunction` objects) that can be used for feature serving. |
| `create_feature_serving_endpoint` | Creates an endpoint for real-time / online feature serving. |
| `get_feature_serving_endpoint` / `delete_feature_serving_endpoint` | Manage (retrieve or delete) feature serving endpoints. |
| `publish_table` | Publishes an offline feature table to an online store for low-latency feature access. |
| `read_table` | Reads the contents of a feature table into a Spark DataFrame. |
| `write_table` | Inserts or upserts data into a feature table; supports streaming DataFrames. |
| `set_feature_table_tag` / `delete_feature_table_tag` | Manage tags (set or delete) on feature tables for governance and organization. |
| `drop_online_table` | Removes a published feature table from an online store. |
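A minimal sketch of the client in action, assuming a Unity Catalog table name `main.default.knowledge_features` and a Spark DataFrame `features_df` with an `id` column (both names are assumptions):

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create a feature table from an existing DataFrame (assumed names).
fe.create_table(
    name="main.default.knowledge_features",
    primary_keys=["id"],
    df=features_df,
    description="Example feature table",
)

# Read the feature table back as a Spark DataFrame.
features = fe.read_table(name="main.default.knowledge_features")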
For more details and examples check the Feature engineering page.
Serving#
Databricks offers many possibilities for serving machine learning models and managing served models. Many of these functions can be accessed through the Python SDK. The following table lists some of them:
| Possibility | Description | Python SDK Context / Method Category |
|---|---|---|
| Create Endpoint | Programmatically create a new model serving endpoint, including configuration for custom models, Foundation Model APIs, or external models. | `w.serving_endpoints.create` |
| Get/List Endpoints | Retrieve the status, configuration, and metadata for a specific serving endpoint or list all serving endpoints in the workspace. | `w.serving_endpoints.get` / `w.serving_endpoints.list` |
| Update Endpoint Configuration | Modify an existing serving endpoint’s configuration, such as changing the served model version, adjusting traffic split, or changing the workload size. | `w.serving_endpoints.update_config` |
| Delete Endpoint | Remove a serving endpoint. | `w.serving_endpoints.delete` |
| Query Endpoint (Scoring) | Send real-time inference requests to a deployed model serving endpoint, often using an OpenAI-compatible client configured via the SDK for LLMs, or standard methods for custom models. | Handled via the Databricks-configured OpenAI client or other scoring methods (e.g., `w.serving_endpoints.query`). |
| Manage Provisioned Throughput (PT) Endpoints | Create and manage endpoints specifically configured for Foundation Models with guaranteed performance (Provisioned Throughput). | `w.serving_endpoints.create` / `w.serving_endpoints.update_config` with a provisioned throughput configuration |
| Retrieve Build Logs | Get the build logs for a served model on a serving endpoint, useful for debugging deployment issues. | `w.serving_endpoints.build_logs` |
| Configure AI Gateway | Set configurations related to AI Gateway features like fallbacks, guardrails, inference tables, and usage tracking for the serving endpoint. | Used within the endpoint create/update configuration (AI Gateway settings). |
| Configure Rate Limits | Apply rate limits to the serving endpoint (though documentation suggests using AI Gateway for newer rate limit management). | Used within the endpoint create/update configuration (rate limit settings). |
| Route Optimization | Enable configuration for route optimization on the serving endpoint for low-latency workloads. | Used within the endpoint create configuration (route optimization flag). |
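A minimal sketch that inspects the serving endpoints available in the workspace; the commented-out endpoint name is an assumption:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List all serving endpoints in the workspace.
for endpoint in w.serving_endpoints.list():
    print(endpoint.name, endpoint.state)

# Details of a particular endpoint (assumed name):
# details = w.serving_endpoints.get(name="my-endpoint")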
Check more in the Serving page.
Workspace#
Manage your workspace. You can access this API through `WorkspaceClient().workspace`. The following table lists the methods and their descriptions:
| Name | Description |
|---|---|
| `delete` | Deletes an object or directory (optionally recursively). |
| `download` | Downloads a notebook or file from the workspace. |
| `export` | Exports an object or directory (e.g. notebook or folder). |
| `get_permission_levels` | Gets the permission levels a user can have on an object. |
| `get_permissions` | Gets the current permissions on a workspace object. |
| `get_status` | Gets the status (metadata / existence) of an object or directory. |
| `import_` | Imports an object (notebook / file) or directory into the workspace. |
| `list` | Lists workspace objects under a path (optionally recursive). |
| `mkdirs` | Creates a directory (and any necessary parent dirs). |
| `set_permissions` | Sets permissions on an object (replacing existing ones). |
| `update_permissions` | Updates permissions on an object (modifies existing ones). |
| `upload` | Uploads a notebook / file or directory content to the workspace. |
The following cell demonstrates how a file can be created in your Databricks workspace.
import base64
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import workspace
w = WorkspaceClient()
notebook_path = f"/Workspace/Users/{w.current_user.me().user_name}/knowledge/some_file"
w.workspace.import_(
content=base64.b64encode(("CREATE LIVE TABLE dlt_sample AS SELECT 1").encode()).decode(),
format=workspace.ImportFormat.SOURCE,
language=workspace.Language.SQL,
overwrite=True,
path=notebook_path
)
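A small follow-up sketch that checks the created object and removes it when it is no longer needed:

# Verify that the object was created.
print(w.workspace.get_status(path=notebook_path).object_type)

# Clean up.
w.workspace.delete(path=notebook_path)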
Jobs API#
The Databricks SDK contains a set of wrappers for managing Databricks jobs. Check the `w.jobs` page of the official documentation.
The following cell uses the `w.jobs.create` method to create a job.
w = WorkspaceClient()
w.jobs.create(name="knowledge_job")
CreateResponse(job_id=714431377314811)
You can access jobs using the w.jobs.list method. It returns a generator containing all available jobs.
my_jobs = w.jobs.list()
my_jobs
<generator object JobsExt.list at 0x7ae387747220>
The next cell shows the properties of the created job.
my_job = list(my_jobs)[0]
my_job.job_id, my_job.settings.name
(714431377314811, 'knowledge_job')
The following cell deletes the job.
w.jobs.delete(job_id=my_job.job_id)
There are no more jobs in Databricks now.
list(w.jobs.list())
[]
dbutils#
The `dbutils` object is pre-initialized in Databricks notebooks. You should only use it in a configured Databricks environment. The following table lists the main modules of `dbutils` and their descriptions:
| Module | Description |
|---|---|
| credentials | Utilities for interacting with credentials within notebooks |
| data | Utilities for understanding and interacting with datasets (EXPERIMENTAL) |
| fs | Utilities for accessing the Databricks file system (DBFS) |
| jobs | Utilities for leveraging job features |
| library | Deprecated. Utilities for managing session-scoped libraries |
| meta | Utilities to hook into the compiler (EXPERIMENTAL) |
| notebook | Utilities for managing the control flow of notebooks (EXPERIMENTAL) |
| preview | Utilities in preview |
| secrets | Utilities for leveraging secrets within notebooks |
| widgets | Utilities for parameterizing notebooks |
| api | Utilities for managing application builds |
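For example, the fs module gives access to the Databricks file system. A minimal sketch, to be run in a configured Databricks environment:

# List the contents of the DBFS root.
for file_info in dbutils.fs.ls("/"):
    print(file_info.path)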
Check more in Databricks Utilities (dbutils) reference page.
Jobs#
The `dbutils.jobs` module allows you to manipulate the behavior of Databricks jobs. To pass messages between tasks, use:
- `dbutils.jobs.taskValues.set` to set a message that can be picked up by some other task.
- `dbutils.jobs.taskValues.get` to pick up the message generated by another task.
The following cell sets the “hello” message under the “test” key.
dbutils.jobs.taskValues.set(key="test", value="hello")
The following cell picks up the test message left by the task `some_task`. For debugging purposes, `debugValue` is set.
dbutils.jobs.taskValues.get(taskKey="some_task", key="test", debugValue="not found")
'not found'
Since the code is not running in the job environment, the debugValue is returned.